function analyze_logs

Maturity: 48

Parses and analyzes log files to extract synchronization statistics, error counts, and performance metrics for a specified time period.

File: /tf/active/vicechatdev/SPFCsync/monitor.py
Lines: 36-110
Complexity: moderate

Purpose

This function reads a log file and analyzes entries within a specified time window (default 24 hours) to generate comprehensive statistics about synchronization operations between SharePoint and FileCloud. It tracks sync cycles, file operations (uploads, updates, skips), errors, warnings, and calculates performance metrics like average cycle time. The function is designed for monitoring and troubleshooting file synchronization processes.
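
The function keys off specific message fragments when scanning. For illustration, a log excerpt that would be fully counted might look like this (the timestamps and the "Sync results:" prefix are hypothetical; the quoted fragments and the single-quoted statistics dictionary are what the function actually matches):

2024-01-15 02:00:01 INFO Starting SharePoint to FileCloud synchronization
2024-01-15 02:03:12 INFO Sync results: {'new_uploads': 12, 'updated_files': 3, 'skipped_files': 140}
2024-01-15 02:03:12 INFO Synchronization completed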

Source Code

def analyze_logs(log_file, hours=24):
    """Analyze logs for the specified time period."""
    if not os.path.exists(log_file):
        print(f"Log file {log_file} not found")
        return
    
    cutoff_time = datetime.now() - timedelta(hours=hours)
    
    stats = {
        'total_lines': 0,
        'sync_cycles': 0,
        'new_uploads': 0,
        'updated_files': 0,
        'skipped_files': 0,
        'errors': 0,
        'warnings': 0,
        'last_sync': None,
        'avg_cycle_time': 0,
        'cycle_times': []
    }
    
    cycle_start_time = None
    
    try:
        with open(log_file, 'r') as f:
            for line in f:
                stats['total_lines'] += 1
                parsed = parse_log_line(line)
                
                if not parsed or parsed['timestamp'] < cutoff_time:
                    continue
                
                message = parsed['message']
                level = parsed['level']
                
                # Count log levels
                if level == 'ERROR':
                    stats['errors'] += 1
                elif level == 'WARNING':
                    stats['warnings'] += 1
                
                # Track sync cycles
                if 'Starting SharePoint to FileCloud synchronization' in message:
                    cycle_start_time = parsed['timestamp']
                elif 'Synchronization completed' in message and cycle_start_time:
                    stats['sync_cycles'] += 1
                    stats['last_sync'] = parsed['timestamp']
                    cycle_time = (parsed['timestamp'] - cycle_start_time).total_seconds()
                    stats['cycle_times'].append(cycle_time)
                
                # Extract sync statistics
                if 'new_uploads' in message:
                    match = re.search(r"'new_uploads': (\d+)", message)
                    if match:
                        stats['new_uploads'] += int(match.group(1))
                
                if 'updated_files' in message:
                    match = re.search(r"'updated_files': (\d+)", message)
                    if match:
                        stats['updated_files'] += int(match.group(1))
                
                if 'skipped_files' in message:
                    match = re.search(r"'skipped_files': (\d+)", message)
                    if match:
                        stats['skipped_files'] += int(match.group(1))
    
    except Exception as e:
        print(f"Error reading log file: {e}")
        return
    
    # Calculate average cycle time
    if stats['cycle_times']:
        stats['avg_cycle_time'] = sum(stats['cycle_times']) / len(stats['cycle_times'])
    
    return stats

Parameters

Name      Type  Default  Kind
log_file  -     -        positional_or_keyword
hours     -     24       positional_or_keyword

Parameter Details

log_file: String path to the log file to analyze. If the file doesn't exist, the function prints an error message and returns None.

hours: Integer representing the number of hours to look back from the current time for log analysis. Default is 24 hours. Only log entries with timestamps within this window are included in the statistics. Must be a positive number.

Return Value

Returns a dictionary of log statistics, or None if the log file doesn't exist or a read error occurs. The dictionary contains:

  • 'total_lines' (int): total lines read from the file
  • 'sync_cycles' (int): completed synchronization cycles
  • 'new_uploads' (int): total new files uploaded
  • 'updated_files' (int): total files updated
  • 'skipped_files' (int): total files skipped
  • 'errors' (int): count of ERROR-level entries
  • 'warnings' (int): count of WARNING-level entries
  • 'last_sync' (datetime or None): timestamp of the last completed sync
  • 'avg_cycle_time' (float): average sync cycle duration in seconds
  • 'cycle_times' (list of float): individual cycle durations in seconds
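
For illustration, a healthy result might look like this (all values are made up):

{
    'total_lines': 18342,
    'sync_cycles': 3,
    'new_uploads': 57,
    'updated_files': 12,
    'skipped_files': 3401,
    'errors': 0,
    'warnings': 2,
    'last_sync': datetime(2024, 1, 15, 23, 0, 12),
    'avg_cycle_time': 191.37,
    'cycle_times': [188.0, 195.2, 190.9]
}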

Dependencies

  • os
  • datetime
  • re

Required Imports

import os
from datetime import datetime, timedelta
import re

Usage Example

import os
from datetime import datetime, timedelta
import re

# Minimal parse_log_line sketch; the timestamp/level/message layout below
# is an assumption, so adapt the regex to your actual log format.
def parse_log_line(line):
    m = re.match(r'^(\d{4}-\d{2}-\d{2} \d{2}:\d{2}:\d{2})(?:,\d+)? (\w+) (.*)$', line)
    if not m:
        return None
    return {'timestamp': datetime.strptime(m.group(1), '%Y-%m-%d %H:%M:%S'),
            'level': m.group(2), 'message': m.group(3)}

# Analyze logs from the last 24 hours
stats = analyze_logs('/var/log/sync.log')
if stats:
    print(f"Sync cycles completed: {stats['sync_cycles']}")
    print(f"New uploads: {stats['new_uploads']}")
    print(f"Errors: {stats['errors']}")
    print(f"Average cycle time: {stats['avg_cycle_time']:.2f} seconds")
    if stats['last_sync']:
        print(f"Last sync: {stats['last_sync']}")

# Analyze logs from the last 48 hours
stats_48h = analyze_logs('/var/log/sync.log', hours=48)
if stats_48h:
    print(f"Total files processed: {stats_48h['new_uploads'] + stats_48h['updated_files']}")

Best Practices

  • Ensure the parse_log_line() helper function is properly implemented before using this function
  • The log file should be accessible and readable by the process running this function
  • For large log files, note that the function scans every line; line-by-line reading keeps memory use low, but processing time grows with file size
  • The function expects specific message patterns in the logs; ensure your logging format matches the expected patterns
  • Handle the None return value appropriately when the log file doesn't exist or errors occur
  • Choose an hours window appropriate to your log retention; the entire file is still read regardless of the window, only the statistics are filtered by timestamp
  • Log messages containing statistics must use Python dictionary repr format with single quotes, or the regex extraction will not match (see the logging sketch after this list)
  • Consider implementing log rotation to prevent log files from growing too large; a sketch follows this list
  • The function silently skips entries outside the time window and lines that fail to parse, which is efficient but can mask timestamp or format problems in the logs
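
The statistics bullets above expect messages shaped like the following; logging a results dict through an f-string (or %s) emits its repr, which uses the single quotes the regexes match. The logger name and message prefix here are hypothetical:

import logging

logger = logging.getLogger('spfc_sync')  # hypothetical logger name
results = {'new_uploads': 12, 'updated_files': 3, 'skipped_files': 140}
logger.info(f"Sync results: {results}")  # dict repr -> single-quoted keys

For rotation, the standard library's RotatingFileHandler keeps the log bounded without external tooling; a minimal sketch continuing the snippet above:

from logging.handlers import RotatingFileHandler

handler = RotatingFileHandler('/var/log/sync.log', maxBytes=10_000_000, backupCount=5)
handler.setFormatter(logging.Formatter('%(asctime)s %(levelname)s %(message)s'))
logger.addHandler(handler)

Note that after rotation analyze_logs only sees the current file; entries that rolled over to sync.log.1 and beyond fall out of the analysis.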

Similar Components

AI-powered semantic similarity - components with related functionality:

  • function main_v16 70.7% similar

    Executes a diagnostic analysis for file synchronization issues, analyzes missing files, and saves the results to a JSON file.

    From: /tf/active/vicechatdev/SPFCsync/deep_diagnostics.py
  • class SyncDiagnostics 68.6% similar

    A diagnostic class that analyzes and reports on synchronization issues between SharePoint and FileCloud, identifying missing files and root causes of sync failures.

    From: /tf/active/vicechatdev/SPFCsync/deep_diagnostics.py
  • function main_v32 68.3% similar

    Command-line interface entry point for monitoring SharePoint to FileCloud synchronization logs, providing status analysis, log tailing, and real-time watching capabilities.

    From: /tf/active/vicechatdev/SPFCsync/monitor.py
  • function print_status 66.4% similar

    Prints a formatted status report for SharePoint to FileCloud synchronization operations, displaying sync statistics, timing information, and health indicators.

    From: /tf/active/vicechatdev/SPFCsync/monitor.py
  • function dry_run_test 63.0% similar

    Performs a dry run test of SharePoint to FileCloud synchronization, analyzing up to a specified number of documents without actually transferring files.

    From: /tf/active/vicechatdev/SPFCsync/dry_run_test.py