function analyze_logs

Maturity: 48

Parses and analyzes log files to extract synchronization statistics, error counts, and performance metrics for a specified time period.

File: /tf/active/vicechatdev/SPFCsync/monitor.py
Lines: 36-110
Complexity: moderate

Purpose

This function reads a log file and analyzes entries within a specified time window (default 24 hours) to generate comprehensive statistics about synchronization operations between SharePoint and FileCloud. It tracks sync cycles, file operations (uploads, updates, skips), errors, warnings, and calculates performance metrics like average cycle time. The function is designed for monitoring and troubleshooting file synchronization processes.
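
The function keys off specific message fragments when scanning. For illustration, a log excerpt that would be fully counted might look like this (the timestamps and the "Sync results:" prefix are hypothetical; the quoted fragments and the single-quoted statistics dictionary are what the function actually matches):

2024-01-15 02:00:01 INFO Starting SharePoint to FileCloud synchronization
2024-01-15 02:03:12 INFO Sync results: {'new_uploads': 12, 'updated_files': 3, 'skipped_files': 140}
2024-01-15 02:03:12 INFO Synchronization completed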

Source Code

def analyze_logs(log_file, hours=24):
    """Analyze logs for the specified time period."""
    if not os.path.exists(log_file):
        print(f"Log file {log_file} not found")
        return
    
    cutoff_time = datetime.now() - timedelta(hours=hours)
    
    stats = {
        'total_lines': 0,
        'sync_cycles': 0,
        'new_uploads': 0,
        'updated_files': 0,
        'skipped_files': 0,
        'errors': 0,
        'warnings': 0,
        'last_sync': None,
        'avg_cycle_time': 0,
        'cycle_times': []
    }
    
    cycle_start_time = None
    
    try:
        with open(log_file, 'r') as f:
            for line in f:
                stats['total_lines'] += 1
                parsed = parse_log_line(line)
                
                if not parsed or parsed['timestamp'] < cutoff_time:
                    continue
                
                message = parsed['message']
                level = parsed['level']
                
                # Count log levels
                if level == 'ERROR':
                    stats['errors'] += 1
                elif level == 'WARNING':
                    stats['warnings'] += 1
                
                # Track sync cycles
                if 'Starting SharePoint to FileCloud synchronization' in message:
                    cycle_start_time = parsed['timestamp']
                elif 'Synchronization completed' in message and cycle_start_time:
                    stats['sync_cycles'] += 1
                    stats['last_sync'] = parsed['timestamp']
                    cycle_time = (parsed['timestamp'] - cycle_start_time).total_seconds()
                    stats['cycle_times'].append(cycle_time)
                
                # Extract sync statistics
                if 'new_uploads' in message:
                    match = re.search(r"'new_uploads': (\d+)", message)
                    if match:
                        stats['new_uploads'] += int(match.group(1))
                
                if 'updated_files' in message:
                    match = re.search(r"'updated_files': (\d+)", message)
                    if match:
                        stats['updated_files'] += int(match.group(1))
                
                if 'skipped_files' in message:
                    match = re.search(r"'skipped_files': (\d+)", message)
                    if match:
                        stats['skipped_files'] += int(match.group(1))
    
    except Exception as e:
        print(f"Error reading log file: {e}")
        return
    
    # Calculate average cycle time
    if stats['cycle_times']:
        stats['avg_cycle_time'] = sum(stats['cycle_times']) / len(stats['cycle_times'])
    
    return stats

Parameters

Name      Type  Default  Kind
log_file  -     -        positional_or_keyword
hours     -     24       positional_or_keyword

Parameter Details

log_file: String path to the log file to analyze. If the file doesn't exist, the function prints an error message and returns None.

hours: Integer representing the number of hours to look back from the current time for log analysis. Default is 24 hours. Only log entries with timestamps within this window are included in the statistics. Must be a positive number.

Return Value

Returns a dictionary of log statistics, or None if the log file doesn't exist or a read error occurs. The dictionary contains:

  • 'total_lines' (int): total lines read from the file
  • 'sync_cycles' (int): completed synchronization cycles
  • 'new_uploads' (int): total new files uploaded
  • 'updated_files' (int): total files updated
  • 'skipped_files' (int): total files skipped
  • 'errors' (int): count of ERROR-level entries
  • 'warnings' (int): count of WARNING-level entries
  • 'last_sync' (datetime or None): timestamp of the last completed sync
  • 'avg_cycle_time' (float): average sync cycle duration in seconds
  • 'cycle_times' (list of float): individual cycle durations in seconds
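
For illustration, a healthy result might look like this (all values are made up):

{
    'total_lines': 18342,
    'sync_cycles': 3,
    'new_uploads': 57,
    'updated_files': 12,
    'skipped_files': 3401,
    'errors': 0,
    'warnings': 2,
    'last_sync': datetime(2024, 1, 15, 23, 0, 12),
    'avg_cycle_time': 191.37,
    'cycle_times': [188.0, 195.2, 190.9]
}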

Dependencies

  • os
  • datetime
  • re

Required Imports

import os
from datetime import datetime, timedelta
import re

Usage Example

import os
from datetime import datetime, timedelta
import re

# Minimal parse_log_line sketch; the timestamp/level/message layout below
# is an assumption, so adapt the regex to your actual log format.
def parse_log_line(line):
    m = re.match(r'^(\d{4}-\d{2}-\d{2} \d{2}:\d{2}:\d{2})(?:,\d+)? (\w+) (.*)$', line)
    if not m:
        return None
    return {'timestamp': datetime.strptime(m.group(1), '%Y-%m-%d %H:%M:%S'),
            'level': m.group(2), 'message': m.group(3)}

# Analyze logs from the last 24 hours
stats = analyze_logs('/var/log/sync.log')
if stats:
    print(f"Sync cycles completed: {stats['sync_cycles']}")
    print(f"New uploads: {stats['new_uploads']}")
    print(f"Errors: {stats['errors']}")
    print(f"Average cycle time: {stats['avg_cycle_time']:.2f} seconds")
    if stats['last_sync']:
        print(f"Last sync: {stats['last_sync']}")

# Analyze logs from the last 48 hours
stats_48h = analyze_logs('/var/log/sync.log', hours=48)
if stats_48h:
    print(f"Total files processed: {stats_48h['new_uploads'] + stats_48h['updated_files']}")

Best Practices

  • Ensure the parse_log_line() helper function is properly implemented before using this function
  • The log file should be accessible and readable by the process running this function
  • For large log files, note that the function scans every line; line-by-line reading keeps memory use low, but processing time grows with file size
  • The function expects specific message patterns in the logs; ensure your logging format matches the expected patterns
  • Handle the None return value appropriately when the log file doesn't exist or errors occur
  • Choose an hours window appropriate to your log retention; the entire file is still read regardless of the window, only the statistics are filtered by timestamp
  • Log messages containing statistics must use Python dictionary repr format with single quotes, or the regex extraction will not match (see the logging sketch after this list)
  • Consider implementing log rotation to prevent log files from growing too large; a sketch follows this list
  • The function silently skips entries outside the time window and lines that fail to parse, which is efficient but can mask timestamp or format problems in the logs
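
The statistics bullets above expect messages shaped like the following; logging a results dict through an f-string (or %s) emits its repr, which uses the single quotes the regexes match. The logger name and message prefix here are hypothetical:

import logging

logger = logging.getLogger('spfc_sync')  # hypothetical logger name
results = {'new_uploads': 12, 'updated_files': 3, 'skipped_files': 140}
logger.info(f"Sync results: {results}")  # dict repr -> single-quoted keys

For rotation, the standard library's RotatingFileHandler keeps the log bounded without external tooling; a minimal sketch continuing the snippet above:

from logging.handlers import RotatingFileHandler

handler = RotatingFileHandler('/var/log/sync.log', maxBytes=10_000_000, backupCount=5)
handler.setFormatter(logging.Formatter('%(asctime)s %(levelname)s %(message)s'))
logger.addHandler(handler)

Note that after rotation analyze_logs only sees the current file; entries that rolled over to sync.log.1 and beyond fall out of the analysis.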

Similar Components

AI-powered semantic similarity - components with related functionality:

  • function main_v16 70.7% similar

    Executes a diagnostic analysis for file synchronization issues, analyzes missing files, and saves the results to a JSON file.

    From: /tf/active/vicechatdev/SPFCsync/deep_diagnostics.py
  • class SyncDiagnostics 68.6% similar

    A diagnostic class that analyzes and reports on synchronization issues between SharePoint and FileCloud, identifying missing files and root causes of sync failures.

    From: /tf/active/vicechatdev/SPFCsync/deep_diagnostics.py
  • function main_v32 68.3% similar

    Command-line interface entry point for monitoring SharePoint to FileCloud synchronization logs, providing status analysis, log tailing, and real-time watching capabilities.

    From: /tf/active/vicechatdev/SPFCsync/monitor.py
  • function print_status 66.4% similar

    Prints a formatted status report for SharePoint to FileCloud synchronization operations, displaying sync statistics, timing information, and health indicators.

    From: /tf/active/vicechatdev/SPFCsync/monitor.py
  • function dry_run_test 63.0% similar

    Performs a dry run test of SharePoint to FileCloud synchronization, analyzing up to a specified number of documents without actually transferring files.

    From: /tf/active/vicechatdev/SPFCsync/dry_run_test.py