function analyze_logs
Parses and analyzes log files to extract synchronization statistics, error counts, and performance metrics for a specified time period.
File: /tf/active/vicechatdev/SPFCsync/monitor.py
Lines: 36 - 110
Complexity: moderate
Purpose
This function reads a log file and analyzes entries within a specified time window (default 24 hours) to generate comprehensive statistics about synchronization operations between SharePoint and FileCloud. It tracks sync cycles, file operations (uploads, updates, skips), errors, warnings, and calculates performance metrics like average cycle time. The function is designed for monitoring and troubleshooting file synchronization processes.
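The analyzer keys on specific message substrings ("Starting SharePoint to FileCloud synchronization", "Synchronization completed") and on statistics rendered as a Python dict inside the message text. For orientation, here are hypothetical log lines it would match, assuming a "timestamp - LEVEL - message" layout (timestamps and counts are illustrative only):

```text
2024-01-15 02:00:01,123 - INFO - Starting SharePoint to FileCloud synchronization
2024-01-15 02:03:12,456 - INFO - Synchronization completed
2024-01-15 02:03:12,458 - INFO - Sync summary: {'new_uploads': 12, 'updated_files': 3, 'skipped_files': 140}
2024-01-15 02:10:05,002 - ERROR - Upload failed for report.docx
```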
Source Code
```python
def analyze_logs(log_file, hours=24):
    """Analyze logs for the specified time period."""
    if not os.path.exists(log_file):
        print(f"Log file {log_file} not found")
        return

    cutoff_time = datetime.now() - timedelta(hours=hours)

    stats = {
        'total_lines': 0,
        'sync_cycles': 0,
        'new_uploads': 0,
        'updated_files': 0,
        'skipped_files': 0,
        'errors': 0,
        'warnings': 0,
        'last_sync': None,
        'avg_cycle_time': 0,
        'cycle_times': []
    }

    cycle_start_time = None

    try:
        with open(log_file, 'r') as f:
            for line in f:
                stats['total_lines'] += 1
                parsed = parse_log_line(line)
                if not parsed or parsed['timestamp'] < cutoff_time:
                    continue

                message = parsed['message']
                level = parsed['level']

                # Count log levels
                if level == 'ERROR':
                    stats['errors'] += 1
                elif level == 'WARNING':
                    stats['warnings'] += 1

                # Track sync cycles
                if 'Starting SharePoint to FileCloud synchronization' in message:
                    cycle_start_time = parsed['timestamp']
                elif 'Synchronization completed' in message and cycle_start_time:
                    stats['sync_cycles'] += 1
                    stats['last_sync'] = parsed['timestamp']
                    cycle_time = (parsed['timestamp'] - cycle_start_time).total_seconds()
                    stats['cycle_times'].append(cycle_time)

                # Extract sync statistics
                if 'new_uploads' in message:
                    match = re.search(r"'new_uploads': (\d+)", message)
                    if match:
                        stats['new_uploads'] += int(match.group(1))

                if 'updated_files' in message:
                    match = re.search(r"'updated_files': (\d+)", message)
                    if match:
                        stats['updated_files'] += int(match.group(1))

                if 'skipped_files' in message:
                    match = re.search(r"'skipped_files': (\d+)", message)
                    if match:
                        stats['skipped_files'] += int(match.group(1))
    except Exception as e:
        print(f"Error reading log file: {e}")
        return

    # Calculate average cycle time
    if stats['cycle_times']:
        stats['avg_cycle_time'] = sum(stats['cycle_times']) / len(stats['cycle_times'])

    return stats
```
Parameters
| Name | Type | Default | Kind |
|---|---|---|---|
| log_file | str | - | positional_or_keyword |
| hours | int | 24 | positional_or_keyword |
Parameter Details
log_file: String path to the log file to be analyzed. Must be a valid file path. If the file doesn't exist, the function prints an error message and returns None.
hours: Integer representing the number of hours to look back from the current time for log analysis. Default is 24 hours. Only log entries with timestamps within this window are included in the statistics. Must be a positive number.
Return Value
Returns a dictionary of log statistics, or None if the log file doesn't exist or an error occurs while reading it. The dictionary contains:
- 'total_lines' (int): total lines read, including lines outside the time window
- 'sync_cycles' (int): completed synchronization cycles
- 'new_uploads' (int): total new files uploaded
- 'updated_files' (int): total files updated
- 'skipped_files' (int): total files skipped
- 'errors' (int): count of ERROR-level entries
- 'warnings' (int): count of WARNING-level entries
- 'last_sync' (datetime or None): timestamp of the last completed sync
- 'avg_cycle_time' (float): average sync cycle duration in seconds (0 if no cycles completed)
- 'cycle_times' (list of float): individual cycle durations in seconds
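A hypothetical return value, to show the shape of the dict (all values are illustrative only):

```python
from datetime import datetime

{
    'total_lines': 5231,
    'sync_cycles': 3,
    'new_uploads': 118,
    'updated_files': 42,
    'skipped_files': 3095,
    'errors': 2,
    'warnings': 7,
    'last_sync': datetime(2024, 1, 15, 2, 3, 12),
    'avg_cycle_time': 191.4,
    'cycle_times': [185.2, 191.0, 198.0]
}
```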
Dependencies
os, datetime, re
Required Imports
import os
from datetime import datetime, timedelta
import re
Usage Example
```python
import os
import re
from datetime import datetime, timedelta

# Minimal parse_log_line sketch so the example runs end to end.
# It assumes the common "YYYY-MM-DD HH:MM:SS,mmm - LEVEL - message"
# logging layout; adapt the pattern to your actual log format.
LOG_LINE = re.compile(r'^(\d{4}-\d{2}-\d{2} \d{2}:\d{2}:\d{2}),\d+ - (\w+) - (.*)$')

def parse_log_line(line):
    match = LOG_LINE.match(line.rstrip('\n'))
    if not match:
        return None
    return {
        'timestamp': datetime.strptime(match.group(1), '%Y-%m-%d %H:%M:%S'),
        'level': match.group(2),
        'message': match.group(3),
    }

# analyze_logs is assumed to be importable from monitor.py

# Analyze logs from the last 24 hours
stats = analyze_logs('/var/log/sync.log')
if stats:
    print(f"Sync cycles completed: {stats['sync_cycles']}")
    print(f"New uploads: {stats['new_uploads']}")
    print(f"Errors: {stats['errors']}")
    print(f"Average cycle time: {stats['avg_cycle_time']:.2f} seconds")
    if stats['last_sync']:
        print(f"Last sync: {stats['last_sync']}")

# Analyze logs from the last 48 hours
stats_48h = analyze_logs('/var/log/sync.log', hours=48)
if stats_48h:
    print(f"Total files processed: {stats_48h['new_uploads'] + stats_48h['updated_files']}")
```
Best Practices
- Ensure the parse_log_line() helper function is properly implemented before using this function; a minimal sketch is included in the usage example above
- The log file should be accessible and readable by the process running this function
- For very large log files, consider the processing cost: every line is read and parsed before the time-window filter is applied, although memory use stays low because lines are streamed one at a time
- The function expects specific message patterns in the logs; ensure your logging format matches the expected patterns
- Handle the None return value appropriately when the log file doesn't exist or errors occur
- The hours parameter should be reasonable for your log file size to avoid processing excessive data
- Log messages containing statistics must render them in Python dict repr form with single-quoted keys (e.g. 'new_uploads': 12) for the regex extraction to match; see the snippet after this list
- Consider implementing log rotation to prevent log files from growing too large
- The function silently skips log entries outside the time window, which is efficient but may mask issues with log timestamps
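A quick check of the extraction pattern against a hypothetical stats message (the message text is illustrative):

```python
import re

# Hypothetical completion message logging the sync stats dict
message = "Sync summary: {'new_uploads': 12, 'updated_files': 3, 'skipped_files': 140}"

# Same pattern analyze_logs uses; single-quoted keys are required
match = re.search(r"'new_uploads': (\d+)", message)
print(int(match.group(1)) if match else "no match")  # -> 12

# A double-quoted (JSON-style) rendering would NOT match
assert re.search(r"'new_uploads': (\d+)", '{"new_uploads": 12}') is None
```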
Similar Components
AI-powered semantic similarity - components with related functionality:
- function main_v16 (70.7% similar)
- class SyncDiagnostics (68.6% similar)
- function main_v32 (68.3% similar)
- function print_status (66.4% similar)
- function dry_run_test (63.0% similar)