function cleanup_old_documents

Maturity: 46

Removes documents older than 2 hours, together with their files on disk, from the uploaded_documents dictionary. It is meant to be run periodically so that neither file system storage nor memory grows without bound.

File: /tf/active/vicechatdev/vice_ai/app.py
Lines: 135 - 166
Complexity: moderate

Purpose

This function is a maintenance routine that prevents unbounded growth of stored documents by deleting any document that has exceeded a 2-hour retention period. While holding document_lock, it iterates over every user's documents, compares each document's created_at timestamp with the current time, removes the files of expired documents from the filesystem, deletes their entries from the uploaded_documents dictionary, and finally drops users left with no documents. The function does not schedule itself: the caller is expected to invoke it on a recurring basis (e.g., via threading.Timer or a background task).
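The age check at the heart of the routine can be isolated as a small pure function. The sketch below is illustrative only: `is_expired` and `RETENTION` are hypothetical names that do not appear in app.py, and it assumes `created_at` is a naive `datetime` such as the one returned by `datetime.now()`.

```python
from datetime import datetime, timedelta

# Hypothetical constant: the source hard-codes 7200 seconds instead.
RETENTION = timedelta(hours=2)

def is_expired(created_at, now, retention=RETENTION):
    """Return True when the document is strictly older than the retention period."""
    return (now - created_at) > retention

now = datetime(2024, 1, 1, 12, 0, 0)
print(is_expired(now - timedelta(hours=3), now))     # True
print(is_expired(now - timedelta(minutes=30), now))  # False
print(is_expired(now - timedelta(hours=2), now))     # False (exactly 2 hours is kept)
```

The comparison is strict, mirroring `age.total_seconds() > 7200` in the source, so a document that is exactly 2 hours old survives until the next pass.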

Source Code

def cleanup_old_documents():
    """Clean up documents older than 2 hours"""
    with document_lock:
        current_time = datetime.now()
        users_to_clean = []
        
        for user_email, docs in uploaded_documents.items():
            docs_to_remove = []
            
            for doc_id, doc_info in docs.items():
                age = current_time - doc_info['created_at']
                if age.total_seconds() > 7200:  # 2 hours
                    docs_to_remove.append(doc_id)
            
            # Remove old documents
            for doc_id in docs_to_remove:
                try:
                    if os.path.exists(doc_info['file_path']):
                        os.remove(doc_info['file_path'])
                except Exception as e:
                    logger.warning(f"Error removing old file: {e}")
                
                del docs[doc_id]
                logger.info(f"Cleaned up old document: {doc_id}")
            
            # Mark empty users for cleanup
            if not docs:
                users_to_clean.append(user_email)
        
        # Clean up empty user entries
        for user_email in users_to_clean:
            del uploaded_documents[user_email]

Return Value

This function returns None. It performs side effects by modifying the global uploaded_documents dictionary and deleting files from the filesystem.

Dependencies

  • os
  • datetime
  • logging
  • threading

Required Imports

import os
from datetime import datetime
import logging
from threading import Lock

Usage Example

import os
import logging
from datetime import datetime
from threading import Lock, Timer

# Setup required globals
logging.basicConfig(level=logging.INFO)
logger = logging.getLogger(__name__)
document_lock = Lock()
uploaded_documents = {
    'user@example.com': {
        'doc123': {
            'created_at': datetime.now(),
            'file_path': '/tmp/doc123.pdf'
        }
    }
}

# Define the function
def cleanup_old_documents():
    with document_lock:
        current_time = datetime.now()
        users_to_clean = []
        
        for user_email, docs in uploaded_documents.items():
            docs_to_remove = []
            
            for doc_id, doc_info in docs.items():
                age = current_time - doc_info['created_at']
                if age.total_seconds() > 7200:
                    docs_to_remove.append(doc_id)
            
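            for doc_id in docs_to_remove:
                try:
                    # Look up this document's own record: the loop variable
                    # doc_info from the scan above still points at the last
                    # document scanned, not the one being removed.
                    file_path = docs[doc_id]['file_path']
                    if os.path.exists(file_path):
                        os.remove(file_path)
                except Exception as e:
                    logger.warning(f"Error removing old file: {e}")
                
                del docs[doc_id]
                logger.info(f"Cleaned up old document: {doc_id}")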
            
            if not docs:
                users_to_clean.append(user_email)
        
        for user_email in users_to_clean:
            del uploaded_documents[user_email]

# Run cleanup immediately
cleanup_old_documents()

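# Or schedule periodic cleanup (every 30 minutes).
# Each run re-arms a new Timer; marking it as a daemon keeps a pending
# timer from blocking interpreter shutdown.
def schedule_cleanup():
    cleanup_old_documents()
    timer = Timer(1800, schedule_cleanup)
    timer.daemon = True
    timer.start()

schedule_cleanup()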
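A chain of Timer objects is awkward to stop cleanly, because each run creates a fresh timer. One alternative, offered as a sketch rather than as what app.py does, is a single daemon thread that waits on a threading.Event between runs. The names `start_periodic` and `stop_event` are hypothetical, and the short interval is only there to keep the demonstration quick.

```python
import threading
import time

def start_periodic(task, interval_seconds):
    """Run task every interval_seconds in a daemon thread until stopped.

    Returns (thread, stop_event); call stop_event.set() to end the loop.
    """
    stop_event = threading.Event()

    def loop():
        # Event.wait returns False on timeout and True once the event is set,
        # so the loop body runs once per interval until stop is requested.
        while not stop_event.wait(interval_seconds):
            try:
                task()
            except Exception as exc:
                # Keep the loop alive even if one run fails.
                print(f"periodic task failed: {exc}")

    thread = threading.Thread(target=loop, daemon=True)
    thread.start()
    return thread, stop_event

# Demonstration with a stand-in task and a very short interval.
calls = []
thread, stop_event = start_periodic(lambda: calls.append(1), 0.05)
time.sleep(0.3)
stop_event.set()
thread.join(timeout=1)
print(f"task ran {len(calls)} times")
```

In the application this would be started once with `start_periodic(cleanup_old_documents, 1800)`, and `stop_event.set()` would be called during shutdown.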

Best Practices

  • The function relies on three module-level globals that must be initialized before it is called: uploaded_documents (dict), document_lock (Lock), and logger (Logger)
  • The function uses a Lock to ensure thread-safe access to the shared uploaded_documents dictionary - do not remove this locking mechanism in multi-threaded environments
  • Schedule this function to run periodically (e.g., every 30-60 minutes) using threading.Timer or a task scheduler to prevent memory and disk space issues
  • Ensure the application has proper file system permissions to delete files in the directories where documents are stored
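  • The function catches file deletion errors and logs them as warnings instead of re-raising, so that one failed deletion does not stop the rest of the cleanup. The document's dictionary entry is removed even when its file could not be deleted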
  • Consider adjusting the 7200 seconds (2 hours) threshold based on your application's requirements
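  • Be aware of a bug in the source as listed: the file deletion loop reads doc_info['file_path'], but doc_info is left over from the preceding scan loop and still refers to the last document iterated for that user. Every removal therefore targets that one document's file, so the files of expired documents can be left on disk while the file of a still-valid document is deleted. Fix this by reading the path from docs[doc_id] inside the deletion loop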
  • Monitor logger output to track cleanup operations and identify any recurring file deletion errors
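Several of the points above (the configurable threshold, the stale doc_info reference, and tolerant file deletion) can be combined in one variant. The following is a sketch under stated assumptions, not the code in app.py: `cleanup_expired`, its parameters, and the returned list of IDs are hypothetical, and the store and lock are passed in rather than read from globals so the function can be exercised in isolation.

```python
import os
import tempfile
import logging
from datetime import datetime, timedelta
from threading import Lock

logger = logging.getLogger(__name__)

def cleanup_expired(documents, lock, max_age_seconds=7200, now=None):
    """Remove expired documents and their files; return the removed doc IDs."""
    removed = []
    with lock:
        now = now or datetime.now()
        # Iterate over a snapshot of the keys so users can be deleted in place.
        for user_email in list(documents):
            docs = documents[user_email]
            expired = [
                doc_id for doc_id, info in docs.items()
                if (now - info['created_at']).total_seconds() > max_age_seconds
            ]
            for doc_id in expired:
                # pop() returns this document's own record, so its path
                # cannot be confused with another document's.
                info = docs.pop(doc_id)
                try:
                    os.remove(info['file_path'])
                except FileNotFoundError:
                    pass  # file already gone; nothing to delete
                except OSError as e:
                    logger.warning(f"Error removing old file: {e}")
                removed.append(doc_id)
            if not docs:
                del documents[user_email]
    return removed

# Demonstration: one expired and one fresh document for the same user.
tmpdir = tempfile.mkdtemp()
old_path = os.path.join(tmpdir, 'old.pdf')
new_path = os.path.join(tmpdir, 'new.pdf')
for path in (old_path, new_path):
    open(path, 'w').close()

store = {
    'user@example.com': {
        'old': {'created_at': datetime.now() - timedelta(hours=3), 'file_path': old_path},
        'new': {'created_at': datetime.now(), 'file_path': new_path},
    }
}
removed = cleanup_expired(store, Lock())
print(removed, os.path.exists(old_path), os.path.exists(new_path))
```

Here only the expired document and its file are removed, and the fresh document's file stays on disk. Because the expired entry is scanned before the fresh one, this arrangement is exactly the case in which a stale doc_info reference would delete the wrong file.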

Similar Components

AI-powered semantic similarity - components with related functionality:

  • function cleanup_old_tasks 68.8% similar

    Removes tasks from the active_tasks dictionary that are older than 1 hour (3600 seconds) based on their creation timestamp.

    From: /tf/active/vicechatdev/docchat/app.py
  • function clear_user_uploaded_documents 68.4% similar

    Removes all uploaded documents associated with a specific user from the application state in a thread-safe manner.

    From: /tf/active/vicechatdev/vice_ai/complex_app.py
  • function cleanup_old_tasks_v1 66.0% similar

    Removes tasks from the active_tasks dictionary that are older than 1 hour (3600 seconds) based on their creation timestamp, using thread-safe locking.

    From: /tf/active/vicechatdev/vice_ai/app.py
  • function file_cleanup 65.6% similar

    Removes files older than 60 seconds from the './static/files/' directory.

    From: /tf/active/vicechatdev/datacapture_integrated.py
  • function remove_uploaded_document 65.5% similar

    Removes a specific uploaded document from the application state for a given user, with thread-safe locking and automatic cleanup of empty user entries.

    From: /tf/active/vicechatdev/vice_ai/complex_app.py