function cleanup_old_documents
Removes documents older than 2 hours from the uploaded_documents dictionary and deletes their associated files, freeing both disk space and memory. It is meant to be called periodically.
/tf/active/vicechatdev/vice_ai/app.py, lines 135-166
moderate
Purpose
This function is a maintenance routine that keeps stored documents from accumulating without bound by deleting every document older than the 2-hour retention period. It iterates over each user's documents, checks each document's age, deletes the files of expired documents from the filesystem, removes their entries from the uploaded_documents dictionary, and finally drops users who have no documents left. The function does not schedule itself; the caller is expected to invoke it periodically (for example, via threading.Timer or a background task) to keep memory and disk usage in check.
Source Code
def cleanup_old_documents():
    """Clean up documents older than 2 hours"""
    with document_lock:
        current_time = datetime.now()
        users_to_clean = []
        for user_email, docs in uploaded_documents.items():
            docs_to_remove = []
            for doc_id, doc_info in docs.items():
                age = current_time - doc_info['created_at']
                if age.total_seconds() > 7200:  # 2 hours
                    docs_to_remove.append(doc_id)
            # Remove old documents
            for doc_id in docs_to_remove:
                try:
                    # NOTE: doc_info still holds the last record from the loop above,
                    # not the record for this doc_id (see Best Practices)
                    if os.path.exists(doc_info['file_path']):
                        os.remove(doc_info['file_path'])
                except Exception as e:
                    logger.warning(f"Error removing old file: {e}")
                del docs[doc_id]
                logger.info(f"Cleaned up old document: {doc_id}")
            # Mark empty users for cleanup
            if not docs:
                users_to_clean.append(user_email)
        # Clean up empty user entries
        for user_email in users_to_clean:
            del uploaded_documents[user_email]
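In the listing above, the deletion loop reads doc_info['file_path'] after the age-check loop has finished, so every deletion uses the record of the last document visited rather than the document being removed. The following is a minimal corrected sketch, not the code in app.py: the name cleanup_documents, the explicit documents and lock parameters, and the max_age and now arguments are assumptions added so the logic can be tested without module globals. It also narrows the except clause to OSError, which is what os.remove raises on failure.

```python
import logging
import os
import threading
from datetime import datetime, timedelta
from typing import Any, Dict, List, Optional

logger = logging.getLogger(__name__)

# Assumed layout (mirrors the globals above): user_email -> doc_id -> record,
# where each record has at least "created_at" (datetime) and "file_path" (str).
Documents = Dict[str, Dict[str, Dict[str, Any]]]


def cleanup_documents(
    documents: Documents,
    lock: threading.Lock,
    max_age: timedelta = timedelta(hours=2),
    now: Optional[datetime] = None,
) -> int:
    """Delete records (and their files) older than max_age; return how many were removed."""
    removed = 0
    with lock:
        current_time = now if now is not None else datetime.now()
        empty_users: List[str] = []
        for user_email, docs in documents.items():
            # Collect expired IDs first so `docs` is not mutated while it is being iterated.
            expired = [
                doc_id
                for doc_id, info in docs.items()
                if current_time - info["created_at"] > max_age
            ]
            for doc_id in expired:
                # The fix: read the path from this document's own record.
                file_path = docs[doc_id]["file_path"]
                try:
                    if os.path.exists(file_path):
                        os.remove(file_path)
                except OSError as e:
                    logger.warning("Error removing old file %s: %s", file_path, e)
                # As in the original, the entry is dropped even if the file could not be deleted.
                del docs[doc_id]
                removed += 1
                logger.info("Cleaned up old document: %s", doc_id)
            if not docs:
                empty_users.append(user_email)
        # Remove users left with no documents once iteration over `documents` is done.
        for user_email in empty_users:
            del documents[user_email]
    return removed
```

Passing a fixed now makes the expiry check deterministic in tests; in the application the same function could be called as cleanup_documents(uploaded_documents, document_lock).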
Return Value
This function returns None. It performs side effects by modifying the global uploaded_documents dictionary and deleting files from the filesystem.
Dependencies
os, datetime, logging, threading
Required Imports
import os
from datetime import datetime
import logging
from threading import Lock
Usage Example
import os
import logging
from datetime import datetime
from threading import Lock, Timer

# Set up the globals the function depends on
logging.basicConfig(level=logging.INFO)
logger = logging.getLogger(__name__)
document_lock = Lock()
uploaded_documents = {
    'user@example.com': {
        'doc123': {
            'created_at': datetime.now(),
            'file_path': '/tmp/doc123.pdf'
        }
    }
}

# Define the function (looking up each document's own file path)
def cleanup_old_documents():
    with document_lock:
        current_time = datetime.now()
        users_to_clean = []
        for user_email, docs in uploaded_documents.items():
            docs_to_remove = []
            for doc_id, doc_info in docs.items():
                age = current_time - doc_info['created_at']
                if age.total_seconds() > 7200:  # 2 hours
                    docs_to_remove.append(doc_id)
            for doc_id in docs_to_remove:
                # Use this document's record, not the leftover doc_info from the loop above
                file_path = docs[doc_id]['file_path']
                try:
                    if os.path.exists(file_path):
                        os.remove(file_path)
                except Exception as e:
                    logger.warning(f"Error removing old file: {e}")
                del docs[doc_id]
                logger.info(f"Cleaned up old document: {doc_id}")
            if not docs:
                users_to_clean.append(user_email)
        for user_email in users_to_clean:
            del uploaded_documents[user_email]

# Run cleanup immediately (doc123 was just created, so nothing is removed yet)
cleanup_old_documents()

# Or schedule periodic cleanup (every 30 minutes)
def schedule_cleanup():
    cleanup_old_documents()
    timer = Timer(1800, schedule_cleanup)
    timer.daemon = True  # do not keep the process alive just for cleanup
    timer.start()

schedule_cleanup()
Best Practices
- Before calling this function, initialize the module-level globals it depends on: uploaded_documents (dict), document_lock (threading.Lock), and logger (logging.Logger)
- The function holds document_lock for the entire scan so that the shared uploaded_documents dictionary is accessed safely. Keep this lock in multi-threaded environments, and make sure every other code path that reads or modifies uploaded_documents acquires the same lock
- Schedule this function to run periodically (for example, every 30-60 minutes) with threading.Timer or a task scheduler; otherwise expired documents keep accumulating in memory and on disk
- Ensure the application has proper file system permissions to delete files in the directories where documents are stored
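- The function catches any exception raised while deleting a file and logs it as a warning, so one failed deletion does not stop the rest of the cleanup. Note that the dictionary entry is removed even when the file deletion fails, which can leave orphaned files on disk
- Adjust the 7200-second (2-hour) threshold to your application's retention requirements
- Known bug: in the source above, the file-deletion loop reads doc_info['file_path'], but doc_info still holds the record of the last document visited by the age-check loop. When a user has several documents, the function may therefore delete a non-expired document's file while leaving the expired documents' files on disk. Fix this by looking up the path per document with docs[doc_id]['file_path']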
- Monitor logger output to track cleanup operations and identify any recurring file deletion errors
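Best Practices recommends running the cleanup periodically, and the usage example does this by starting a new threading.Timer after every run. A minimal alternative, sketched here on the assumption that one long-lived background thread per process is acceptable, uses threading.Event.wait as an interruptible sleep so the schedule can be stopped cleanly at shutdown. The class PeriodicCleanup and its methods are hypothetical names, not part of app.py.

```python
import logging
import threading
from typing import Callable, Optional

logger = logging.getLogger(__name__)


class PeriodicCleanup:
    """Call `func` every `interval` seconds on a daemon thread until stop() is called."""

    def __init__(self, func: Callable[[], object], interval: float) -> None:
        self._func = func
        self._interval = interval
        self._stop_event = threading.Event()
        self._thread = threading.Thread(target=self._run, name="periodic-cleanup", daemon=True)

    def _run(self) -> None:
        # Event.wait() doubles as an interruptible sleep: it returns False after the
        # timeout, or True as soon as stop() sets the event, which ends the loop.
        while not self._stop_event.wait(self._interval):
            try:
                self._func()
            except Exception:
                # One failed run should not end the schedule; log it and keep going.
                logger.exception("Periodic cleanup failed")

    def start(self) -> None:
        self._thread.start()

    def stop(self, timeout: Optional[float] = None) -> None:
        # Signal the loop to exit, then wait for the thread to finish.
        self._stop_event.set()
        self._thread.join(timeout)

    @property
    def running(self) -> bool:
        return self._thread.is_alive()
```

In the application this might be started once with PeriodicCleanup(cleanup_old_documents, interval=1800).start(). Unlike the usage example, the first run happens only after one interval has elapsed; call cleanup_old_documents() once before start() if an immediate pass is wanted.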
Similar Components
AI-powered semantic similarity - components with related functionality:
- function cleanup_old_tasks (68.8% similar)
- function clear_user_uploaded_documents (68.4% similar)
- function cleanup_old_tasks_v1 (66.0% similar)
- function file_cleanup (65.6% similar)
- function remove_uploaded_document (65.5% similar)