function index_all_documents
Flask route handler that initiates background indexing of all documents in the system, creating a task ID for tracking progress and returning immediately while indexing continues asynchronously.
/tf/active/vicechatdev/docchat/blueprint.py
221 - 260
moderate
Purpose
This endpoint starts asynchronous document indexing for a document chat application. It generates a task ID, records the task in the module-level active_tasks dictionary with status 'processing' and the current username, and then starts a daemon thread that runs DocumentIndexer().index_documents(). The HTTP response returns the task_id immediately, so the client is not blocked while indexing runs and can poll for status later. When the thread finishes, it replaces the task entry with either the results ('completed') or the error message ('failed'). Access is restricted to authenticated users via the login_required decorator.
Source Code
def index_all_documents():
    """Start background indexing of all documents"""
    try:
        task_id = str(uuid_module.uuid4())

        # Store task info
        active_tasks[task_id] = {
            'status': 'processing',
            'progress': 'Starting indexing...',
            'user': get_current_username()
        }

        # Start indexing in background (simplified - in production use Celery)
        import threading

        def run_indexing():
            try:
                indexer = DocumentIndexer()
                results = indexer.index_documents()
                active_tasks[task_id] = {
                    'status': 'completed',
                    'progress': 'Indexing completed',
                    'results': results
                }
            except Exception as e:
                logger.error(f"Indexing error: {e}")
                active_tasks[task_id] = {
                    'status': 'failed',
                    'progress': f'Error: {str(e)}'
                }

        thread = threading.Thread(target=run_indexing, daemon=True)
        thread.start()

        return jsonify({'task_id': task_id})

    except Exception as e:
        logger.error(f"Error starting indexing: {e}")
        return jsonify({'error': str(e)}), 500
Return Value
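Returns a Flask JSON response. On success: {'task_id': '<uuid-string>'} with HTTP 200 status, returned as soon as the background thread has been started and before indexing finishes. On error while starting the task: {'error': '<error-message>'} with HTTP 500 status. Errors raised later inside the background thread are not returned here; they are written to the task entry with status 'failed'. The task_id is the key of the task's entry in active_tasks, so a client passes it to a status endpoint to read indexing progress and results.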
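The source shows the endpoint that creates a task but not the one that reports on it. As a rough sketch of how a status route might interpret an active_tasks entry, the helper below maps a lookup to a payload and an HTTP status. The name lookup_task, the 404/403 responses, and the decision to hide the 'user' field are assumptions for illustration, not part of blueprint.py.

```python
from typing import Any, Dict, Tuple


def lookup_task(tasks: Dict[str, Dict[str, Any]],
                task_id: str,
                username: str) -> Tuple[Dict[str, Any], int]:
    """Hypothetical helper: return (payload, http_status) for a task lookup."""
    task = tasks.get(task_id)
    if task is None:
        return {'error': 'Unknown task_id'}, 404

    # In the documented source, 'completed' and 'failed' entries no longer
    # carry a 'user' key, so ownership can only be checked when it is present.
    owner = task.get('user')
    if owner is not None and owner != username:
        return {'error': 'Task belongs to another user'}, 403

    # Return a copy without the owner field so it is not echoed to the client.
    payload = {key: value for key, value in task.items() if key != 'user'}
    return payload, 200
```

A Flask route could call this with active_tasks and the current username, then pass the payload to jsonify together with the status code.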
Dependencies
- flask
- flask_login
- uuid
- logging
- threading
- werkzeug
Required Imports
from flask import Blueprint, jsonify
from flask_login import login_required
import uuid as uuid_module
import threading
import logging
Conditional/Optional Imports
These imports are only needed under specific conditions:
from document_indexer import DocumentIndexer
Condition: Required custom module for document indexing functionality - must be available in the application
Required (conditional)
logger = logging.getLogger(__name__)
Condition: Logger instance must be configured in the module scope for error logging
Required (conditional)
Usage Example
# Assuming Flask app setup with authentication
# POST request to /api/index-all endpoint
import requests

# Client-side usage: a Session reuses the same credentials for every request,
# including the status poll, which also needs to be authenticated
session = requests.Session()
session.headers.update({'Authorization': 'Bearer <token>'})
session.cookies.set('session', '<session-cookie>')

response = session.post('http://localhost:5000/api/index-all')

if response.status_code == 200:
    task_id = response.json()['task_id']
    print(f"Indexing started with task_id: {task_id}")

    # Poll for status
    status_response = session.get(
        f'http://localhost:5000/api/task-status/{task_id}'
    )
    task_info = status_response.json()
    print(f"Status: {task_info['status']}")
    print(f"Progress: {task_info['progress']}")
else:
    print(f"Error: {response.json()['error']}")
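The example above checks the status once, while a real client would normally poll until the task leaves the 'processing' state. The following is a minimal sketch of such a loop, assuming a hypothetical wait_for_task helper that is handed a callable returning the decoded status JSON. The interval and timeout defaults are arbitrary choices, not values taken from the application.

```python
import time
from typing import Any, Callable, Dict


def wait_for_task(fetch_status: Callable[[], Dict[str, Any]],
                  interval: float = 2.0,
                  timeout: float = 600.0,
                  sleep: Callable[[float], None] = time.sleep) -> Dict[str, Any]:
    """Hypothetical helper: poll until the task is 'completed' or 'failed'."""
    deadline = time.monotonic() + timeout
    while True:
        info = fetch_status()
        # 'completed' and 'failed' are the two terminal states set by the source.
        if info.get('status') in ('completed', 'failed'):
            return info
        if time.monotonic() >= deadline:
            raise TimeoutError('Task did not finish before the timeout')
        sleep(interval)
```

With requests, fetch_status could be a lambda that issues the GET request for the task and returns response.json(). Passing sleep as a parameter keeps the loop easy to test without real delays.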
Best Practices
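- This implementation uses threading.Thread with daemon=True, which is suitable for development but NOT recommended for production: daemon threads are stopped abruptly when the process exits, so an indexing run in progress can be cut off without any cleanup - use Celery, RQ, or similar task queue systems instead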
- The active_tasks dictionary is stored in memory and will be lost on server restart - consider using Redis or a database for persistent task tracking
- No cleanup mechanism exists for completed tasks in active_tasks - implement a cleanup strategy to prevent memory leaks
- Thread safety is not guaranteed for active_tasks dictionary access - consider using threading.Lock or thread-safe data structures
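- Error handling catches every Exception and returns str(e) to the client, which gives little context for debugging and may expose internal details - consider catching specific exceptions, logging the full traceback (for example with logger.exception), and returning a generic error message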
- The function assumes DocumentIndexer().index_documents() is thread-safe - verify this before production use
- No timeout mechanism exists for long-running indexing tasks - consider implementing task timeouts
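- The task_id is generated with uuid4, so it is random and hard to guess, but the 'completed' and 'failed' updates replace the whole task entry and drop the 'user' key - keep the owner on the entry if a status endpoint needs to verify who may read a task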
- Consider adding rate limiting to prevent multiple simultaneous indexing operations from the same user
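Several of the points above (thread safety, cleanup of finished tasks, keeping the task owner, and limiting concurrent runs per user) can be handled by one small wrapper around the task dictionary. The sketch below is one possible shape, not the application's code: the TaskRegistry class, its method names, the 'updated_at' field, and the TTL default are all hypothetical.

```python
import threading
import time
from typing import Any, Dict, Optional


class TaskRegistry:
    """Hypothetical lock-protected replacement for a plain active_tasks dict."""

    def __init__(self, ttl_seconds: float = 3600.0) -> None:
        self._lock = threading.Lock()
        self._tasks: Dict[str, Dict[str, Any]] = {}
        self._ttl = ttl_seconds

    def create(self, task_id: str, user: str, now: Optional[float] = None) -> None:
        stamp = time.time() if now is None else now
        with self._lock:
            self._tasks[task_id] = {
                'status': 'processing',
                'progress': 'Starting indexing...',
                'user': user,
                'updated_at': stamp,
            }

    def update(self, task_id: str, now: Optional[float] = None, **fields: Any) -> bool:
        """Merge fields into the entry so 'user' survives status changes."""
        stamp = time.time() if now is None else now
        with self._lock:
            task = self._tasks.get(task_id)
            if task is None:
                return False
            task.update(fields)
            task['updated_at'] = stamp
            return True

    def get(self, task_id: str) -> Optional[Dict[str, Any]]:
        with self._lock:
            task = self._tasks.get(task_id)
            # Return a copy so callers never mutate shared state outside the lock.
            return dict(task) if task is not None else None

    def has_active(self, user: str) -> bool:
        """True if the user already has a task in the 'processing' state."""
        with self._lock:
            return any(t['user'] == user and t['status'] == 'processing'
                       for t in self._tasks.values())

    def purge_finished(self, now: Optional[float] = None) -> int:
        """Drop finished tasks older than the TTL; return how many were removed."""
        stamp = time.time() if now is None else now
        with self._lock:
            stale = [tid for tid, t in self._tasks.items()
                     if t['status'] in ('completed', 'failed')
                     and stamp - t['updated_at'] >= self._ttl]
            for tid in stale:
                del self._tasks[tid]
            return len(stale)
```

In the route, has_active could be checked before starting a new thread, and purge_finished could be called on each request or from a periodic job. This still keeps state in process memory, so it does not address the loss of tasks on restart or deployments with multiple worker processes.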
Similar Components
AI-powered semantic similarity - components with related functionality:
- function api_index_folder (86.4% similar)
- function api_index_progress (73.4% similar)
- function get_task_status (68.4% similar)
- function api_documents (67.2% similar)
- function api_task_status (67.0% similar)