
function index_all_documents

Maturity: 48

Flask route handler that initiates background indexing of all documents in the system, creating a task ID for tracking progress and returning immediately while indexing continues asynchronously.

File: /tf/active/vicechatdev/docchat/blueprint.py
Lines: 221 - 260
Complexity: moderate

Purpose

This endpoint provides an asynchronous document indexing mechanism for a document chat application. It starts a background thread that indexes all documents with DocumentIndexer, stores task metadata in the global active_tasks dictionary for progress tracking, and returns a task_id that the client can use to poll for status. The design suits web applications where indexing is time-consuming and must not block the HTTP response. Authentication is enforced by the login_required decorator (decorators fall outside the extract shown under Source Code), and each task is associated with the current user when it is created.

Source Code

def index_all_documents():
    """Start background indexing of all documents"""
    try:
        task_id = str(uuid_module.uuid4())
        
        # Store task info
        active_tasks[task_id] = {
            'status': 'processing',
            'progress': 'Starting indexing...',
            'user': get_current_username()
        }
        
        # Start indexing in background (simplified - in production use Celery)
        import threading
        
        def run_indexing():
            try:
                indexer = DocumentIndexer()
                results = indexer.index_documents()
                
                active_tasks[task_id] = {
                    'status': 'completed',
                    'progress': 'Indexing completed',
                    'results': results
                }
            except Exception as e:
                logger.error(f"Indexing error: {e}")
                active_tasks[task_id] = {
                    'status': 'failed',
                    'progress': f'Error: {str(e)}'
                }
        
        thread = threading.Thread(target=run_indexing, daemon=True)
        thread.start()
        
        return jsonify({'task_id': task_id})
        
    except Exception as e:
        logger.error(f"Error starting indexing: {e}")
        return jsonify({'error': str(e)}), 500

Return Value

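Returns a Flask JSON response. On success: {'task_id': '<uuid-string>'} with HTTP 200 status. On error: {'error': '<error-message>'} with HTTP 500 status. A 200 response means only that the background thread was started, not that indexing has finished. The task_id identifies the entry in the server-side active_tasks dictionary; clients pass it to a status endpoint (such as get_task_status in the same blueprint) to poll for indexing progress and results.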
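
To make the polling contract concrete, here is a minimal sketch of how a status lookup could interpret a task_id. It is not the get_task_status handler from blueprint.py: the lookup_task helper, the 404 and 403 responses, and the ownership check are all assumptions made for illustration. Only the entry shape ('status', 'progress', 'user', 'results') is taken from the source above.

```python
def lookup_task(active_tasks, task_id, username):
    """Return (payload, http_status) for a status poll (hypothetical helper)."""
    task = active_tasks.get(task_id)
    if task is None:
        # Unknown ID, or the server restarted and lost its in-memory tasks
        return {'error': 'Task not found'}, 404

    owner = task.get('user')
    if owner is not None and owner != username:
        # Enforce ownership only when the entry still records a user
        return {'error': 'Forbidden'}, 403

    # Expose only the fields a polling client needs
    payload = {k: task[k] for k in ('status', 'progress', 'results') if k in task}
    return payload, 200


# Example using the entry shape written by index_all_documents
tasks = {
    'abc': {
        'status': 'processing',
        'progress': 'Starting indexing...',
        'user': 'alice',
    }
}
print(lookup_task(tasks, 'abc', 'alice'))
print(lookup_task(tasks, 'abc', 'bob'))
print(lookup_task(tasks, 'missing', 'alice'))
```

In a Flask handler, the returned pair could be passed through jsonify along with the status code. The ownership check is skipped when 'user' is absent, because the completed and failed entries in the source above no longer carry that key.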

Dependencies

  • flask
  • flask_login
  • uuid
  • logging
  • threading
  • werkzeug

Required Imports

from flask import Blueprint, jsonify
from flask_login import login_required
import uuid as uuid_module
import threading
import logging

Conditional/Optional Imports

These names are not covered by the imports above but must be available in module scope. The function also reads the module-level names active_tasks (a dict) and get_current_username, which must be defined in the same scope.

from document_indexer import DocumentIndexer

Condition: Custom module that provides the indexing functionality; it must be importable by the application.

Required (conditional)

logger = logging.getLogger(__name__)

Condition: A logger instance must be defined in module scope for error logging.

Required (conditional)

Usage Example

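# Assumes a running Flask app with authentication enabled.
# The /api/index-all and /api/task-status/<task_id> paths are illustrative;
# use the routes your blueprint actually registers.

import time
import requests

BASE_URL = 'http://localhost:5000'

# Use one session so the same credentials are sent on every request
session = requests.Session()
session.headers.update({'Authorization': 'Bearer <token>'})
session.cookies.set('session', '<session-cookie>')

# Start indexing; the server responds as soon as the background thread starts
response = session.post(f'{BASE_URL}/api/index-all')

if response.status_code == 200:
    task_id = response.json()['task_id']
    print(f"Indexing started with task_id: {task_id}")

    # Poll until the task leaves the 'processing' state (give up after 30 tries)
    for _ in range(30):
        status_response = session.get(f'{BASE_URL}/api/task-status/{task_id}')
        task_info = status_response.json()
        print(f"Status: {task_info['status']}")
        print(f"Progress: {task_info['progress']}")
        if task_info['status'] != 'processing':
            break
        time.sleep(2)
else:
    print(f"Error: {response.json()['error']}")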

Best Practices

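  • This implementation uses threading.Thread with daemon=True; daemon threads are stopped abruptly when the server process exits, so an indexing run can be cut off partway through, which is tolerable in development but NOT recommended for production - use Celery, RQ, or a similar task queue instead
  • The active_tasks dictionary lives in the memory of a single process, so it is lost on server restart and is not shared between worker processes (a status poll handled by another worker will not find the task) - consider using Redis or a database for persistent, shared task tracking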
  • No cleanup mechanism exists for completed tasks in active_tasks - implement a cleanup strategy to prevent memory leaks
  • Thread safety is not guaranteed for active_tasks dictionary access - consider using threading.Lock or thread-safe data structures
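  • Both except blocks catch the broad Exception class, log only the message without a traceback, and pass str(e) through to the client or the task entry - consider logger.exception for tracebacks, more specific exception types, and a generic client-facing error message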
  • The function assumes DocumentIndexer().index_documents() is thread-safe - verify this before production use
  • No timeout mechanism exists for long-running indexing tasks - consider implementing task timeouts
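  • The task_id is generated with uuid4, so IDs are random and hard to guess, but run_indexing replaces the whole task entry on completion or failure, which drops the 'user' field set at creation - update the entry in place if the status endpoint needs to check ownership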
  • Consider adding rate limiting to prevent multiple simultaneous indexing operations from the same user
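
Several of the points above (thread safety, cleanup of finished tasks, and keeping the 'user' field when the status changes) can be illustrated with a small in-memory registry. This is a minimal sketch under stated assumptions, not the code in blueprint.py: the TaskRegistry class, its method names, the 'updated_at' field, and the one-hour retention default are all hypothetical. It uses only the standard library and does not address persistence across restarts or sharing between worker processes.

```python
import threading
import time
import uuid


class TaskRegistry:
    """Hypothetical thread-safe stand-in for a bare active_tasks dict."""

    def __init__(self, max_age_seconds=3600):
        self._lock = threading.Lock()
        self._tasks = {}
        self._max_age = max_age_seconds

    def create(self, user):
        """Register a new task in the 'processing' state and return its ID."""
        task_id = str(uuid.uuid4())
        with self._lock:
            self._tasks[task_id] = {
                'status': 'processing',
                'progress': 'Starting indexing...',
                'user': user,
                'updated_at': time.monotonic(),
            }
        return task_id

    def update(self, task_id, **fields):
        """Merge fields into an existing task so keys like 'user' survive."""
        with self._lock:
            task = self._tasks.get(task_id)
            if task is None:
                return False
            task.update(fields)
            task['updated_at'] = time.monotonic()
            return True

    def get(self, task_id):
        """Return a shallow copy so callers cannot mutate the stored entry."""
        with self._lock:
            task = self._tasks.get(task_id)
            return dict(task) if task is not None else None

    def purge_finished(self, now=None):
        """Remove finished tasks older than max_age; return how many were removed."""
        now = time.monotonic() if now is None else now
        with self._lock:
            stale = [
                tid for tid, task in self._tasks.items()
                if task['status'] != 'processing'
                and now - task['updated_at'] > self._max_age
            ]
            for tid in stale:
                del self._tasks[tid]
        return len(stale)
```

With a registry like this, the background thread would call update(task_id, status='completed', progress='Indexing completed', results=results) instead of assigning a new dict, and purge_finished() could run periodically or at the start of each request.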

Similar Components

AI-powered semantic similarity - components with related functionality:

  • function api_index_folder 86.4% similar

    Flask API endpoint that initiates a background task to index documents in a specified folder, tracking progress and returning a task ID for status monitoring.

    From: /tf/active/vicechatdev/docchat/app.py
  • function api_index_progress 73.4% similar

    Flask API endpoint that retrieves the current progress status of an asynchronous indexing task by its task ID.

    From: /tf/active/vicechatdev/docchat/app.py
  • function get_task_status 68.4% similar

    Flask API endpoint that retrieves the current status of a background task by its task ID from an in-memory active_tasks dictionary.

    From: /tf/active/vicechatdev/docchat/blueprint.py
  • function api_documents 67.2% similar

    Flask API endpoint that retrieves statistics and metadata about indexed documents from a document indexer service.

    From: /tf/active/vicechatdev/docchat/app.py
  • function api_task_status 67.0% similar

    Flask API endpoint that retrieves and returns the status of asynchronous tasks (chat or indexing operations) by task ID.

    From: /tf/active/vicechatdev/docchat/app.py