
function init_engines

Maturity: 48

Initializes the RAG (Retrieval-Augmented Generation) engine and document indexer components, loads persisted sessions, and optionally starts background auto-indexing of documents.

File: /tf/active/vicechatdev/docchat/app.py
Lines: 636 - 701
Complexity: moderate

Purpose

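This initialization function sets up the core components of a document chat system. It first loads any previously saved user sessions, then creates global instances of DocChatRAG and DocumentIndexer. If config.AUTO_INDEX_ON_STARTUP is set and the indexer was created, it starts document indexing in a background daemon thread. Failures while constructing either component, or during indexing, are logged with a traceback rather than raised, so the application can start with partial functionality. The call to load_all_sessions() is not wrapped in a try block, so an exception raised there propagates to the caller.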

Source Code

def init_engines():
    """Initialize RAG engine and document indexer"""
    global rag_engine, document_indexer
    
    # Load persisted sessions
    load_all_sessions()
    
    try:
        rag_engine = DocChatRAG()
        logger.info("✓ RAG engine initialized")
    except Exception as e:
        logger.error(f"Failed to initialize RAG engine: {e}")
        try:
            import traceback
            logger.error(f"Traceback: {traceback.format_exc()}")
        except:
            pass
    
    try:
        document_indexer = DocumentIndexer()
        logger.info("✓ Document indexer initialized")
    except Exception as e:
        logger.error(f"Failed to initialize document indexer: {e}")
        try:
            import traceback
            logger.error(f"Traceback: {traceback.format_exc()}")
        except:
            pass
    
    # Auto-index documents on startup if enabled
    if config.AUTO_INDEX_ON_STARTUP and document_indexer:
        logger.info(f"Auto-indexing enabled. Scanning folder: {config.DOCUMENT_FOLDER}")
        
        def auto_index():
            """Background task to auto-index documents on startup"""
            try:
                if config.DOCUMENT_FOLDER.exists():
                    logger.info("Starting automatic document indexing...")
                    results = document_indexer.index_folder(
                        config.DOCUMENT_FOLDER, 
                        recursive=True, 
                        force_reindex=False
                    )
                    
                    new_docs = results['success'] - results['reindexed']
                    logger.info(
                        f"Auto-indexing complete: "
                        f"{new_docs} new, "
                        f"{results['reindexed']} re-indexed, "
                        f"{results['skipped']} skipped, "
                        f"{results['failed']} failed"
                    )
                else:
                    logger.info(f"Document folder does not exist yet: {config.DOCUMENT_FOLDER}")
            except Exception as e:
                logger.error(f"CRITICAL: Error during auto-indexing: {e}")
                try:
                    import traceback
                    logger.error(f"Traceback: {traceback.format_exc()}")
                except:
                    pass
        
        # Start auto-indexing in background thread
        index_thread = threading.Thread(target=auto_index, daemon=True)
        index_thread.start()
        logger.info("✓ Auto-indexing started in background")
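
The summary arithmetic and the background hand-off in the listing above can be exercised in isolation. The following is a minimal sketch, not the application's code: FakeIndexer is a hypothetical stand-in for DocumentIndexer, the helper names summarize and auto_index are illustrative, and the result keys (success, reindexed, skipped, failed) are taken from the listing under the assumption that success counts both new and re-indexed documents.

```python
import threading

class FakeIndexer:
    """Hypothetical stand-in for DocumentIndexer; returns a canned result."""
    def index_folder(self, folder, recursive=True, force_reindex=False):
        # Assumed shape: 'success' includes re-indexed documents.
        return {"success": 5, "reindexed": 2, "skipped": 7, "failed": 1}

def summarize(results):
    """Build the same summary text the listing logs after indexing."""
    new_docs = results["success"] - results["reindexed"]
    return (
        f"{new_docs} new, {results['reindexed']} re-indexed, "
        f"{results['skipped']} skipped, {results['failed']} failed"
    )

summaries = []

def auto_index(indexer, folder):
    """Background task: index the folder and record the summary."""
    summaries.append(summarize(indexer.index_folder(folder)))

# daemon=True means the thread will not keep the process alive at exit.
thread = threading.Thread(
    target=auto_index, args=(FakeIndexer(), "./documents"), daemon=True
)
thread.start()
thread.join(timeout=5)  # joined here only so the sketch can print the result
print(summaries[0])
```

The real function does not join the thread; it returns immediately and lets indexing continue in the background.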

Return Value

This function does not return any value (implicitly returns None). It modifies global variables 'rag_engine' and 'document_indexer' as side effects.
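
Because nothing is returned and failures only appear in the log, calling code has to inspect the globals itself. A minimal sketch of such a check follows; the helper name require is hypothetical and not part of app.py, and the two module-level values are placeholders for what init_engines() would leave behind after a partial initialization.

```python
rag_engine = None            # would be set by init_engines() on success
document_indexer = object()  # placeholder for a successfully created indexer

def require(component, name):
    """Return the component, or raise if initialization left it unset."""
    if component is None:
        raise RuntimeError(f"{name} is not initialized; check the startup log")
    return component

indexer = require(document_indexer, "document_indexer")

try:
    require(rag_engine, "rag_engine")
except RuntimeError as exc:
    error_message = str(exc)
    print(error_message)
```

Failing at the point of use keeps the startup behavior unchanged while turning a later AttributeError on None into a clearer message.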

Dependencies

  • flask
  • logging
  • datetime
  • uuid
  • pathlib
  • threading
  • werkzeug
  • functools
  • traceback
  • json
  • os
  • time
  • python-docx
  • reportlab

Required Imports

import logging
import threading
import traceback
import config
from rag_engine import DocChatRAG
from document_indexer import DocumentIndexer

Conditional/Optional Imports

These imports are only needed under specific conditions:

import traceback

Condition: imported inside exception handlers for detailed error logging (optional)
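
As an alternative to importing traceback inside each handler, the standard library's Logger.exception records the message at ERROR level and appends the active traceback. The sketch below illustrates that option and is not what app.py currently does; build_engine is a hypothetical failing constructor standing in for DocChatRAG(), and the logger name and in-memory buffer exist only so the output can be inspected.

```python
import io
import logging

log_buffer = io.StringIO()
logger = logging.getLogger("init_sketch")
logger.setLevel(logging.INFO)
logger.propagate = False  # keep the sketch's output out of the root logger
logger.addHandler(logging.StreamHandler(log_buffer))

def build_engine():
    """Hypothetical constructor that fails, standing in for DocChatRAG()."""
    raise ValueError("missing API key")

try:
    engine = build_engine()
except Exception:
    engine = None
    # logger.exception logs at ERROR level and appends the active traceback.
    logger.exception("Failed to initialize RAG engine")

print(log_buffer.getvalue())
```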

Usage Example

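# This setup is assumed to live in the module that defines init_engines()
# (app.py), because the function reads and writes module-level globals.
import logging
import threading
from pathlib import Path

import config
from rag_engine import DocChatRAG
from document_indexer import DocumentIndexer

# Module-level globals assigned by init_engines(); start them at None so
# they exist even if a constructor fails
rag_engine = None
document_indexer = None

# Configure logger
logger = logging.getLogger(__name__)
logger.setLevel(logging.INFO)

# Define load_all_sessions function
def load_all_sessions():
    """Placeholder: load persisted session data."""
    pass

# Configure the settings read by init_engines()
config.AUTO_INDEX_ON_STARTUP = True
config.DOCUMENT_FOLDER = Path('./documents')  # a Path, since .exists() is called on it

# ... init_engines() is defined here, as shown under Source Code ...

# Initialize engines (indexing, if enabled, continues in the background)
init_engines()

# rag_engine is still None if DocChatRAG() raised during initialization
if rag_engine:
    response = rag_engine.query('What is in the documents?')
    print(response)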

Best Practices

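  • Initialize the module-level variables 'rag_engine' and 'document_indexer' to None before calling this function. If a constructor fails, the variable is never assigned, and the later 'document_indexer' check raises NameError unless the name already exists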
  • Configure all required settings in the config module before initialization
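  • Background indexing runs in a daemon thread, which is stopped abruptly when the main program exits, so an indexing run still in progress at shutdown may not complete
  • Constructor and indexing failures are logged rather than raised, which allows partial initialization; check that 'rag_engine' and 'document_indexer' are not None before using them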
  • Auto-indexing runs in background to avoid blocking application startup
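  • The function modifies global state and may start a thread, so call it only once during application startup
  • A missing DOCUMENT_FOLDER is handled by logging a message and skipping indexing; create the folder beforehand if documents should be indexed at startup. config.DOCUMENT_FOLDER should be a pathlib.Path, because the function calls .exists() on it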
  • Consider the performance impact of auto-indexing large document collections on startup
  • The function uses force_reindex=False to avoid re-indexing unchanged documents
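
The call-once recommendation above can be enforced rather than left to convention. The following is a minimal sketch under stated assumptions: init_engines_once is a hypothetical wrapper that does not exist in app.py, and the init_engines defined here is a stand-in that only counts invocations.

```python
import threading

_init_lock = threading.Lock()
_initialized = False
call_count = 0

def init_engines():
    """Stand-in for the real function; only counts invocations."""
    global call_count
    call_count += 1

def init_engines_once():
    """Run init_engines() at most once, even if called from several threads."""
    global _initialized
    with _init_lock:
        if _initialized:
            return False
        init_engines()
        _initialized = True
        return True

first = init_engines_once()
second = init_engines_once()
print(first, second, call_count)
```

The lock matters only if startup code can reach the wrapper from more than one thread; in a single-threaded startup path the boolean flag alone would be enough.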

Similar Components

AI-powered semantic similarity - components with related functionality:

  • function init_chat_engine_v1 76.0% similar

    Initializes a global chat engine instance using OneCo_hybrid_RAG and validates its configuration by checking for required attributes like available_collections and data_handles.

    From: /tf/active/vicechatdev/vice_ai/app.py
  • function init_chat_engine 71.3% similar

    Initializes a global chat engine instance using the OneCo_hybrid_RAG class and logs the initialization status.

    From: /tf/active/vicechatdev/vice_ai/complex_app.py
  • function basic_rag_example 69.8% similar

    Demonstrates a basic RAG (Retrieval-Augmented Generation) workflow by initializing a DocChatRAG engine, executing a sample query about document topics, and displaying the response with metadata.

    From: /tf/active/vicechatdev/docchat/example_usage.py
  • function process_chat_background 68.2% similar

    Processes chat requests asynchronously in a background thread, managing RAG engine interactions, progress updates, and session state for various query modes including basic, extensive, full_reading, and deep_reflection.

    From: /tf/active/vicechatdev/docchat/app.py
  • class DocChatRAG 67.1% similar

    Main RAG engine with three operating modes: (1) Basic RAG (similarity search), (2) Extensive (full document retrieval with preprocessing), and (3) Full Reading (process all documents).

    From: /tf/active/vicechatdev/docchat/rag_engine.py