function index_documents_example

Maturity: 44

A demonstration function that indexes documents from a configured folder using a DocumentIndexer and prints the indexing results and collection statistics. If the folder does not exist, the function creates it and returns without indexing anything.

File: /tf/active/vicechatdev/docchat/example_usage.py
Lines: 11 - 43
Complexity: simple

Purpose

This function is a tutorial-style example of how to use the DocumentIndexer class to index documents from a folder. It walks through the full workflow: initializing the indexer, checking for the documents directory (creating it and exiting early if it is missing), indexing all files recursively, and printing a summary of the indexing operation and the resulting collection. It is meant to be run as-is to see the indexing process end to end.
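To make "indexing all files recursively" concrete, the sketch below contrasts a top-level listing with a recursive one using only pathlib. It illustrates the general meaning of recursive=True and is not DocumentIndexer's actual implementation; list_files and demo_recursive are hypothetical helpers introduced for this illustration.

```python
from __future__ import annotations

import tempfile
from pathlib import Path


def list_files(folder: Path, recursive: bool = True) -> list[Path]:
    """Hypothetical helper: return the regular files under `folder`, sorted."""
    # "**/*" descends into every subdirectory; "*" stays at the top level.
    pattern = "**/*" if recursive else "*"
    return sorted(p for p in folder.glob(pattern) if p.is_file())


def demo_recursive() -> tuple[int, int]:
    """Build a throwaway tree and count files without and with recursion."""
    with tempfile.TemporaryDirectory() as tmp:
        root = Path(tmp)
        (root / "a.txt").write_text("top level")
        (root / "sub").mkdir()
        (root / "sub" / "b.txt").write_text("nested")
        flat = len(list_files(root, recursive=False))
        deep = len(list_files(root, recursive=True))
        return flat, deep


if __name__ == "__main__":
    print(demo_recursive())
```

With one file at the top level and one in a subfolder, the non-recursive listing sees only the first, while the recursive listing sees both.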

Source Code

def index_documents_example():
    """Example: Index documents from a folder"""
    print("=== Indexing Documents ===\n")
    
    # Initialize indexer
    indexer = DocumentIndexer()
    
    # Index a folder (change this path to your documents)
    document_folder = Path(config.DOCUMENTS_DIR)
    
    if not document_folder.exists():
        print(f"Creating example folder: {document_folder}")
        document_folder.mkdir(parents=True, exist_ok=True)
        print("Add some documents to this folder and run again!")
        return
    
    # Index all documents
    print(f"Indexing documents from: {document_folder}")
    results = indexer.index_folder(document_folder, recursive=True)
    
    print(f"\nResults:")
    print(f"  Total files: {results['total']}")
    print(f"  Successfully indexed: {results['success']}")
    print(f"  Failed: {results['failed']}")
    
    # Show collection stats
    stats = indexer.get_collection_stats()
    print(f"\nCollection Statistics:")
    print(f"  Total documents: {stats['total_documents']}")
    print(f"  Total chunks: {stats['total_chunks']}")
    print(f"  Files: {', '.join(stats['file_names'][:5])}")
    if len(stats['file_names']) > 5:
        print(f"  ... and {len(stats['file_names']) - 5} more")
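The last three lines of the function cap the displayed file list at five names. The same logic can be pulled into a small pure function, which makes it easy to test. In this sketch, preview_names and its limit parameter are hypothetical names that are not part of the original module, and the result is a single string rather than the two separate print calls used above.

```python
from __future__ import annotations


def preview_names(names: list[str], limit: int = 5) -> str:
    """Join the first `limit` names and note how many were left out."""
    shown = ", ".join(names[:limit])
    remaining = len(names) - limit
    if remaining > 0:
        return f"{shown} ... and {remaining} more"
    return shown


if __name__ == "__main__":
    print(preview_names([f"doc{i}.pdf" for i in range(7)]))
```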

Return Value

This function returns None on every path. Everything it does is a side effect: it prints status messages to stdout, and it either creates the documents folder (when the folder is missing, after which it returns early without indexing) or indexes the folder and prints the results and collection statistics.
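The directory-creation side effect is easier to reason about in isolation. In this minimal sketch, ensure_folder is a hypothetical helper that mirrors the function's exists() check and mkdir call and reports whether the folder was already there, so a caller can decide whether indexing makes sense yet.

```python
import tempfile
from pathlib import Path


def ensure_folder(folder: Path) -> bool:
    """Return True if `folder` already existed; otherwise create it and return False."""
    if folder.exists():
        return True
    # parents=True creates missing ancestors; exist_ok=True avoids an error if
    # something else creates the folder between the check and the mkdir call.
    folder.mkdir(parents=True, exist_ok=True)
    return False


def demo_ensure_folder() -> tuple[bool, bool]:
    """Call ensure_folder twice on a path inside a temporary directory."""
    with tempfile.TemporaryDirectory() as tmp:
        target = Path(tmp) / "nested" / "documents"
        first = ensure_folder(target)   # folder is missing, so it gets created
        second = ensure_folder(target)  # folder now exists
        return first, second


if __name__ == "__main__":
    print(demo_ensure_folder())
```

The first call corresponds to the early-return path of index_documents_example; the second corresponds to the path that goes on to index.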

Dependencies

  • pathlib
  • document_indexer
  • rag_engine (listed for the module; not used by this function)
  • config

Required Imports

from pathlib import Path
from document_indexer import DocumentIndexer
import config

Usage Example

# Ensure config.py has DOCUMENTS_DIR defined
# config.py:
# DOCUMENTS_DIR = './documents'

from pathlib import Path
from document_indexer import DocumentIndexer
import config

def index_documents_example():
    """Example: Index documents from a folder"""
    print("=== Indexing Documents ===")
    indexer = DocumentIndexer()
    document_folder = Path(config.DOCUMENTS_DIR)
    if not document_folder.exists():
        print(f"Creating example folder: {document_folder}")
        document_folder.mkdir(parents=True, exist_ok=True)
        print("Add some documents to this folder and run again!")
        return
    print(f"Indexing documents from: {document_folder}")
    results = indexer.index_folder(document_folder, recursive=True)
    print("\nResults:")
    print(f"  Total files: {results['total']}")
    print(f"  Successfully indexed: {results['success']}")
    print(f"  Failed: {results['failed']}")
    stats = indexer.get_collection_stats()
    print("\nCollection Statistics:")
    print(f"  Total documents: {stats['total_documents']}")
    print(f"  Total chunks: {stats['total_chunks']}")
    print(f"  Files: {', '.join(stats['file_names'][:5])}")
    if len(stats['file_names']) > 5:
        print(f"  ... and {len(stats['file_names']) - 5} more")

# Run the example only when this file is executed as a script
if __name__ == "__main__":
    index_documents_example()

Best Practices

  • Ensure the config.DOCUMENTS_DIR path is properly configured before running this function
  • Place actual documents in the specified folder before running to see meaningful results
  • The function creates directories with parents=True and exist_ok=True, which is safe for repeated execution
  • The recursive=True parameter means all subdirectories will be scanned for documents
  • This is an example function meant for demonstration; in production, consider adding error handling and logging
  • The function displays only the first 5 file names to avoid cluttering output with large collections
  • DocumentIndexer() is constructed with no arguments here, so any credentials or connection settings it needs (API keys, database connections, etc.) must already be in place before the function runs
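
The suggestion to add error handling and logging could look like the following sketch. It assumes only what the source code shows: an indexer object with index_folder(folder, recursive=True) that returns a dict with 'total', 'success', and 'failed' keys. The names safe_index and FolderIndexer and the logger name are hypothetical. Which exceptions DocumentIndexer raises is not documented here, so the sketch catches Exception broadly.

```python
from __future__ import annotations

import logging
from pathlib import Path
from typing import Optional, Protocol

logger = logging.getLogger("docchat.example")  # hypothetical logger name


class FolderIndexer(Protocol):
    """Assumed shape of the indexer, inferred from the calls in the example."""

    def index_folder(self, folder: Path, recursive: bool = True) -> dict: ...


def safe_index(indexer: FolderIndexer, folder: Path) -> Optional[dict]:
    """Index `folder`, log the outcome, and return None instead of raising."""
    if not folder.is_dir():
        logger.warning("Folder %s does not exist; nothing indexed", folder)
        return None
    try:
        results = indexer.index_folder(folder, recursive=True)
    except Exception:
        # Broad catch: the exceptions the real indexer raises are not known here.
        logger.exception("Indexing %s failed", folder)
        return None
    failed = results.get("failed", 0)
    if failed:
        logger.warning("%d of %d files failed", failed, results.get("total", 0))
    else:
        logger.info("Indexed %d files", results.get("success", 0))
    return results
```

Returning the results dict, rather than only printing it, lets a caller act on failures, for example by retrying or exiting with a non-zero status.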

Similar Components

AI-powered semantic similarity - components with related functionality:

  • function test_incremental_indexing 65.9% similar

    Comprehensive test function that validates incremental indexing functionality of a document indexing system, including initial indexing, change detection, re-indexing, and force re-indexing scenarios.

    From: /tf/active/vicechatdev/docchat/test_incremental_indexing.py
  • class DocumentIndexer 61.8% similar

    A class for indexing documents into ChromaDB with support for multiple file formats (PDF, Word, PowerPoint, Excel, text files), smart incremental indexing, and document chunk management.

    From: /tf/active/vicechatdev/docchat/document_indexer.py
  • function get_document_info 59.3% similar

    Retrieves indexing status and metadata for a document, including whether it's indexed, its document ID, chunk count, and reindexing status.

    From: /tf/active/vicechatdev/docchat/app.py
  • function main_v45 58.8% similar

    Orchestrates and executes a series of example demonstrations for the DocChat system, including document indexing, RAG queries, and conversation modes.

    From: /tf/active/vicechatdev/docchat/example_usage.py
  • function init_engines 56.0% similar

    Initializes the RAG (Retrieval-Augmented Generation) engine and document indexer components, loads persisted sessions, and optionally starts background auto-indexing of documents.

    From: /tf/active/vicechatdev/docchat/app.py