index_documents_example - Code Extractor

function index_documents_example

Maturity: 44

A demonstration function that indexes documents from a specified folder using a DocumentIndexer, creating the folder if it doesn't exist, and displays indexing results and collection statistics.

File:
/tf/active/vicechatdev/docchat/example_usage.py

Lines:
11 - 43

Complexity:
simple

Purpose

This function is a tutorial example of how to use the DocumentIndexer class to index documents from a folder. It demonstrates the complete workflow: initializing the indexer, checking for the documents directory (and creating it if it is missing), indexing all files recursively, and printing statistics about the indexing operation and the resulting collection. It is designed as a standalone example that can be run to understand the document indexing process.

Source Code

def index_documents_example():
    """Example: Index documents from a folder"""
    print("=== Indexing Documents ===\n")
    
    # Initialize indexer
    indexer = DocumentIndexer()
    
    # Index a folder (change this path to your documents)
    document_folder = Path(config.DOCUMENTS_DIR)
    
    if not document_folder.exists():
        print(f"Creating example folder: {document_folder}")
        document_folder.mkdir(parents=True, exist_ok=True)
        print("Add some documents to this folder and run again!")
        return
    
    # Index all documents
    print(f"Indexing documents from: {document_folder}")
    results = indexer.index_folder(document_folder, recursive=True)
    
    print(f"\nResults:")
    print(f"  Total files: {results['total']}")
    print(f"  Successfully indexed: {results['success']}")
    print(f"  Failed: {results['failed']}")
    
    # Show collection stats
    stats = indexer.get_collection_stats()
    print(f"\nCollection Statistics:")
    print(f"  Total documents: {stats['total_documents']}")
    print(f"  Total chunks: {stats['total_chunks']}")
    print(f"  Files: {', '.join(stats['file_names'][:5])}")
    if len(stats['file_names']) > 5:
        print(f"  ... and {len(stats['file_names']) - 5} more")
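
The file-name truncation at the end of the function can be isolated into a small helper. The following is a minimal sketch; the name summarize_names and its limit parameter are hypothetical and are not part of example_usage.py or DocumentIndexer.

```python
def summarize_names(names, limit=5):
    """Return the first `limit` names joined for display, plus the count left out.

    Hypothetical helper: it mirrors the slicing logic used in
    index_documents_example but is not part of the original module.
    """
    shown = ", ".join(names[:limit])
    hidden = max(len(names) - limit, 0)
    return shown, hidden


# Example with made-up file names
shown, hidden = summarize_names([f"doc{i}.pdf" for i in range(8)])
print(f"  Files: {shown}")
if hidden:
    print(f"  ... and {hidden} more")
```

Returning the hidden count separately keeps the "... and N more" line optional, as in the original function.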

Return Value

This function returns None. It works entirely through side effects: it prints status messages to stdout, creates the documents folder if it is missing (in which case it returns early without indexing anything), indexes the documents, and prints the results and collection statistics.
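
The folder check that precedes indexing can be tried in isolation. This is a sketch under assumptions: a temporary directory stands in for config.DOCUMENTS_DIR, and ensure_document_folder is a hypothetical helper that returns a boolean so the two outcomes are visible (the original function returns None in both cases).

```python
import tempfile
from pathlib import Path


def ensure_document_folder(folder: Path) -> bool:
    """Create the folder if it is missing; return True only if it already existed.

    Hypothetical helper mirroring the exists()/mkdir() branch of
    index_documents_example.
    """
    if not folder.exists():
        # parents=True creates intermediate directories;
        # exist_ok=True makes the call safe to repeat.
        folder.mkdir(parents=True, exist_ok=True)
        return False
    return True


with tempfile.TemporaryDirectory() as tmp:
    folder = Path(tmp) / "documents" / "inbox"
    first_run = ensure_document_folder(folder)   # folder is created here
    second_run = ensure_document_folder(folder)  # folder already exists
    folder_created = folder.is_dir()
```

On a first run against a missing folder, the original function takes the equivalent of the first branch and stops before indexing; a second run proceeds to index_folder.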

Dependencies

pathlib
document_indexer
rag_engine
config

Required Imports

from pathlib import Path
from document_indexer import DocumentIndexer
import config

Usage Example

# Ensure config.py has DOCUMENTS_DIR defined
# config.py:
# DOCUMENTS_DIR = './documents'

from pathlib import Path
from document_indexer import DocumentIndexer
import config

def index_documents_example():
    """Example: Index documents from a folder"""
    print("=== Indexing Documents ===")
    indexer = DocumentIndexer()
    document_folder = Path(config.DOCUMENTS_DIR)
    if not document_folder.exists():
        print(f"Creating example folder: {document_folder}")
        document_folder.mkdir(parents=True, exist_ok=True)
        print("Add some documents to this folder and run again!")
        return
    print(f"Indexing documents from: {document_folder}")
    results = indexer.index_folder(document_folder, recursive=True)
    print("\nResults:")
    print(f"  Total files: {results['total']}")
    print(f"  Successfully indexed: {results['success']}")
    print(f"  Failed: {results['failed']}")
    stats = indexer.get_collection_stats()
    print("\nCollection Statistics:")
    print(f"  Total documents: {stats['total_documents']}")
    print(f"  Total chunks: {stats['total_chunks']}")
    print(f"  Files: {', '.join(stats['file_names'][:5])}")
    if len(stats['file_names']) > 5:
        print(f"  ... and {len(stats['file_names']) - 5} more")

# Run the example
index_documents_example()

Best Practices

Ensure the config.DOCUMENTS_DIR path is properly configured before running this function
Place actual documents in the specified folder before running to see meaningful results
The function creates directories with parents=True and exist_ok=True, which is safe for repeated execution
The recursive=True parameter means all subdirectories will be scanned for documents
This is an example function meant for demonstration; in production, consider adding error handling and logging
The function displays only the first 5 file names to avoid cluttering output with large collections
Ensure DocumentIndexer is properly initialized with required credentials (API keys, database connections, etc.) before use
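
The suggestion to add error handling and logging could look roughly like the sketch below. StubIndexer is a hypothetical stand-in so the example runs without the real document_indexer module; it assumes only that index_folder returns a dict with 'total', 'success', and 'failed' keys, as the source code above uses. The wrapper index_folder_safely is likewise an illustrative name.

```python
import logging
import tempfile
from pathlib import Path

logger = logging.getLogger("index_example")


class StubIndexer:
    """Hypothetical stand-in for DocumentIndexer; it only counts files."""

    def index_folder(self, folder, recursive=True):
        pattern = "**/*" if recursive else "*"
        files = [p for p in Path(folder).glob(pattern) if p.is_file()]
        return {"total": len(files), "success": len(files), "failed": 0}


def index_folder_safely(indexer, folder):
    """Index a folder and log the outcome; return the results dict, or None on error."""
    folder = Path(folder)
    if not folder.is_dir():
        logger.warning("Document folder does not exist: %s", folder)
        return None
    try:
        results = indexer.index_folder(folder, recursive=True)
    except Exception:
        # logger.exception records the traceback along with the message
        logger.exception("Indexing failed for %s", folder)
        return None
    logger.info(
        "Indexed %d of %d files (%d failed)",
        results["success"], results["total"], results["failed"],
    )
    return results


# Usage with a throwaway folder containing two files, one in a subdirectory
with tempfile.TemporaryDirectory() as tmp:
    root = Path(tmp)
    (root / "sub").mkdir()
    (root / "a.txt").write_text("alpha")
    (root / "sub" / "b.txt").write_text("beta")
    ok_results = index_folder_safely(StubIndexer(), root)

# The temporary folder is gone now, so this call takes the warning path
missing_results = index_folder_safely(StubIndexer(), root / "gone")
```

Returning None on failure keeps the caller's control flow simple; a production version might instead re-raise or return a richer status object.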

Similar Components

AI-powered semantic similarity - components with related functionality:

function test_incremental_indexing 65.9% similar

Comprehensive test function that validates incremental indexing functionality of a document indexing system, including initial indexing, change detection, re-indexing, and force re-indexing scenarios.
From: /tf/active/vicechatdev/docchat/test_incremental_indexing.py

class DocumentIndexer 61.8% similar

A class for indexing documents into ChromaDB with support for multiple file formats (PDF, Word, PowerPoint, Excel, text files), smart incremental indexing, and document chunk management.
From: /tf/active/vicechatdev/docchat/document_indexer.py

function get_document_info 59.3% similar

Retrieves indexing status and metadata for a document, including whether it's indexed, its document ID, chunk count, and reindexing status.
From: /tf/active/vicechatdev/docchat/app.py

function main_v45 58.8% similar

Orchestrates and executes a series of example demonstrations for the DocChat system, including document indexing, RAG queries, and conversation modes.
From: /tf/active/vicechatdev/docchat/example_usage.py

function init_engines 56.0% similar

Initializes the RAG (Retrieval-Augmented Generation) engine and document indexer components, loads persisted sessions, and optionally starts background auto-indexing of documents.
From: /tf/active/vicechatdev/docchat/app.py
