function index_documents_example
A demonstration function that indexes documents from a configured folder using a DocumentIndexer, then displays the indexing results and collection statistics. If the folder does not exist, the function creates it and returns without indexing.
/tf/active/vicechatdev/docchat/example_usage.py
11 - 43
simple
Purpose
This function serves as an example/tutorial for how to use the DocumentIndexer class to index documents from a folder. It demonstrates the complete workflow: initializing the indexer, checking/creating the documents directory, indexing all files recursively, and displaying comprehensive statistics about the indexing operation and resulting collection. It's designed as a standalone example that can be run to understand the document indexing process.
Source Code
def index_documents_example():
    """Example: Index documents from a folder"""
    print("=== Indexing Documents ===\n")

    # Initialize indexer
    indexer = DocumentIndexer()

    # Index a folder (change this path to your documents)
    document_folder = Path(config.DOCUMENTS_DIR)

    if not document_folder.exists():
        print(f"Creating example folder: {document_folder}")
        document_folder.mkdir(parents=True, exist_ok=True)
        print("Add some documents to this folder and run again!")
        return

    # Index all documents
    print(f"Indexing documents from: {document_folder}")
    results = indexer.index_folder(document_folder, recursive=True)

    print(f"\nResults:")
    print(f" Total files: {results['total']}")
    print(f" Successfully indexed: {results['success']}")
    print(f" Failed: {results['failed']}")

    # Show collection stats
    stats = indexer.get_collection_stats()
    print(f"\nCollection Statistics:")
    print(f" Total documents: {stats['total_documents']}")
    print(f" Total chunks: {stats['total_chunks']}")
    print(f" Files: {', '.join(stats['file_names'][:5])}")
    if len(stats['file_names']) > 5:
        print(f" ... and {len(stats['file_names']) - 5} more")
Return Value
This function returns None on both paths: early, right after creating a missing documents folder, and normally, after indexing completes. Everything it does is a side effect: it creates the directory if needed, indexes the documents, and prints status messages, results, and collection statistics to stdout.
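Because the function returns None and writes straight to stdout, its report is hard to test or reuse. One possible refactoring, sketched here and not taken from the source file, builds the report lines in a pure function and leaves printing to the caller. The names `format_summary` and `MAX_FILES_SHOWN` and the sample data are hypothetical; the dictionary keys are the ones `index_documents_example` reads.

```python
from typing import Any, Dict, List

MAX_FILES_SHOWN = 5  # mirrors the [:5] slice used in the example function


def format_summary(results: Dict[str, Any], stats: Dict[str, Any]) -> List[str]:
    """Build the report lines instead of printing them directly.

    Assumes the dictionary keys read by index_documents_example().
    """
    names = stats["file_names"]
    lines = [
        "Results:",
        f"  Total files: {results['total']}",
        f"  Successfully indexed: {results['success']}",
        f"  Failed: {results['failed']}",
        "Collection Statistics:",
        f"  Total documents: {stats['total_documents']}",
        f"  Total chunks: {stats['total_chunks']}",
        f"  Files: {', '.join(names[:MAX_FILES_SHOWN])}",
    ]
    # Summarize the remainder so large collections do not flood the output.
    if len(names) > MAX_FILES_SHOWN:
        lines.append(f"  ... and {len(names) - MAX_FILES_SHOWN} more")
    return lines


# Hypothetical sample data, shaped like the dictionaries used in the example.
sample_results = {"total": 7, "success": 6, "failed": 1}
sample_stats = {
    "total_documents": 6,
    "total_chunks": 42,
    "file_names": [f"doc{i}.pdf" for i in range(1, 7)],
}
summary = format_summary(sample_results, sample_stats)
print("\n".join(summary))
```

With the report separated from the printing, a test can assert on the returned lines without capturing stdout.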
Dependencies
- pathlib
- document_indexer
- rag_engine
- config
Required Imports
from pathlib import Path
from document_indexer import DocumentIndexer
import config
Usage Example
# Ensure config.py has DOCUMENTS_DIR defined
# config.py:
# DOCUMENTS_DIR = './documents'

from pathlib import Path
from document_indexer import DocumentIndexer
import config

def index_documents_example():
    """Example: Index documents from a folder"""
    print("=== Indexing Documents ===")
    indexer = DocumentIndexer()
    document_folder = Path(config.DOCUMENTS_DIR)

    if not document_folder.exists():
        print(f"Creating example folder: {document_folder}")
        document_folder.mkdir(parents=True, exist_ok=True)
        print("Add some documents to this folder and run again!")
        return

    print(f"Indexing documents from: {document_folder}")
    results = indexer.index_folder(document_folder, recursive=True)

    print(f"\nResults:")
    print(f" Total files: {results['total']}")
    print(f" Successfully indexed: {results['success']}")
    print(f" Failed: {results['failed']}")

    stats = indexer.get_collection_stats()
    print(f"\nCollection Statistics:")
    print(f" Total documents: {stats['total_documents']}")
    print(f" Total chunks: {stats['total_chunks']}")
    print(f" Files: {', '.join(stats['file_names'][:5])}")
    if len(stats['file_names']) > 5:
        print(f" ... and {len(stats['file_names']) - 5} more")

# Run the example
index_documents_example()
Best Practices
- Ensure the config.DOCUMENTS_DIR path is properly configured before running this function
- Place actual documents in the specified folder before running to see meaningful results
- The function creates directories with parents=True and exist_ok=True, which is safe for repeated execution
- The recursive=True parameter means all subdirectories will be scanned for documents
- This is an example function meant for demonstration; in production, consider adding error handling and logging
- The function displays only the first 5 file names to avoid cluttering output with large collections
- Ensure DocumentIndexer is properly initialized with required credentials (API keys, database connections, etc.) before use
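The error-handling and logging advice above could look roughly like the following. This is a minimal sketch under stated assumptions, not the project's implementation: `index_folder_safely` and `StubIndexer` are hypothetical names, and the only indexer behavior assumed is the `index_folder(path, recursive=True)` call and the `total`/`success`/`failed` keys that appear in the example. The stub exists only so the sketch runs without DocumentIndexer or its credentials.

```python
import logging
import tempfile
from pathlib import Path
from typing import Any, Dict, Optional

logger = logging.getLogger("index_example")


def index_folder_safely(indexer: Any, folder: Path) -> Optional[Dict[str, Any]]:
    """Index `folder`, logging progress and failures instead of printing.

    `indexer` is any object with an index_folder(path, recursive=...) method,
    as called in the example. Returns the results dict, or None when the
    folder had to be created or indexing raised.
    """
    if not folder.exists():
        folder.mkdir(parents=True, exist_ok=True)
        logger.warning("Created %s; add documents and run again", folder)
        return None
    try:
        results = indexer.index_folder(folder, recursive=True)
    except Exception:
        # Broad on purpose: the exceptions the real indexer raises are not
        # documented here, so narrow this once they are known.
        logger.exception("Indexing failed for %s", folder)
        return None
    if results["failed"]:
        logger.warning("%d of %d files failed", results["failed"], results["total"])
    else:
        logger.info("Indexed %d files", results["success"])
    return results


class StubIndexer:
    """Hypothetical stand-in so the sketch runs without DocumentIndexer."""

    def index_folder(self, folder: Path, recursive: bool = True) -> Dict[str, int]:
        pattern = "**/*" if recursive else "*"
        total = sum(1 for p in folder.glob(pattern) if p.is_file())
        return {"total": total, "success": total, "failed": 0}


logging.basicConfig(level=logging.INFO)
with tempfile.TemporaryDirectory() as tmp:
    root = Path(tmp) / "documents"
    first_run = index_folder_safely(StubIndexer(), root)  # folder is created
    (root / "sub").mkdir()
    (root / "a.txt").write_text("hello")
    (root / "sub" / "b.txt").write_text("world")
    second_run = index_folder_safely(StubIndexer(), root)  # both files counted
```

Passing the indexer in as an argument keeps the function usable with the real DocumentIndexer while allowing a stub in tests.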
Similar Components
AI-powered semantic similarity - components with related functionality:
- function test_incremental_indexing (65.9% similar)
- class DocumentIndexer (61.8% similar)
- function get_document_info (59.3% similar)
- function main_v45 (58.8% similar)
- function init_engines (56.0% similar)