function main_v15
Command-line entry point that orchestrates enriching vendor data from an Excel file with email and VAT information, using a ChromaDB collection and a hybrid RAG engine.
File: /tf/active/vicechatdev/find_email/vendor_enrichment.py
Lines: 418-468
Complexity: moderate
Purpose
This is the main entry point of the vendor data enrichment application. It parses command-line arguments to configure the run and loads vendor data from an Excel file. It then delegates the remaining work to a VendorEnricher instance, which searches a ChromaDB collection for relevant information, enriches each vendor record with email and VAT details, and saves the results. Configurable start/end indices let an interrupted run be resumed, and a per-request delay helps keep the run within API rate limits.
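The resume-and-throttle behaviour can be pictured with a minimal sketch. It assumes that enrich_all_vendors() walks the vendor rows from start_index up to an exclusive end_index (None meaning "to the end") and sleeps delay seconds between lookups. The real method is not shown on this page, and enrich_range and fake_lookup are hypothetical names.

```python
import time
from typing import Callable, List, Optional


def enrich_range(
    vendors: List[str],
    lookup: Callable[[str], dict],
    start_index: int = 0,
    end_index: Optional[int] = None,
    delay: float = 2,
) -> List[dict]:
    """Hypothetical stand-in for VendorEnricher.enrich_all_vendors().

    Processes vendors[start_index:end_index] (end exclusive, None = all)
    and pauses `delay` seconds between consecutive lookups.
    """
    selected = vendors[start_index:end_index]
    results = []
    for position, name in enumerate(selected):
        if position > 0 and delay > 0:
            time.sleep(delay)  # throttle to stay under the API rate limit
        results.append({"vendor": name, **lookup(name)})
    return results


def fake_lookup(name: str) -> dict:
    """Hypothetical lookup returning a made-up email address."""
    return {"email": f"info@{name.lower()}.example"}


# Usage: resume at index 1 and stop before index 3, without waiting.
rows = enrich_range(["Acme", "Globex", "Initech", "Umbrella"], fake_lookup,
                    start_index=1, end_index=3, delay=0)
print([r["vendor"] for r in rows])  # ['Globex', 'Initech']
```

Slicing makes a resumed run skip rows that have already been processed without any extra bookkeeping. Sleeping only between items avoids a pointless wait before the first request.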
Source Code
def main():
    """Main execution function"""
    import argparse
    parser = argparse.ArgumentParser(description='Enrich vendor list with email and VAT information')
    parser.add_argument('--file', type=str,
                        default='Vendors list_Vicebio_20112025_VB comments v1.xlsx',
                        help='Excel file with vendor list')
    parser.add_argument('--collection', type=str,
                        default='00_company_governance',
                        help='ChromaDB collection to search')
    parser.add_argument('--start', type=int, default=0,
                        help='Start index (for resuming)')
    parser.add_argument('--end', type=int, default=None,
                        help='End index (None = all)')
    parser.add_argument('--delay', type=int, default=2,
                        help='Delay between requests (seconds)')
    args = parser.parse_args()

    # Build full path
    script_dir = os.path.dirname(os.path.abspath(__file__))
    excel_path = os.path.join(script_dir, args.file)
    if not os.path.exists(excel_path):
        logger.error(f"Excel file not found: {excel_path}")
        return

    # Create enricher
    enricher = VendorEnricher(excel_path, args.collection)

    # Load vendors
    if not enricher.load_vendors():
        logger.error("Failed to load vendor data")
        return

    # Enrich vendors
    enricher.enrich_all_vendors(
        start_index=args.start,
        end_index=args.end,
        delay=args.delay
    )

    # Save final results
    output_file = enricher.save_results()

    # Generate summary
    enricher.generate_summary()

    if output_file:
        logger.info(f"\n✅ Enrichment complete! Results saved to: {output_file}")
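main() builds the path with os.path.join(script_dir, args.file). As a result, a relative --file value is resolved against the script's directory rather than the current working directory, while an absolute value replaces script_dir entirely. The short sketch below demonstrates that rule. It uses posixpath (the POSIX flavour of os.path) so the result does not depend on the host OS, and the directory names are made up.

```python
import posixpath

script_dir = "/opt/find_email"  # hypothetical script location

# A relative --file value is appended to the script directory.
relative = posixpath.join(script_dir, "vendors.xlsx")

# An absolute --file value discards every component before it.
absolute = posixpath.join(script_dir, "/data/exports/vendors.xlsx")

print(relative)  # /opt/find_email/vendors.xlsx
print(absolute)  # /data/exports/vendors.xlsx
```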
Return Value
Returns None. The function performs side effects including logging output, saving enriched data to an Excel file, and generating a summary report. Success or failure is communicated through logger messages.
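Because main() always returns None, a caller that needs a pass/fail signal has to look at what was logged. One possible approach, sketched here, is to attach a temporary handler that counts ERROR records while the function runs. ErrorCounter and run_and_check are hypothetical helpers, and failing_main/passing_main are stand-ins for main(). The real script logs through a module-level logger whose name is not shown on this page.

```python
import logging


class ErrorCounter(logging.Handler):
    """Counts records at ERROR level or above that reach this handler."""

    def __init__(self) -> None:
        super().__init__(level=logging.ERROR)
        self.errors = 0

    def emit(self, record: logging.LogRecord) -> None:
        self.errors += 1


def run_and_check(func, logger: logging.Logger) -> bool:
    """Call func() and return True only if it logged no errors on `logger`."""
    counter = ErrorCounter()
    logger.addHandler(counter)
    try:
        func()
    finally:
        logger.removeHandler(counter)  # leave the logger as we found it
    return counter.errors == 0


demo_logger = logging.getLogger("vendor_enrichment_demo")  # hypothetical name


def failing_main():
    demo_logger.error("Excel file not found: /tmp/missing.xlsx")


def passing_main():
    demo_logger.info("Enrichment complete!")


print(run_and_check(failing_main, demo_logger))  # False
print(run_and_check(passing_main, demo_logger))  # True
```

Exceptions raised inside func() still propagate. The handler is removed in the finally clause either way.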
Dependencies
argparse, os, sys, pandas, json, time, datetime, logging, re, hybrid_rag_engine
Required Imports
import os
import sys
import pandas as pd
import json
import time
from datetime import datetime
from hybrid_rag_engine import OneCo_hybrid_RAG
import logging
import argparse
import re
Conditional/Optional Imports
These imports are only needed under specific conditions:
import argparse
Condition: imported locally at the start of main() for CLI argument parsing, so it is loaded every time main() runs
Required (conditional)
Usage Example
# Run from the command line with default settings:
# python vendor_enrichment.py
# Run with custom parameters:
# python vendor_enrichment.py --file my_vendors.xlsx --collection my_collection --start 10 --end 50 --delay 3
# Entry-point guard at the bottom of the script:
if __name__ == '__main__':
    main()
# Resume processing from index 100:
# python vendor_enrichment.py --start 100
# Process only the first 20 vendors with a 5-second delay:
# python vendor_enrichment.py --end 20 --delay 5
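main() calls parser.parse_args() with no arguments, so it always reads sys.argv. To drive the same options from Python code or a test, one option is to move the parser into its own function and pass an explicit argument list. build_parser below is a hypothetical refactor that mirrors the options main() defines inline.

```python
import argparse


def build_parser() -> argparse.ArgumentParser:
    """Hypothetical refactor: the same options main() defines inline."""
    parser = argparse.ArgumentParser(
        description="Enrich vendor list with email and VAT information")
    parser.add_argument("--file", type=str,
                        default="Vendors list_Vicebio_20112025_VB comments v1.xlsx",
                        help="Excel file with vendor list")
    parser.add_argument("--collection", type=str, default="00_company_governance",
                        help="ChromaDB collection to search")
    parser.add_argument("--start", type=int, default=0,
                        help="Start index (for resuming)")
    parser.add_argument("--end", type=int, default=None,
                        help="End index (None = all)")
    parser.add_argument("--delay", type=int, default=2,
                        help="Delay between requests (seconds)")
    return parser


# An explicit list bypasses sys.argv, so no shell is needed.
args = build_parser().parse_args(["--start", "100", "--delay", "5"])
print(args.start, args.end, args.delay, args.collection)
# 100 None 5 00_company_governance
```

If main() is left unchanged, you can get the same effect by temporarily replacing sys.argv before calling it, for example with unittest.mock.patch.object(sys, "argv", ["vendor_enrichment.py", "--start", "100"]).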
Best Practices
- This function should only be called as the main entry point of the script, typically with if __name__ == '__main__': main()
- Ensure the VendorEnricher class is properly defined with methods: load_vendors(), enrich_all_vendors(), save_results(), and generate_summary()
- Configure logging before calling this function to capture all log messages
- Use --start and --end arguments to process data in batches if dealing with large datasets
- Adjust --delay parameter based on API rate limits to avoid throttling
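- The Excel file must either be in the same directory as the script or be given as an absolute path via --file; relative --file values are resolved against the script's directory, not the current working directory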
- Check that the ChromaDB collection exists and is populated before running
- Monitor disk space as results are saved to Excel files in the script directory
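- A missing Excel file or a failed load is logged as an error and main() returns early; because it returns None in every case and these early exits do not set a non-zero process exit code, check the logs for execution status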
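Batching a large vendor list with --start and --end is easier if the ranges are generated rather than typed by hand. The sketch below assumes --end is exclusive, like a Python slice; enrich_all_vendors() is not shown here, so confirm this first, because an inclusive end would process each boundary row twice. batch_ranges and batch_commands are hypothetical helpers, and the total of 45 rows is a made-up example.

```python
from typing import List, Tuple


def batch_ranges(total: int, size: int, start: int = 0) -> List[Tuple[int, int]]:
    """Split [start, total) into consecutive (start, end) pairs of at most `size` rows."""
    if size <= 0:
        raise ValueError("size must be positive")
    return [(s, min(s + size, total)) for s in range(start, total, size)]


def batch_commands(total: int, size: int, script: str = "vendor_enrichment.py") -> List[str]:
    """Build one CLI invocation per batch, assuming --end is exclusive."""
    return [f"python {script} --start {s} --end {e}" for s, e in batch_ranges(total, size)]


for cmd in batch_commands(total=45, size=20):
    print(cmd)
# python vendor_enrichment.py --start 0 --end 20
# python vendor_enrichment.py --start 20 --end 40
# python vendor_enrichment.py --start 40 --end 45
```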
Similar Components
AI-powered semantic similarity - components with related functionality:
- function main_v48 (80.3% similar)
- class VendorEnricher (76.0% similar)
- function main_v28 (73.4% similar)
- function test_single_vendor (72.6% similar)
- function extract_batch (61.1% similar)