šŸ” Code Extractor

function main_v15

Maturity: 48

Command-line interface function that orchestrates the enrichment of vendor data from an Excel file with email and VAT information, using ChromaDB and a RAG engine.

File: /tf/active/vicechatdev/find_email/vendor_enrichment.py
Lines: 418 - 468
Complexity: moderate

Purpose

This is the main entry point for a vendor data enrichment application. It parses command-line arguments to configure the enrichment process, loads vendor data from an Excel file, searches a ChromaDB collection for relevant information, enriches vendor records with email and VAT details, and saves the results. The function supports resumable processing with configurable start/end indices and request delays to manage API rate limits.
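The resumable processing described above can be sketched as a set of index windows, one per run. This is only an illustration: batch_ranges is a hypothetical helper that is not part of the script, and it assumes --end is exclusive (like a Python slice), which the source shown here does not confirm.

```python
def batch_ranges(total, batch_size):
    """Yield (start, end) index pairs that cover range(total) in order.

    Assumption: the end index is exclusive, like a Python slice. How
    enrich_all_vendors() treats end_index is not shown in the source.
    """
    if batch_size <= 0:
        raise ValueError("batch_size must be positive")
    for start in range(0, total, batch_size):
        yield start, min(start + batch_size, total)


# Build one command line per batch for a hypothetical list of 250 vendors.
commands = [
    f"python vendor_enrichment.py --start {start} --end {end}"
    for start, end in batch_ranges(250, 100)
]

for command in commands:
    print(command)
```

If a run stops partway through, rerunning with --start set to the first unprocessed index resumes from that point.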

Source Code

def main():
    """Main execution function"""
    import argparse
    
    parser = argparse.ArgumentParser(description='Enrich vendor list with email and VAT information')
    parser.add_argument('--file', type=str, 
                       default='Vendors list_Vicebio_20112025_VB comments v1.xlsx',
                       help='Excel file with vendor list')
    parser.add_argument('--collection', type=str, 
                       default='00_company_governance',
                       help='ChromaDB collection to search')
    parser.add_argument('--start', type=int, default=0,
                       help='Start index (for resuming)')
    parser.add_argument('--end', type=int, default=None,
                       help='End index (None = all)')
    parser.add_argument('--delay', type=int, default=2,
                       help='Delay between requests (seconds)')
    
    args = parser.parse_args()
    
    # Build full path
    script_dir = os.path.dirname(os.path.abspath(__file__))
    excel_path = os.path.join(script_dir, args.file)
    
    if not os.path.exists(excel_path):
        logger.error(f"Excel file not found: {excel_path}")
        return
    
    # Create enricher
    enricher = VendorEnricher(excel_path, args.collection)
    
    # Load vendors
    if not enricher.load_vendors():
        logger.error("Failed to load vendor data")
        return
    
    # Enrich vendors
    enricher.enrich_all_vendors(
        start_index=args.start,
        end_index=args.end,
        delay=args.delay
    )
    
    # Save final results
    output_file = enricher.save_results()
    
    # Generate summary
    enricher.generate_summary()
    
    if output_file:
        logger.info(f"\n✅ Enrichment complete! Results saved to: {output_file}")

Return Value

Returns None in every case, including the two early exits (Excel file not found, vendor data failed to load). The function works through side effects: logging output, saving enriched data to an Excel file, and generating a summary report. Success or failure is communicated only through logger messages.
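Because the outcome is visible only in the logs, a caller that needs a pass/fail signal could count ERROR records while the function runs. The sketch below is one possible approach, not something the script does. ErrorCounter, run_and_report, and fake_main are hypothetical names, and it assumes the module's logger propagates to the root logger (the default behaviour in the logging module).

```python
import logging


class ErrorCounter(logging.Handler):
    """Logging handler that counts records at ERROR level or above."""

    def __init__(self):
        super().__init__(level=logging.ERROR)
        self.count = 0

    def emit(self, record):
        self.count += 1


def run_and_report(entry_point):
    """Run entry_point() and return True if it logged no ERROR records."""
    counter = ErrorCounter()
    root = logging.getLogger()
    root.addHandler(counter)
    try:
        entry_point()
    finally:
        # Always detach the handler so repeated calls do not stack counters.
        root.removeHandler(counter)
    return counter.count == 0


def fake_main():
    """Stand-in for main(): logs the kind of error main() emits on a missing file."""
    logging.getLogger("vendor_enrichment").error("Excel file not found: example.xlsx")


succeeded = run_and_report(fake_main)
print("succeeded:", succeeded)
```

This check is coarse: errors that a helper logs but recovers from are counted too, and a failure that logs nothing goes unnoticed.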

Dependencies

  • argparse
  • os
  • sys
  • pandas
  • json
  • time
  • datetime
  • logging
  • re
  • hybrid_rag_engine

Required Imports

import os
import sys
import pandas as pd
import json
import time
from datetime import datetime
from hybrid_rag_engine import OneCo_hybrid_RAG
import logging
import argparse
import re

Conditional/Optional Imports

The following import is performed inside main() rather than at module level:

import argparse

Condition: executed on every call to main(); required for CLI argument parsing.

Usage Example

# Run from the command line with default settings:
# python vendor_enrichment.py

# Run with custom parameters:
# python vendor_enrichment.py --file my_vendors.xlsx --collection my_collection --start 10 --end 50 --delay 3

# Entry-point guard at the bottom of the script:
if __name__ == '__main__':
    main()

# Resume processing from index 100:
# python vendor_enrichment.py --start 100

# Process only up to index 20 with a 5-second delay:
# python vendor_enrichment.py --end 20 --delay 5
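To see how the flags above are interpreted without touching any vendor data, the parser can be rebuilt on its own and given an explicit argument list. This is a sketch: build_parser is a hypothetical helper (main() defines its parser inline), while the flags and defaults are copied from the source code shown earlier.

```python
import argparse


def build_parser():
    """Hypothetical helper that mirrors the parser main() defines inline."""
    parser = argparse.ArgumentParser(
        description='Enrich vendor list with email and VAT information')
    parser.add_argument('--file', type=str,
                        default='Vendors list_Vicebio_20112025_VB comments v1.xlsx',
                        help='Excel file with vendor list')
    parser.add_argument('--collection', type=str,
                        default='00_company_governance',
                        help='ChromaDB collection to search')
    parser.add_argument('--start', type=int, default=0,
                        help='Start index (for resuming)')
    parser.add_argument('--end', type=int, default=None,
                        help='End index (None = all)')
    parser.add_argument('--delay', type=int, default=2,
                        help='Delay between requests (seconds)')
    return parser


# Equivalent of: python vendor_enrichment.py --start 100
args = build_parser().parse_args(['--start', '100'])

# Flags that were not supplied keep their defaults.
print(args.start, args.end, args.delay, args.collection)
```

Because --delay is declared with type=int, argparse rejects a fractional value such as 0.5 and exits with a usage error.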

Best Practices

  • This function should only be called as the main entry point of the script, typically with if __name__ == '__main__': main()
  • Ensure the VendorEnricher class is properly defined with methods: load_vendors(), enrich_all_vendors(), save_results(), and generate_summary()
  • Configure logging before calling this function to capture all log messages
  • Use --start and --end arguments to process data in batches if dealing with large datasets
  • Adjust --delay parameter based on API rate limits to avoid throttling
  • The --file value is resolved relative to the script directory, not the current working directory; place the Excel file next to the script or pass an absolute path via --file
  • Check that the ChromaDB collection exists and is populated before running
  • Monitor disk space as results are saved to Excel files in the script directory
  • The function checks explicitly only for a missing Excel file and a failed vendor load, and it returns None in all cases; check the logs for execution status
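The note on the --file argument follows from how main() builds the path with os.path.join(script_dir, args.file). The sketch below shows that behaviour with stand-in paths; the directory and file names are hypothetical.

```python
import os

# Stand-ins for the script directory and an absolute input path (hypothetical).
script_dir = os.path.abspath(os.path.join(os.sep, "opt", "app"))
absolute_input = os.path.abspath(os.path.join(os.sep, "data", "my_vendors.xlsx"))

# A bare file name is resolved inside the script directory.
resolved_relative = os.path.join(script_dir, "my_vendors.xlsx")

# os.path.join discards the earlier components when a later one is absolute,
# so an absolute --file value is used unchanged.
resolved_absolute = os.path.join(script_dir, absolute_input)

print(resolved_relative)
print(resolved_absolute)
```

In both cases the os.path.exists check in main() decides whether processing starts, so a wrong path shows up as an "Excel file not found" error in the logs.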

Similar Components

AI-powered semantic similarity - components with related functionality:

  • function main_v48 80.3% similar

    Entry point function that orchestrates vendor enrichment testing by parsing command-line arguments, running setup validation, and executing a single vendor test against a ChromaDB collection.

    From: /tf/active/vicechatdev/find_email/test_enrichment.py
  • class VendorEnricher 76.0% similar

    A class that enriches vendor information by finding official email addresses and VAT numbers using RAG (Retrieval-Augmented Generation) with ChromaDB document search and web search capabilities.

    From: /tf/active/vicechatdev/find_email/vendor_enrichment.py
  • function main_v28 73.4% similar

    Command-line entry point that parses arguments and orchestrates the extraction of vendor emails from all vicebio.com mailboxes using Microsoft Graph API.

    From: /tf/active/vicechatdev/find_email/extract_vendor_batch.py
  • function test_single_vendor 72.6% similar

    Tests vendor enrichment by querying a RAG (Retrieval-Augmented Generation) system to find official contact information (email and VAT number) for a specified vendor using document search and web search capabilities.

    From: /tf/active/vicechatdev/find_email/test_enrichment.py
  • function extract_batch 61.1% similar

    Batch processes a list of vendors from an Excel file to extract their email addresses by searching through Microsoft 365 mailboxes using AI-powered email analysis.

    From: /tf/active/vicechatdev/find_email/extract_vendor_batch.py