🔍 Code Extractor

function test_single_vendor

Maturity: 48

Tests vendor enrichment by querying a RAG (Retrieval-Augmented Generation) system to find official contact information (email and VAT number) for a specified vendor using document search and web search capabilities.

File:
/tf/active/vicechatdev/find_email/test_enrichment.py
Lines:
37 - 134
Complexity:
complex

Purpose

This function serves as a testing utility for vendor data enrichment workflows. It initializes a hybrid RAG engine, configures it to search both ChromaDB collections and web sources, and executes a structured query to retrieve official business contact information (prioritizing global corporate emails and VAT/Tax ID numbers) for a given vendor. The function is designed to validate the RAG system's ability to extract verified business information from multiple sources with confidence scoring.

Source Code

def test_single_vendor(vendor_name="Acme Corporation", collection_name="00_company_governance"):
    """Test enrichment for a single vendor"""
    logger.info(f"\nTesting vendor enrichment for: {vendor_name}")
    logger.info(f"Using collection: {collection_name}")
    logger.info("=" * 60)
    
    try:
        # Initialize RAG engine
        logger.info("Initializing RAG engine...")
        rag = OneCo_hybrid_RAG()
        
        # Configure for document + web search
        rag.flow_control.update({
            "model": ["OpenAi", "gpt-4o", 0],
            "pre_model": ["OpenAi", "gpt-4o-mini", 0],
            "enable_search": True,
            "enable_web_search": True,
            "web_search_queries": 5,
            "web_search_cycles": 1,
            "enable_memory": False,
            "detail_level": "Comprehensive"
        })
        
        # **CRITICAL: Add ChromaDB collection to search**
        logger.info(f"Adding ChromaDB collection: {collection_name}")
        rag.data_handles.add_data(
            name="company_docs",
            type="chromaDB",
            data=collection_name,
            inclusions=10,
            instructions="Search for official company contact information including email addresses and VAT/Tax ID numbers."
        )
        
        logger.info("RAG engine configured ✓")
        logger.info(f"✓ Collection '{collection_name}' configured for search")
        
        # Create query
        query = f"""Find official contact information for {vendor_name}:

CRITICAL REQUIREMENTS:

1. **OFFICIAL EMAIL ADDRESS** - Find the primary, general-purpose business email:
   - PRIORITY: Global corporate emails (from main .com domain) over regional/country-specific emails
   - Preferred formats in order: "info@", "contact@", "sales@", "support@"
   - AVOID: Regional emails (e.g., *india@, *uk@, *pscs*@), individual employee emails
   - Must be from official company website or verified business directory
   - Provide the EXACT email address format: name@domain.com
   - The email should be suitable for general international business inquiries

2. **VAT NUMBER** (Tax ID / VAT Registration Number / BTW Number):
   - Look for official VAT registration number
   - Include the full number with country prefix if available

3. **SOURCE VERIFICATION**:
   - Official company website (highest priority)
   - Government business registers
   - Verified business directories

OUTPUT FORMAT:
- Email: [exact email address with @ and domain]
- VAT Number: [complete VAT/Tax ID]
- Source: [specific source]
- Confidence: [High/Medium/Low]"""
        
        logger.info("Executing search...")
        response = rag.response_callback(query)
        
        # Extract response - handle various response types
        if hasattr(response, 'content'):
            response_text = response.content
        elif hasattr(response, 'object'):
            # Panel Markdown object
            response_text = str(response.object)
        else:
            response_text = str(response)
        
        # Clean up if it's still a string representation of Markdown
        if response_text.startswith('Markdown('):
            # Extract the actual text from Markdown(str) format
            import re
            match = re.search(r"Markdown\(['\"](.+?)['\"]\)", response_text, re.DOTALL)
            if match:
                response_text = match.group(1)
            else:
                # Fallback: just remove Markdown() wrapper
                response_text = response_text.replace('Markdown(str)', '').strip()
        
        logger.info("\n" + "=" * 60)
        logger.info("RESPONSE:")
        logger.info("=" * 60)
        logger.info(response_text)
        logger.info("=" * 60)
        
        return True
        
    except Exception as e:
        logger.error(f"Error during test: {e}", exc_info=True)
        return False

Parameters

Name Type Default Kind
vendor_name - 'Acme Corporation' positional_or_keyword
collection_name - '00_company_governance' positional_or_keyword

Parameter Details

vendor_name: The name of the vendor/company to search for. Default is 'Acme Corporation'. Should be a string containing the full legal or commonly known business name. This parameter is used to construct the search query for finding contact information.

collection_name: The name of the ChromaDB collection to search within. Default is '00_company_governance'. This should be a string matching an existing ChromaDB collection that contains company documents and governance information. The collection is added as a data source to the RAG engine for document-based retrieval.

Return Value

Returns a boolean value: True if the test executes successfully (regardless of whether vendor information is found), False if an exception occurs during execution. The actual vendor information is logged to the logger output rather than returned as a value.

Dependencies

  • hybrid_rag_engine
  • logging
  • argparse
  • re

Required Imports

from hybrid_rag_engine import OneCo_hybrid_RAG
import logging
import re

Usage Example

# Ensure logger is configured
import logging
logger = logging.getLogger(__name__)
logging.basicConfig(level=logging.INFO)

# Test with default vendor
success = test_single_vendor()

# Test with specific vendor and collection
success = test_single_vendor(
    vendor_name="Microsoft Corporation",
    collection_name="vendor_database"
)

# Check result
if success:
    print("Test completed successfully")
else:
    print("Test failed with errors")

Best Practices

  • Ensure the logger object is properly initialized in the module scope before calling this function
  • Verify that the ChromaDB collection specified in collection_name exists and contains relevant company documents
  • The function prioritizes global corporate emails over regional ones - ensure this aligns with your use case
  • Monitor API usage as the function makes calls to OpenAI GPT-4o models which can be costly
  • The function logs extensively - configure logging appropriately to capture or suppress output as needed
  • Handle the boolean return value to detect execution failures, but parse logs for actual vendor information
  • The web_search_queries and web_search_cycles parameters are hardcoded (5 and 1) - modify the source if different values are needed
  • The function uses exception handling with exc_info=True for detailed error logging - ensure sensitive information isn't exposed in logs
  • Response parsing handles multiple response types (content attribute, Panel Markdown objects) - this makes it robust but may need updates if RAG engine response format changes

Similar Components

AI-powered semantic similarity - components with related functionality:

  • class VendorEnricher 83.3% similar

    A class that enriches vendor information by finding official email addresses and VAT numbers using RAG (Retrieval-Augmented Generation) with ChromaDB document search and web search capabilities.

    From: /tf/active/vicechatdev/find_email/vendor_enrichment.py
  • function main_v14 72.6% similar

    Command-line interface function that orchestrates the enrichment of vendor data from an Excel file with email and VAT information using ChromaDB and RAG engine.

    From: /tf/active/vicechatdev/find_email/vendor_enrichment.py
  • function main_v47 71.8% similar

    Entry point function that orchestrates vendor enrichment testing by parsing command-line arguments, running setup validation, and executing a single vendor test against a ChromaDB collection.

    From: /tf/active/vicechatdev/find_email/test_enrichment.py
  • function run_all_tests 59.6% similar

    Orchestrates a comprehensive test suite for the Vendor Email Extractor system, verifying configuration, authentication, mailbox access, email search, and LLM connectivity.

    From: /tf/active/vicechatdev/find_email/test_vendor_extractor.py
  • function test_email_search 59.2% similar

    Tests the email search functionality of a VendorEmailExtractor instance by searching for emails containing common business terms in the first available mailbox.

    From: /tf/active/vicechatdev/find_email/test_vendor_extractor.py
← Back to Browse