function test_single_vendor
Tests vendor enrichment by querying a RAG (Retrieval-Augmented Generation) system to find official contact information (email and VAT number) for a specified vendor using document search and web search capabilities.
/tf/active/vicechatdev/find_email/test_enrichment.py
37 - 134
complex
Purpose
This function serves as a testing utility for vendor data enrichment workflows. It initializes a hybrid RAG engine, configures it to search both ChromaDB collections and web sources, and executes a structured query to retrieve official business contact information (prioritizing global corporate emails and VAT/Tax ID numbers) for a given vendor. The function is designed to validate the RAG system's ability to extract verified business information from multiple sources with confidence scoring.
Source Code
def test_single_vendor(vendor_name="Acme Corporation", collection_name="00_company_governance"):
"""Test enrichment for a single vendor"""
logger.info(f"\nTesting vendor enrichment for: {vendor_name}")
logger.info(f"Using collection: {collection_name}")
logger.info("=" * 60)
try:
# Initialize RAG engine
logger.info("Initializing RAG engine...")
rag = OneCo_hybrid_RAG()
# Configure for document + web search
rag.flow_control.update({
"model": ["OpenAi", "gpt-4o", 0],
"pre_model": ["OpenAi", "gpt-4o-mini", 0],
"enable_search": True,
"enable_web_search": True,
"web_search_queries": 5,
"web_search_cycles": 1,
"enable_memory": False,
"detail_level": "Comprehensive"
})
# **CRITICAL: Add ChromaDB collection to search**
logger.info(f"Adding ChromaDB collection: {collection_name}")
rag.data_handles.add_data(
name="company_docs",
type="chromaDB",
data=collection_name,
inclusions=10,
instructions="Search for official company contact information including email addresses and VAT/Tax ID numbers."
)
logger.info("RAG engine configured ✓")
logger.info(f"✓ Collection '{collection_name}' configured for search")
# Create query
query = f"""Find official contact information for {vendor_name}:
CRITICAL REQUIREMENTS:
1. **OFFICIAL EMAIL ADDRESS** - Find the primary, general-purpose business email:
- PRIORITY: Global corporate emails (from main .com domain) over regional/country-specific emails
- Preferred formats in order: "info@", "contact@", "sales@", "support@"
- AVOID: Regional emails (e.g., *india@, *uk@, *pscs*@), individual employee emails
- Must be from official company website or verified business directory
- Provide the EXACT email address format: name@domain.com
- The email should be suitable for general international business inquiries
2. **VAT NUMBER** (Tax ID / VAT Registration Number / BTW Number):
- Look for official VAT registration number
- Include the full number with country prefix if available
3. **SOURCE VERIFICATION**:
- Official company website (highest priority)
- Government business registers
- Verified business directories
OUTPUT FORMAT:
- Email: [exact email address with @ and domain]
- VAT Number: [complete VAT/Tax ID]
- Source: [specific source]
- Confidence: [High/Medium/Low]"""
logger.info("Executing search...")
response = rag.response_callback(query)
# Extract response - handle various response types
if hasattr(response, 'content'):
response_text = response.content
elif hasattr(response, 'object'):
# Panel Markdown object
response_text = str(response.object)
else:
response_text = str(response)
# Clean up if it's still a string representation of Markdown
if response_text.startswith('Markdown('):
# Extract the actual text from Markdown(str) format
import re
match = re.search(r"Markdown\(['\"](.+?)['\"]\)", response_text, re.DOTALL)
if match:
response_text = match.group(1)
else:
# Fallback: just remove Markdown() wrapper
response_text = response_text.replace('Markdown(str)', '').strip()
logger.info("\n" + "=" * 60)
logger.info("RESPONSE:")
logger.info("=" * 60)
logger.info(response_text)
logger.info("=" * 60)
return True
except Exception as e:
logger.error(f"Error during test: {e}", exc_info=True)
return False
Parameters
| Name | Type | Default | Kind |
|---|---|---|---|
vendor_name |
- | 'Acme Corporation' | positional_or_keyword |
collection_name |
- | '00_company_governance' | positional_or_keyword |
Parameter Details
vendor_name: The name of the vendor/company to search for. Default is 'Acme Corporation'. Should be a string containing the full legal or commonly known business name. This parameter is used to construct the search query for finding contact information.
collection_name: The name of the ChromaDB collection to search within. Default is '00_company_governance'. This should be a string matching an existing ChromaDB collection that contains company documents and governance information. The collection is added as a data source to the RAG engine for document-based retrieval.
Return Value
Returns a boolean value: True if the test executes successfully (regardless of whether vendor information is found), False if an exception occurs during execution. The actual vendor information is logged to the logger output rather than returned as a value.
Dependencies
hybrid_rag_engineloggingargparsere
Required Imports
from hybrid_rag_engine import OneCo_hybrid_RAG
import logging
import re
Usage Example
# Ensure logger is configured
import logging
logger = logging.getLogger(__name__)
logging.basicConfig(level=logging.INFO)
# Test with default vendor
success = test_single_vendor()
# Test with specific vendor and collection
success = test_single_vendor(
vendor_name="Microsoft Corporation",
collection_name="vendor_database"
)
# Check result
if success:
print("Test completed successfully")
else:
print("Test failed with errors")
Best Practices
- Ensure the logger object is properly initialized in the module scope before calling this function
- Verify that the ChromaDB collection specified in collection_name exists and contains relevant company documents
- The function prioritizes global corporate emails over regional ones - ensure this aligns with your use case
- Monitor API usage as the function makes calls to OpenAI GPT-4o models which can be costly
- The function logs extensively - configure logging appropriately to capture or suppress output as needed
- Handle the boolean return value to detect execution failures, but parse logs for actual vendor information
- The web_search_queries and web_search_cycles parameters are hardcoded (5 and 1) - modify the source if different values are needed
- The function uses exception handling with exc_info=True for detailed error logging - ensure sensitive information isn't exposed in logs
- Response parsing handles multiple response types (content attribute, Panel Markdown objects) - this makes it robust but may need updates if RAG engine response format changes
Tags
Similar Components
AI-powered semantic similarity - components with related functionality:
-
class VendorEnricher 83.3% similar
-
function main_v14 72.6% similar
-
function main_v47 71.8% similar
-
function run_all_tests 59.6% similar
-
function test_email_search 59.2% similar