function test_llm_extraction
A test function that validates LLM-based contract data extraction by processing a sample contract and verifying the extracted fields against expected values.
/tf/active/vicechatdev/contract_validity_analyzer/test_extractor.py
67 - 161
moderate
Purpose
This function serves as an integration test for the ContractDataExtractor's LLM extraction capabilities. It initializes the extractor with a configuration, passes a sample contract (SAMPLE_CONTRACT) to the LLM extraction method, and validates critical fields in the returned data: vendor name, ViceBio contracting party, effective date, end date, contract status, amount, and currency. The function prints the extracted fields and a PASS/FAIL line for each check, which makes it useful for debugging and verifying the contract extraction pipeline.
Source Code
def test_llm_extraction():
    """Test LLM extraction with sample contract"""
    print("="*80)
    print("Testing Contract Data Extractor - LLM Extraction")
    print("="*80)
    print()

    try:
        # Load config
        config = Config()

        # Create extractor (with limit to avoid FileCloud connection)
        extractor = ContractDataExtractor(config, limit=1)
        print("✅ Extractor initialized")
        print()

        # Test LLM extraction directly
        print("Extracting data from sample contract...")
        print("-" * 80)
        print(SAMPLE_CONTRACT[:500] + "...")
        print("-" * 80)
        print()

        # Extract contract data
        result = extractor.extract_contract_data_with_llm(
            SAMPLE_CONTRACT,
            "MSA_360Biolabs_Sample.pdf"
        )

        if result:
            print("✅ Extraction successful!")
            print()
            print("Extracted Data:")
            print("="*80)

            # Display key fields
            key_fields = [
                'vendor_name',
                'vicebio_contracting_party',
                'contract_type',
                'effective_date',
                'contract_duration',
                'end_date_auto_renewal',
                'contract_status',
                'estimated_amount_original_currency',
                'currency',
                'term_and_termination',
                'ip_exclusivity',
                'assignment_options'
            ]

            for field in key_fields:
                value = result.get(field, 'Not extracted')
                print(f"{field:40} : {value}")

            print("="*80)
            print()

            # Validate critical fields
            print("Validation:")
            print("-" * 80)

            checks = {
                'Vendor Name': result.get('vendor_name') == '360Biolabs Pty Ltd',
                'ViceBio Party': 'ViceBio Limited' in str(result.get('vicebio_contracting_party', '')),
                'Effective Date': '10/03/2023' in str(result.get('effective_date', '')) or '03/10/2023' in str(result.get('effective_date', '')),
                'End Date Calculated': result.get('end_date_auto_renewal') is not None,
                'Contract Status': result.get('contract_status') in ['active', 'expired'],
                'Amount': result.get('estimated_amount_original_currency') is not None,
                'Currency': result.get('currency') == 'AUD'
            }

            for check, passed in checks.items():
                status = "✅ PASS" if passed else "❌ FAIL"
                print(f"{status} : {check}")

            print("-" * 80)
            print()

            if all(checks.values()):
                print("🎉 All validation checks passed!")
            else:
                print("⚠️ Some validation checks failed")

            return True
        else:
            print("❌ Extraction failed - no data returned")
            return False

    except Exception as e:
        print(f"❌ Error: {e}")
        import traceback
        traceback.print_exc()
        return False
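The effective-date check in the listing above accepts either '10/03/2023' or '03/10/2023' because the test does not pin down whether the extractor emits day-first or month-first dates. The following is a minimal sketch of making that ambiguity explicit. It assumes the extractor returns a DD/MM/YYYY or MM/DD/YYYY string (the actual output format is not shown in the source), and `matches_expected_date` is a hypothetical helper, not part of ContractDataExtractor.

```python
from datetime import date, datetime


def matches_expected_date(value, expected):
    """Return True if `value` parses to `expected` as day-first or month-first.

    Hypothetical helper: assumes `value` is a 'NN/NN/YYYY' string.
    """
    if not value:
        return False
    for fmt in ("%d/%m/%Y", "%m/%d/%Y"):
        try:
            if datetime.strptime(str(value).strip(), fmt).date() == expected:
                return True
        except ValueError:
            # The string is not a valid date under this format; try the next one.
            continue
    return False


# Assumption: the sample contract's effective date is 10 March 2023. The source
# only shows the two accepted strings, not which reading is intended.
expected = date(2023, 3, 10)
print(matches_expected_date("10/03/2023", expected))  # day-first reading
print(matches_expected_date("03/10/2023", expected))  # month-first reading
print(matches_expected_date("11/03/2023", expected))  # a different date
```

Unlike the substring test in the original function, this sketch requires the whole value to be a date string, so it would reject a value that embeds the date in longer text.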
Return Value
Returns a boolean value: True if the extraction was successful and data was returned (regardless of validation results), False if extraction failed or an exception occurred. The function also prints comprehensive diagnostic information to stdout including extraction results, field values, and validation check outcomes.
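Because the return value reports only whether extraction produced data, a caller that also needs the validation outcome has to compute it separately. One possible sketch, assuming the result is a plain dict with the field names used in the source code; `summarize` and the sample dict are hypothetical and only illustrate keeping the two signals apart:

```python
def summarize(result):
    """Return (extracted, validated) for an extraction result dict.

    Hypothetical helper that mirrors a subset of the checks in the test.
    """
    if not result:
        # Nothing came back from the extractor, so nothing can be validated.
        return False, False
    checks = {
        "Vendor Name": result.get("vendor_name") == "360Biolabs Pty Ltd",
        "Contract Status": result.get("contract_status") in ["active", "expired"],
        "Currency": result.get("currency") == "AUD",
    }
    return True, all(checks.values())


# Illustrative data only, not real extractor output: the currency is wrong
# on purpose, so extraction "succeeds" while validation fails.
sample = {
    "vendor_name": "360Biolabs Pty Ltd",
    "contract_status": "active",
    "currency": "USD",
}
extracted, validated = summarize(sample)
print(extracted, validated)
```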
Dependencies
- sys
- pathlib
- traceback
Required Imports
import sys
from pathlib import Path
from extractor import ContractDataExtractor
from config.config import Config
import traceback
Usage Example
import sys

# Ensure SAMPLE_CONTRACT is defined at module level,
# alongside test_llm_extraction and the imports listed above
SAMPLE_CONTRACT = """Master Services Agreement between ViceBio Limited and 360Biolabs Pty Ltd..."""

# Run the test
if __name__ == '__main__':
    success = test_llm_extraction()
    if success:
        print('Test completed successfully')
    else:
        print('Test failed')
        sys.exit(1)
Best Practices
- This function requires SAMPLE_CONTRACT to be defined as a module-level constant containing sample contract text
- The function uses limit=1 when initializing ContractDataExtractor to avoid FileCloud connections during testing
- Validation checks are hardcoded for specific expected values (e.g., '360Biolabs Pty Ltd', 'AUD') - modify these if testing with different sample contracts
- The function prints extensive output to stdout, making it suitable for manual testing but not for automated test suites without output capture
- Error handling includes full traceback printing for debugging purposes
- The function validates both date formats ('10/03/2023' and '03/10/2023') to handle different date parsing results
- Consider wrapping this in a proper unit test framework (pytest, unittest) for production use
- The function returns True even if validation checks fail, as long as extraction completes - check validation output for actual success
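The points above about output capture and test frameworks can be sketched as follows. This is a minimal illustration, not the project's test suite: `FakeExtractor` is a hypothetical stand-in that returns canned data so the example runs without an LLM or FileCloud, `report_checks` is a hypothetical helper covering a subset of the checks, and `contextlib.redirect_stdout` captures the printed report. Under pytest, the `test_` function would be collected as written, and the `capsys` fixture could replace the manual capture.

```python
import contextlib
import io


class FakeExtractor:
    """Hypothetical stand-in for ContractDataExtractor (canned data, no LLM call)."""

    def extract_contract_data_with_llm(self, text, filename):
        return {
            "vendor_name": "360Biolabs Pty Ltd",
            "currency": "AUD",
            "contract_status": "active",
        }


def report_checks(result):
    """Print a PASS/FAIL line per check and return the check table."""
    checks = {
        "Vendor Name": result.get("vendor_name") == "360Biolabs Pty Ltd",
        "Contract Status": result.get("contract_status") in ["active", "expired"],
        "Currency": result.get("currency") == "AUD",
    }
    for name, passed in checks.items():
        print(f"{'PASS' if passed else 'FAIL'} : {name}")
    return checks


def test_extraction_checks():
    """Fail on any failed check instead of relying on printed output."""
    extractor = FakeExtractor()
    result = extractor.extract_contract_data_with_llm("sample text", "sample.pdf")
    buffer = io.StringIO()
    with contextlib.redirect_stdout(buffer):  # keep the report out of the test log
        checks = report_checks(result)
    failed = [name for name, ok in checks.items() if not ok]
    assert not failed, f"failed checks: {failed}"
    assert "FAIL" not in buffer.getvalue()


if __name__ == "__main__":
    test_extraction_checks()
    print("all checks passed")
```

Asserting on the check table means a failed validation fails the test run, which the original function's return value does not guarantee.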
Similar Components
AI-powered semantic similarity - components with related functionality:
- function test_llm_client 83.6% similar
- function test_new_fields 78.2% similar
- class ContractDataExtractor 75.4% similar
- function test_end_date_extraction 73.0% similar
- function test_with_simulated_content 72.2% similar