
function test_llm_extraction

Maturity: 46

A test function that validates LLM-based contract data extraction by processing a sample contract and verifying the extracted fields against expected values.

File: /tf/active/vicechatdev/contract_validity_analyzer/test_extractor.py
Lines: 67 - 161
Complexity: moderate

Purpose

This function is an integration test for the LLM extraction path of ContractDataExtractor. It initializes the extractor from a Config instance, passes a sample contract (SAMPLE_CONTRACT) to extract_contract_data_with_llm, and checks critical fields in the returned dictionary: vendor name, ViceBio contracting party, effective date, end date, contract status, amount, and currency. It prints the extracted values and a PASS/FAIL line for each check, which makes it useful for debugging and manually verifying the contract extraction pipeline.

Source Code

def test_llm_extraction():
    """Test LLM extraction with sample contract"""
    print("="*80)
    print("Testing Contract Data Extractor - LLM Extraction")
    print("="*80)
    print()
    
    try:
        # Load config
        config = Config()
        
        # Create extractor (with limit to avoid FileCloud connection)
        extractor = ContractDataExtractor(config, limit=1)
        
        print("✅ Extractor initialized")
        print()
        
        # Test LLM extraction directly
        print("Extracting data from sample contract...")
        print("-" * 80)
        print(SAMPLE_CONTRACT[:500] + "...")
        print("-" * 80)
        print()
        
        # Extract contract data
        result = extractor.extract_contract_data_with_llm(
            SAMPLE_CONTRACT,
            "MSA_360Biolabs_Sample.pdf"
        )
        
        if result:
            print("✅ Extraction successful!")
            print()
            print("Extracted Data:")
            print("="*80)
            
            # Display key fields
            key_fields = [
                'vendor_name',
                'vicebio_contracting_party',
                'contract_type',
                'effective_date',
                'contract_duration',
                'end_date_auto_renewal',
                'contract_status',
                'estimated_amount_original_currency',
                'currency',
                'term_and_termination',
                'ip_exclusivity',
                'assignment_options'
            ]
            
            for field in key_fields:
                value = result.get(field, 'Not extracted')
                print(f"{field:40} : {value}")
            
            print("="*80)
            print()
            
            # Validate critical fields
            print("Validation:")
            print("-" * 80)
            
            checks = {
                'Vendor Name': result.get('vendor_name') == '360Biolabs Pty Ltd',
                'ViceBio Party': 'ViceBio Limited' in str(result.get('vicebio_contracting_party', '')),
                'Effective Date': '10/03/2023' in str(result.get('effective_date', '')) or '03/10/2023' in str(result.get('effective_date', '')),
                'End Date Calculated': result.get('end_date_auto_renewal') is not None,
                'Contract Status': result.get('contract_status') in ['active', 'expired'],
                'Amount': result.get('estimated_amount_original_currency') is not None,
                'Currency': result.get('currency') == 'AUD'
            }
            
            for check, passed in checks.items():
                status = "✅ PASS" if passed else "❌ FAIL"
                print(f"{status} : {check}")
            
            print("-" * 80)
            print()
            
            if all(checks.values()):
                print("🎉 All validation checks passed!")
            else:
                print("⚠️  Some validation checks failed")
            
            return True
        else:
            print("❌ Extraction failed - no data returned")
            return False
            
    except Exception as e:
        print(f"❌ Error: {e}")
        import traceback
        traceback.print_exc()
        return False

Return Value

Returns a boolean value: True if the extraction was successful and data was returned (regardless of validation results), False if extraction failed or an exception occurred. The function also prints comprehensive diagnostic information to stdout including extraction results, field values, and validation check outcomes.
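
Because the return value only signals that data came back, a caller who needs a stricter result could factor the checks into a helper and combine them separately. The following is a minimal sketch under that assumption: validate_result and strict_success are hypothetical names that do not exist in test_extractor.py, and the sample dictionary holds illustrative values, not real extractor output.

```python
from typing import Any, Dict, Optional


def validate_result(result: Optional[Dict[str, Any]]) -> Dict[str, bool]:
    """Hypothetical helper mirroring the checks made in test_llm_extraction."""
    if not result:
        return {}
    effective_date = str(result.get("effective_date", ""))
    return {
        "Vendor Name": result.get("vendor_name") == "360Biolabs Pty Ltd",
        "ViceBio Party": "ViceBio Limited"
        in str(result.get("vicebio_contracting_party", "")),
        "Effective Date": "10/03/2023" in effective_date
        or "03/10/2023" in effective_date,
        "End Date Calculated": result.get("end_date_auto_renewal") is not None,
        "Contract Status": result.get("contract_status") in ["active", "expired"],
        "Amount": result.get("estimated_amount_original_currency") is not None,
        "Currency": result.get("currency") == "AUD",
    }


def strict_success(result: Optional[Dict[str, Any]]) -> bool:
    """True only when data was returned AND every check passed."""
    checks = validate_result(result)
    return bool(checks) and all(checks.values())


# Illustrative values only; the end date and amount are made up for the sketch.
sample_result = {
    "vendor_name": "360Biolabs Pty Ltd",
    "vicebio_contracting_party": "ViceBio Limited",
    "effective_date": "10/03/2023",
    "end_date_auto_renewal": "10/03/2026",
    "contract_status": "active",
    "estimated_amount_original_currency": 100000,
    "currency": "AUD",
}
```

With this split, test_llm_extraction could keep its printed report while a wrapper returns strict_success(result), so a failed check would also fail the run.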

Dependencies

  • sys
  • pathlib
  • traceback

Required Imports

import sys
from pathlib import Path
from extractor import ContractDataExtractor
from config.config import Config
import traceback

Usage Example

import sys

# Ensure SAMPLE_CONTRACT is defined
SAMPLE_CONTRACT = """Master Services Agreement between ViceBio Limited and 360Biolabs Pty Ltd..."""

# Run the test
if __name__ == '__main__':
    success = test_llm_extraction()
    if success:
        print('Test completed successfully')
    else:
        print('Test failed')
        sys.exit(1)
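
As an alternative to the script-style entry point, the same kind of check can run under unittest with stdout captured. This is a sketch only: StubExtractor, report, ExtractionTest, and run_suite are hypothetical names introduced for illustration, and the stub returns a fixed dictionary instead of calling an LLM. A real test would construct ContractDataExtractor(config, limit=1) as the function above does.

```python
import io
import unittest
from contextlib import redirect_stdout


class StubExtractor:
    """Hypothetical stand-in for ContractDataExtractor; makes no LLM call."""

    def extract_contract_data_with_llm(self, text, filename):
        # Fixed, illustrative result instead of real model output.
        return {"vendor_name": "360Biolabs Pty Ltd", "currency": "AUD"}


def report(extractor, text, filename):
    """Simplified print-based check in the style of test_llm_extraction."""
    result = extractor.extract_contract_data_with_llm(text, filename)
    print(f"{'vendor_name':40} : {result.get('vendor_name')}")
    return result


class ExtractionTest(unittest.TestCase):
    def test_vendor_and_currency(self):
        buffer = io.StringIO()
        # Capture the console report so it does not clutter the test run.
        with redirect_stdout(buffer):
            result = report(StubExtractor(), "sample text", "sample.pdf")
        self.assertEqual(result["vendor_name"], "360Biolabs Pty Ltd")
        self.assertEqual(result["currency"], "AUD")
        self.assertIn("vendor_name", buffer.getvalue())


def run_suite():
    """Run the test case and return the unittest result object."""
    suite = unittest.defaultTestLoader.loadTestsFromTestCase(ExtractionTest)
    return unittest.TextTestRunner(stream=io.StringIO()).run(suite)
```

Here a failed assertion fails the run, unlike the script version, where a failed validation check is only printed.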

Best Practices

  • This function requires SAMPLE_CONTRACT to be defined as a module-level constant containing sample contract text
  • The function uses limit=1 when initializing ContractDataExtractor to avoid FileCloud connections during testing
  • Validation checks are hardcoded for specific expected values (e.g., '360Biolabs Pty Ltd', 'AUD') - modify these if testing with different sample contracts
  • The function prints extensive output to stdout, making it suitable for manual testing but not for automated test suites without output capture
  • Error handling includes full traceback printing for debugging purposes
  • The effective date check accepts either '10/03/2023' or '03/10/2023', so it passes whether the extractor returns the date day-first or month-first; it does not verify which ordering was used
  • Consider wrapping this in a proper unit test framework (pytest, unittest) for production use
  • The function returns True even if validation checks fail, as long as extraction completes - check validation output for actual success
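
The point about the two accepted date strings can be illustrated with the standard library. The sketch below, an illustration and not part of test_extractor.py, parses a slash-separated date both day-first and month-first; candidate_dates is a hypothetical helper name.

```python
from datetime import date, datetime


def candidate_dates(text):
    """Return every calendar date the string could mean, day-first or month-first."""
    candidates = set()
    for fmt in ("%d/%m/%Y", "%m/%d/%Y"):
        try:
            candidates.add(datetime.strptime(text, fmt).date())
        except ValueError:
            # The string is not valid under this ordering (e.g. month 25).
            pass
    return candidates


# Both strings accepted by the test map to the same two candidate dates.
ambiguous = candidate_dates("10/03/2023")
swapped = candidate_dates("03/10/2023")

# A day value above 12 leaves only one valid reading.
unambiguous = candidate_dates("25/12/2023")
```

Both accepted strings resolve to the same pair of candidates (10 March 2023 and 3 October 2023), so the string check alone cannot tell which date the extractor meant. Comparing parsed dates against a single expected value would make the check stricter.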

Similar Components

AI-powered semantic similarity - components with related functionality:

  • function test_llm_client 83.6% similar

    Tests the LLM client functionality by analyzing a sample contract text and verifying the extraction of key contract metadata such as third parties, dates, and status.

    From: /tf/active/vicechatdev/contract_validity_analyzer/test_implementation.py
  • function test_new_fields 78.2% similar

    A test function that validates an LLM client's ability to extract third-party email addresses and tax identification numbers from contract documents.

    From: /tf/active/vicechatdev/contract_validity_analyzer/test_new_fields.py
  • class ContractDataExtractor 75.4% similar

    Extract structured data from legal contracts using LLM analysis

    From: /tf/active/vicechatdev/contract_validity_analyzer/extractor.py
  • function test_end_date_extraction 73.0% similar

    Tests end date extraction functionality for contract documents that previously had missing end dates by downloading documents from FileCloud, extracting text, analyzing with LLM, and comparing results.

    From: /tf/active/vicechatdev/contract_validity_analyzer/test_missing_end_dates.py
  • function test_with_simulated_content 72.2% similar

    Tests LLM-based contract analysis prompts using simulated NDA content containing a term clause to verify extraction of contract dates and metadata.

    From: /tf/active/vicechatdev/contract_validity_analyzer/test_local_document.py