function test_edge_cases

Maturity: 46

Tests edge cases and variations in European tax ID formats by analyzing a sample contract document containing Swiss, Norwegian, Swedish, and Danish tax identifiers.

File: /tf/active/vicechatdev/contract_validity_analyzer/test_international_tax_ids.py
Lines: 165 - 226
Complexity: moderate

Purpose

This function checks whether the LLM can extract European tax IDs from contract text when they are written with dots, spaces, or hyphens. The sample document covers Switzerland (UID with a CHE- prefix), Norway (Org.nr), Sweden (Org.nr), and Denmark (CVR), and lists each registration number alongside its VAT form. The test asserts only on the four registration-number forms and matches each one as an exact substring of the extracted values, so an ID returned in a different format counts as missing.
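The separator variations described above (dots, spaces, hyphens) are what make exact substring matching brittle. The following is a minimal sketch of a separator-insensitive comparison. The helpers `normalize_tax_id` and `contains_tax_id` are hypothetical and are not part of the original test file, and the extracted values are illustrative rather than real LLM output.

```python
import re
from typing import Iterable


def normalize_tax_id(raw: str) -> str:
    """Hypothetical helper: keep only letters and digits, then uppercase."""
    return re.sub(r"[^0-9A-Za-z]", "", raw).upper()


def contains_tax_id(expected: str, extracted: Iterable[str]) -> bool:
    """True if the expected ID occurs in any extracted value, ignoring separators."""
    needle = normalize_tax_id(expected)
    return any(needle in normalize_tax_id(value) for value in extracted)


# Illustrative values in formats that differ from the sample document.
extracted = ["CHE123456789", "NO 123456789 MVA", "556123-4567", "DK12345678"]
for expected in ["CHE-123.456.789", "123 456 789", "556123-4567", "12 34 56 78"]:
    print(expected, contains_tax_id(expected, extracted))
```

One caveat with this approach: the sample IDs share digit sequences (for example, 123456789 appears in both the Swiss and Norwegian entries), so normalized substring matching can match an ID from the wrong country. A stricter check would also compare the country prefix or the label.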

Source Code

def test_edge_cases():
    """Test edge cases and variations in tax ID formats."""
    
    test_document = """
    AGREEMENT ADDENDUM
    
    Additional Parties:
    
    Swiss Company AG
    VAT: CHE-123.456.789 MWST
    UID: CHE-123.456.789
    
    Norwegian AS
    Org.nr: 123 456 789
    MVA: NO123456789MVA
    
    Swedish AB  
    Org.nr: 556123-4567
    VAT: SE556123456701
    
    Danish ApS
    CVR: 12 34 56 78
    VAT: DK12345678
    
    This addendum is effective immediately.
    """
    
    config = {
        'provider': 'openai',
        'model': 'gpt-4o',
        'temperature': 0.0,
        'max_tokens': 4000
    }
    
    llm_client = LLMClient(config)
    
    print("\n" + "="*50)
    print("Testing additional European formats...")
    
    try:
        result = llm_client.analyze_contract(test_document, "european_addendum.pdf")
        
        extracted_tax_ids = result.get('third_party_tax_ids', [])
        print(f"Extracted additional tax IDs: {extracted_tax_ids}")
        
        # Check for various European formats
        expected_patterns = ['CHE-123.456.789', '123 456 789', '556123-4567', '12 34 56 78']
        found_patterns = 0
        
        for pattern in expected_patterns:
            if any(pattern in tax_id for tax_id in extracted_tax_ids):
                found_patterns += 1
                print(f"  ✓ Found: {pattern}")
            else:
                print(f"  ✗ Missing: {pattern}")
        
        print(f"\nAdditional formats found: {found_patterns}/{len(expected_patterns)}")
        return found_patterns >= len(expected_patterns) * 0.5  # 50% success for edge cases
        
    except Exception as e:
        print(f"Error in edge case testing: {e}")
        return False

Return Value

Returns a boolean indicating test success. Returns True if at least 50% of the expected tax ID patterns (2 of 4) appear in the extracted results. Returns False if fewer than 2 patterns are found or if any exception is raised; the exception is printed, not propagated. The 50% threshold is deliberately lenient because these are edge cases.
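The threshold arithmetic can be checked in isolation. This is a small sketch that mirrors the comparison on the function's return line; `passes_threshold` is a hypothetical name introduced here for illustration.

```python
def passes_threshold(found: int, expected_total: int, ratio: float = 0.5) -> bool:
    """Same comparison as the original: found >= expected_total * ratio."""
    return found >= expected_total * ratio


# With 4 expected patterns the cutoff is 4 * 0.5 = 2.0.
for found in range(5):
    print(found, passes_threshold(found, 4))
# 0 False
# 1 False
# 2 True
# 3 True
# 4 True
```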

Dependencies

  • pathlib
  • json
  • os
  • sys
  • utils.llm_client (provides LLMClient, the only import the function body uses directly)

Required Imports

import os
import sys
import json
from pathlib import Path
from utils.llm_client import LLMClient

Usage Example

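# Ensure OPENAI_API_KEY is set in the environment
# Ensure utils.llm_client.LLMClient is importable

# Assumes the directory containing test_international_tax_ids.py is on sys.path
from test_international_tax_ids import test_edge_cases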

# Run the edge case test
test_passed = test_edge_cases()

if test_passed:
    print("Edge case test passed: LLM successfully extracted European tax IDs")
else:
    print("Edge case test failed: Check tax ID extraction logic")

# Expected output includes:
# - Extracted tax IDs list
# - Pattern matching results for each expected format
# - Success/failure indicators for each pattern
# - Overall pass/fail result

Best Practices

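  • This function is designed for testing purposes and should be run in a test suite or validation context; because it returns a boolean instead of asserting, a runner such as pytest will not report a failure from the return value alone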
  • Ensure OpenAI API credentials are properly configured before running
  • The function uses a 50% success threshold for edge cases, which may need adjustment based on requirements
  • Console output is printed directly; consider capturing output for automated testing
  • The test document is hardcoded; consider parameterizing for different test scenarios
  • Error handling catches all exceptions but only prints them; consider logging or re-raising for production use
  • The function assumes LLMClient.analyze_contract returns a dictionary with 'third_party_tax_ids' key
  • Temperature is set to 0.0 to reduce run-to-run variation, which is appropriate for testing; identical output across runs is still not guaranteed
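
Several of the points above (the hardcoded document, the printed output, the assumed response shape) can be addressed together by separating the check from the client. The following is a sketch under stated assumptions: `FakeLLMClient` and `check_tax_ids` are hypothetical names, and the fake mirrors only the response shape the test already assumes, a dictionary with a 'third_party_tax_ids' list. It makes no API call.

```python
from typing import Any, Dict, List, Tuple


class FakeLLMClient:
    """Hypothetical stand-in for LLMClient that returns a canned response."""

    def __init__(self, canned_ids: List[str]):
        self.canned_ids = canned_ids

    def analyze_contract(self, text: str, filename: str) -> Dict[str, Any]:
        # Assumed response shape: a dict with a 'third_party_tax_ids' list.
        return {"third_party_tax_ids": list(self.canned_ids)}


def check_tax_ids(
    client: Any,
    document: str,
    filename: str,
    expected_patterns: List[str],
    threshold: float = 0.5,
) -> Tuple[bool, List[str], List[str]]:
    """Return (passed, found, missing) instead of printing, so callers can assert."""
    result = client.analyze_contract(document, filename)
    extracted = result.get("third_party_tax_ids", [])
    found = [p for p in expected_patterns if any(p in tax_id for tax_id in extracted)]
    missing = [p for p in expected_patterns if p not in found]
    passed = len(found) >= len(expected_patterns) * threshold
    return passed, found, missing


# Usage with canned data: two of the four patterns match as exact substrings.
client = FakeLLMClient(["CHE-123.456.789", "NO123456789MVA", "556123-4567"])
expected = ["CHE-123.456.789", "123 456 789", "556123-4567", "12 34 56 78"]
passed, found, missing = check_tax_ids(client, "sample text", "european_addendum.pdf", expected)
print(passed, missing)  # True ['123 456 789', '12 34 56 78']
```

Returning the found and missing lists lets a test suite assert on them directly. A real LLMClient instance can be passed in place of the fake when an end-to-end run is wanted.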

Similar Components

AI-powered semantic similarity - components with related functionality:

  • function test_international_tax_ids 77.8% similar

    A test function that validates an LLM client's ability to extract tax identification numbers and business registration numbers from a multi-party international contract document across 8 different countries.

    From: /tf/active/vicechatdev/contract_validity_analyzer/test_international_tax_ids.py
  • function test_new_fields 68.8% similar

    A test function that validates an LLM client's ability to extract third-party email addresses and tax identification numbers from contract documents.

    From: /tf/active/vicechatdev/contract_validity_analyzer/test_new_fields.py
  • function test_llm_extraction 60.7% similar

    A test function that validates LLM-based contract data extraction by processing a sample contract and verifying the extracted fields against expected values.

    From: /tf/active/vicechatdev/contract_validity_analyzer/test_extractor.py
  • function test_single_document 59.9% similar

    Tests end date extraction from a specific PDF document by downloading it from FileCloud, extracting text, and using LLM-based analysis to identify contract expiry dates.

    From: /tf/active/vicechatdev/contract_validity_analyzer/test_single_document.py
  • function test_end_date_extraction 59.3% similar

    Tests end date extraction functionality for contract documents that previously had missing end dates by downloading documents from FileCloud, extracting text, analyzing with LLM, and comparing results.

    From: /tf/active/vicechatdev/contract_validity_analyzer/test_missing_end_dates.py