šŸ” Code Extractor

function test_excel_output

Maturity: 44

A test function that validates Excel output generation for contract analysis results, ensuring new fields (third_party_emails, third_party_tax_ids) are properly formatted and persisted.

File:
/tf/active/vicechatdev/contract_validity_analyzer/test_excel_output.py
Lines:
13 - 114
Complexity:
moderate

Purpose

This function serves as a unit test to verify that contract analysis results can be correctly transformed into a pandas DataFrame and exported to Excel format. It specifically validates the presence and proper formatting of newly added fields (email addresses and tax IDs), tests the complete write/read cycle to Excel, and ensures data integrity is maintained throughout the process. This is critical for ensuring the contract analysis system's output meets requirements for downstream reporting and data storage.

Source Code

def test_excel_output():
    """Test that the Excel output includes new fields."""
    
    # Sample analysis results with new fields
    sample_results = [
        {
            'filename': 'test_contract_1.pdf',
            'contract_type': 'NDA',
            'third_parties': ['TechCorp Solutions Inc.'],
            'third_party_emails': ['contracts@techcorp.com', 'jane.smith@techcorp.com'],
            'third_party_tax_ids': ['12-3456789'],
            'start_date': '2024-01-15',
            'end_date': '2027-01-15',
            'is_in_effect': True,
            'confidence': 0.95,
            'analysis_notes': 'Test contract for validation',
            'filecloud_path': '/test/path',
            'filecloud_url': 'https://test.com',
            'file_size_bytes': 1048576,
            'analysis_timestamp': '2025-07-02T12:00:00'
        }
    ]
    
    # Format the data as the analyzer would
    data_rows = []
    for result in sample_results:
        row = {
            'filename': result.get('filename', ''),
            'contract_type': result.get('contract_type', 'Unknown'),
            'third_parties': '; '.join(result.get('third_parties', [])),
            'third_party_emails': '; '.join(result.get('third_party_emails', [])),
            'third_party_tax_ids': '; '.join(result.get('third_party_tax_ids', [])),
            'start_date': result.get('start_date', ''),
            'end_date': result.get('end_date', ''),
            'is_in_effect': result.get('is_in_effect', False),
            'confidence': result.get('confidence', 0.0),
            'analysis_notes': result.get('analysis_notes', ''),
            'filecloud_path': result.get('filecloud_path', ''),
            'filecloud_url': result.get('filecloud_url', ''),
            'file_size_mb': round(result.get('file_size_bytes', 0) / (1024 * 1024), 2),
            'analysis_timestamp': result.get('analysis_timestamp', '')
        }
        data_rows.append(row)
    
    # Create DataFrame
    df = pd.DataFrame(data_rows)
    
    print("DataFrame columns:")
    for col in df.columns:
        print(f"  - {col}")
    
    print("\nSample data:")
    print(df.to_string(index=False))
    
    # Check that new fields are present
    required_new_fields = ['third_party_emails', 'third_party_tax_ids']
    missing_fields = [field for field in required_new_fields if field not in df.columns]
    
    if missing_fields:
        print(f"\nāœ— Missing required fields: {missing_fields}")
        return False
    else:
        print(f"\nāœ“ All required new fields present: {required_new_fields}")
    
    # Check that the data is properly formatted
    emails_data = df['third_party_emails'].iloc[0]
    tax_ids_data = df['third_party_tax_ids'].iloc[0]
    
    if 'contracts@techcorp.com' in emails_data and 'jane.smith@techcorp.com' in emails_data:
        print("āœ“ Email addresses properly formatted")
    else:
        print("āœ— Email addresses not properly formatted")
        return False
    
    if '12-3456789' in tax_ids_data:
        print("āœ“ Tax IDs properly formatted")
    else:
        print("āœ— Tax IDs not properly formatted")
        return False
    
    # Test writing to Excel
    test_file = 'test_output.xlsx'
    try:
        df.to_excel(test_file, index=False)
        print(f"āœ“ Successfully wrote test Excel file: {test_file}")
        
        # Read it back to verify
        df_read = pd.read_excel(test_file)
        if 'third_party_emails' in df_read.columns and 'third_party_tax_ids' in df_read.columns:
            print("āœ“ Excel file contains new fields after read/write")
        else:
            print("āœ— Excel file missing new fields after read/write")
            return False
            
        # Clean up
        os.remove(test_file)
        
    except Exception as e:
        print(f"āœ— Error writing Excel file: {e}")
        return False
    
    return True

Return Value

Returns a boolean value: True if all validation checks pass (required fields present, data properly formatted, Excel write/read successful), False if any validation check fails. The function also prints detailed status messages to stdout indicating which checks passed or failed.

Dependencies

  • pandas
  • openpyxl

Required Imports

import os
import pandas as pd

Usage Example

# Run the test function
result = test_excel_output()
if result:
    print('All Excel output tests passed')
else:
    print('Excel output tests failed')

# Expected output:
# DataFrame columns:
#   - filename
#   - contract_type
#   - third_parties
#   - third_party_emails
#   - third_party_tax_ids
#   - start_date
#   - end_date
#   - is_in_effect
#   - confidence
#   - analysis_notes
#   - filecloud_path
#   - filecloud_url
#   - file_size_mb
#   - analysis_timestamp
# ...
# āœ“ All required new fields present: ['third_party_emails', 'third_party_tax_ids']
# āœ“ Email addresses properly formatted
# āœ“ Tax IDs properly formatted
# āœ“ Successfully wrote test Excel file: test_output.xlsx
# āœ“ Excel file contains new fields after read/write

Best Practices

  • This function creates a temporary test_output.xlsx file and cleans it up after testing, but may leave the file if an exception occurs before cleanup
  • The function uses hardcoded sample data for testing; ensure this data structure matches the actual contract analysis output format
  • File size conversion from bytes to MB uses 1024-based calculation (binary) rather than 1000-based (decimal)
  • Multiple list fields (third_parties, emails, tax_ids) are joined with '; ' separator for Excel compatibility
  • The function prints verbose output to stdout, making it suitable for manual testing but may need modification for automated test suites
  • Requires openpyxl to be installed as pandas' Excel engine (install via: pip install openpyxl)
  • Consider wrapping in try-finally block to ensure test file cleanup even on failure

Similar Components

AI-powered semantic similarity - components with related functionality:

  • function test_new_fields 62.8% similar

    A test function that validates an LLM client's ability to extract third-party email addresses and tax identification numbers from contract documents.

    From: /tf/active/vicechatdev/contract_validity_analyzer/test_new_fields.py
  • function test_llm_extraction 59.2% similar

    A test function that validates LLM-based contract data extraction by processing a sample contract and verifying the extracted fields against expected values.

    From: /tf/active/vicechatdev/contract_validity_analyzer/test_extractor.py
  • function test_enhanced_workflow 58.7% similar

    A comprehensive test function that validates the EnhancedSQLWorkflow system by testing component initialization, request parsing, and data assessment capabilities.

    From: /tf/active/vicechatdev/full_smartstat/test_enhanced_workflow.py
  • function test_european_csv 57.9% similar

    A test function that validates the ability to read and parse European-formatted CSV files (semicolon delimiters, comma decimal separators) and convert them to proper numeric types.

    From: /tf/active/vicechatdev/vice_ai/test_regional_formats.py
  • function validate_sheet_format 57.1% similar

    Analyzes Excel sheet structure using multiple heuristics to classify it as tabular data, information sheet, or mixed format, returning quality metrics and extraction recommendations.

    From: /tf/active/vicechatdev/vice_ai/smartstat_service.py
← Back to Browse