function test_excel_output
A test function that validates Excel output generation for contract analysis results, ensuring new fields (third_party_emails, third_party_tax_ids) are properly formatted and persisted.
/tf/active/vicechatdev/contract_validity_analyzer/test_excel_output.py
13 - 114
moderate
Purpose
This function serves as a unit test to verify that contract analysis results can be correctly transformed into a pandas DataFrame and exported to Excel format. It specifically validates the presence and proper formatting of newly added fields (email addresses and tax IDs), tests the complete write/read cycle to Excel, and ensures data integrity is maintained throughout the process. This is critical for ensuring the contract analysis system's output meets requirements for downstream reporting and data storage.
Source Code
def test_excel_output():
"""Test that the Excel output includes new fields."""
# Sample analysis results with new fields
sample_results = [
{
'filename': 'test_contract_1.pdf',
'contract_type': 'NDA',
'third_parties': ['TechCorp Solutions Inc.'],
'third_party_emails': ['contracts@techcorp.com', 'jane.smith@techcorp.com'],
'third_party_tax_ids': ['12-3456789'],
'start_date': '2024-01-15',
'end_date': '2027-01-15',
'is_in_effect': True,
'confidence': 0.95,
'analysis_notes': 'Test contract for validation',
'filecloud_path': '/test/path',
'filecloud_url': 'https://test.com',
'file_size_bytes': 1048576,
'analysis_timestamp': '2025-07-02T12:00:00'
}
]
# Format the data as the analyzer would
data_rows = []
for result in sample_results:
row = {
'filename': result.get('filename', ''),
'contract_type': result.get('contract_type', 'Unknown'),
'third_parties': '; '.join(result.get('third_parties', [])),
'third_party_emails': '; '.join(result.get('third_party_emails', [])),
'third_party_tax_ids': '; '.join(result.get('third_party_tax_ids', [])),
'start_date': result.get('start_date', ''),
'end_date': result.get('end_date', ''),
'is_in_effect': result.get('is_in_effect', False),
'confidence': result.get('confidence', 0.0),
'analysis_notes': result.get('analysis_notes', ''),
'filecloud_path': result.get('filecloud_path', ''),
'filecloud_url': result.get('filecloud_url', ''),
'file_size_mb': round(result.get('file_size_bytes', 0) / (1024 * 1024), 2),
'analysis_timestamp': result.get('analysis_timestamp', '')
}
data_rows.append(row)
# Create DataFrame
df = pd.DataFrame(data_rows)
print("DataFrame columns:")
for col in df.columns:
print(f" - {col}")
print("\nSample data:")
print(df.to_string(index=False))
# Check that new fields are present
required_new_fields = ['third_party_emails', 'third_party_tax_ids']
missing_fields = [field for field in required_new_fields if field not in df.columns]
if missing_fields:
print(f"\nā Missing required fields: {missing_fields}")
return False
else:
print(f"\nā All required new fields present: {required_new_fields}")
# Check that the data is properly formatted
emails_data = df['third_party_emails'].iloc[0]
tax_ids_data = df['third_party_tax_ids'].iloc[0]
if 'contracts@techcorp.com' in emails_data and 'jane.smith@techcorp.com' in emails_data:
print("ā Email addresses properly formatted")
else:
print("ā Email addresses not properly formatted")
return False
if '12-3456789' in tax_ids_data:
print("ā Tax IDs properly formatted")
else:
print("ā Tax IDs not properly formatted")
return False
# Test writing to Excel
test_file = 'test_output.xlsx'
try:
df.to_excel(test_file, index=False)
print(f"ā Successfully wrote test Excel file: {test_file}")
# Read it back to verify
df_read = pd.read_excel(test_file)
if 'third_party_emails' in df_read.columns and 'third_party_tax_ids' in df_read.columns:
print("ā Excel file contains new fields after read/write")
else:
print("ā Excel file missing new fields after read/write")
return False
# Clean up
os.remove(test_file)
except Exception as e:
print(f"ā Error writing Excel file: {e}")
return False
return True
Return Value
Returns a boolean value: True if all validation checks pass (required fields present, data properly formatted, Excel write/read successful), False if any validation check fails. The function also prints detailed status messages to stdout indicating which checks passed or failed.
Dependencies
pandasopenpyxl
Required Imports
import os
import pandas as pd
Usage Example
# Run the test function
result = test_excel_output()
if result:
print('All Excel output tests passed')
else:
print('Excel output tests failed')
# Expected output:
# DataFrame columns:
# - filename
# - contract_type
# - third_parties
# - third_party_emails
# - third_party_tax_ids
# - start_date
# - end_date
# - is_in_effect
# - confidence
# - analysis_notes
# - filecloud_path
# - filecloud_url
# - file_size_mb
# - analysis_timestamp
# ...
# ā All required new fields present: ['third_party_emails', 'third_party_tax_ids']
# ā Email addresses properly formatted
# ā Tax IDs properly formatted
# ā Successfully wrote test Excel file: test_output.xlsx
# ā Excel file contains new fields after read/write
Best Practices
- This function creates a temporary test_output.xlsx file and cleans it up after testing, but may leave the file if an exception occurs before cleanup
- The function uses hardcoded sample data for testing; ensure this data structure matches the actual contract analysis output format
- File size conversion from bytes to MB uses 1024-based calculation (binary) rather than 1000-based (decimal)
- Multiple list fields (third_parties, emails, tax_ids) are joined with '; ' separator for Excel compatibility
- The function prints verbose output to stdout, making it suitable for manual testing but may need modification for automated test suites
- Requires openpyxl to be installed as pandas' Excel engine (install via: pip install openpyxl)
- Consider wrapping in try-finally block to ensure test file cleanup even on failure
Tags
Similar Components
AI-powered semantic similarity - components with related functionality:
-
function test_new_fields 62.8% similar
-
function test_llm_extraction 59.2% similar
-
function test_enhanced_workflow 58.7% similar
-
function test_european_csv 57.9% similar
-
function validate_sheet_format 57.1% similar