function print_summary_v1

Maturity: 46

Prints a comprehensive summary report of document comparison results, including status breakdowns, signature analysis, match quality metrics, and examples from each category.

File: /tf/active/vicechatdev/mailsearch/enhanced_document_comparison.py
Lines: 379 - 445
Complexity: moderate

Purpose

This function generates a formatted console report analyzing document comparison results between two sets of documents (output and wuxi2). It calculates and displays statistics about document presence, signature status, match quality, and provides representative examples from each category. The function is designed for document verification workflows where documents need to be tracked across systems and their signature status validated.

Source Code

def print_summary(results: List[Dict]):
    """Print summary statistics"""
    print("\n" + "="*80)
    print("ENHANCED COMPARISON SUMMARY")
    print("="*80 + "\n")
    
    total = len(results)
    present_signed = sum(1 for r in results if r['status'] == 'PRESENT & SIGNED')
    present_unsigned = sum(1 for r in results if r['status'] == 'PRESENT BUT UNSIGNED')
    absent = sum(1 for r in results if r['status'] == 'ABSENT')
    
    print(f"Total coded documents in output:  {total}")
    print(f"\nDocument Status:")
    print(f"  Present & Signed:     {present_signed:3d} ({present_signed/total*100:.1f}%)")
    print(f"  Present but Unsigned: {present_unsigned:3d} ({present_unsigned/total*100:.1f}%)")
    print(f"  Absent from wuxi2:    {absent:3d} ({absent/total*100:.1f}%)")
    
    # Output signature analysis
    output_signed = sum(1 for r in results if r['output_signed'])
    print(f"\nOutput Documents (from email):")
    print(f"  Signed (DocuSign):    {output_signed:3d} ({output_signed/total*100:.1f}%)")
    print(f"  Unsigned:             {total-output_signed:3d} ({(total-output_signed)/total*100:.1f}%)")
    
    # Match quality for present documents
    present_results = [r for r in results if r['status'].startswith('PRESENT')]
    if present_results:
        print(f"\nMatch Quality (for present documents):")
        identical = sum(1 for r in present_results if r['match_type'] == 'IDENTICAL')
        size_match = sum(1 for r in present_results if r['match_type'] == 'SIZE_MATCH')
        high_sim = sum(1 for r in present_results if r['match_type'] == 'HIGH_SIMILARITY')
        fuzzy = sum(1 for r in present_results if r['match_type'] == 'FUZZY_MATCH')
        
        print(f"  Identical (hash):     {identical:3d} ({identical/len(present_results)*100:.1f}%)")
        print(f"  Size match:           {size_match:3d} ({size_match/len(present_results)*100:.1f}%)")
        print(f"  High similarity:      {high_sim:3d} ({high_sim/len(present_results)*100:.1f}%)")
        print(f"  Fuzzy match:          {fuzzy:3d} ({fuzzy/len(present_results)*100:.1f}%)")
    
    print("\n" + "="*80)
    
    # Examples of each category
    print("\nExamples of PRESENT & SIGNED documents:")
    print("-" * 80)
    for r in [r for r in results if r['status'] == 'PRESENT & SIGNED'][:5]:
        print(f"  {r['document_code']:15s} {r['wuxi2_filename'][:60]}")
        print(f"    Vicebio signers: {r['wuxi2_vicebio_signers']}")
        print(f"    Wuxi signers: {r['wuxi2_wuxi_signers']}")
    
    present_unsigned_docs = [r for r in results if r['status'] == 'PRESENT BUT UNSIGNED']
    if present_unsigned_docs:
        print(f"\nExamples of PRESENT BUT UNSIGNED documents:")
        print("-" * 80)
        for r in present_unsigned_docs[:5]:
            print(f"  {r['document_code']:15s} {r['wuxi2_filename'][:60]}")
            print(f"    Wuxi signers only: {r['wuxi2_wuxi_signers']}")
        if len(present_unsigned_docs) > 5:
            print(f"  ... and {len(present_unsigned_docs) - 5} more")
    
    absent_docs = [r for r in results if r['status'] == 'ABSENT']
    if absent_docs:
        print(f"\nExamples of ABSENT documents:")
        print("-" * 80)
        for r in absent_docs[:5]:
            print(f"  {r['document_code']:15s} {r['output_filename'][:60]}")
        if len(absent_docs) > 5:
            print(f"  ... and {len(absent_docs) - 5} more")
    
    print("\n" + "="*80)

Parameters

Name     Type        Default  Kind
results  List[Dict]  -        positional_or_keyword

Parameter Details

results: A list of dictionaries, one per document comparison result, covering every document to be analyzed in the summary report. Each dictionary must contain the following keys:

  • 'status' (str): 'PRESENT & SIGNED', 'PRESENT BUT UNSIGNED', or 'ABSENT'
  • 'document_code' (str): document identifier
  • 'output_signed' (bool): whether the output document is signed
  • 'match_type' (str): 'IDENTICAL', 'SIZE_MATCH', 'HIGH_SIMILARITY', or 'FUZZY_MATCH'
  • 'wuxi2_filename' (str): filename in the wuxi2 system
  • 'output_filename' (str): filename in the output system
  • 'wuxi2_vicebio_signers' (list/str): Vicebio signers
  • 'wuxi2_wuxi_signers' (list/str): Wuxi signers

'match_type' and the 'wuxi2_*' keys are only read for records whose status starts with 'PRESENT', so ABSENT records may carry empty values for them. The two filename values are sliced to 60 characters and must therefore be strings.
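
Because the function indexes these keys directly, a missing key surfaces as a KeyError partway through the report. The following is a minimal sketch of a pre-flight check, assuming the key requirements described above. The helper name find_missing_keys and the *_REQUIRED constants are hypothetical and are not part of enhanced_document_comparison.py.

```python
from typing import Dict, List, Tuple

# Hypothetical key sets that mirror the lookups in print_summary.
# document_code is only read for records printed as examples; treating it
# as always required is a deliberately conservative choice.
ALWAYS_REQUIRED = {"status", "output_signed", "document_code"}
PRESENT_REQUIRED = {"match_type", "wuxi2_filename", "wuxi2_wuxi_signers"}
SIGNED_REQUIRED = {"wuxi2_vicebio_signers"}
ABSENT_REQUIRED = {"output_filename"}


def find_missing_keys(results: List[Dict]) -> List[Tuple[int, List[str]]]:
    """Return (index, sorted missing keys) for each incomplete record."""
    problems = []
    for index, record in enumerate(results):
        required = set(ALWAYS_REQUIRED)
        status = record.get("status", "")
        if status.startswith("PRESENT"):
            required |= PRESENT_REQUIRED
        if status == "PRESENT & SIGNED":
            required |= SIGNED_REQUIRED
        if status == "ABSENT":
            required |= ABSENT_REQUIRED
        # difference() iterates over the record's keys
        missing = sorted(required.difference(record))
        if missing:
            problems.append((index, missing))
    return problems
```

An empty return value means every record carries the keys its status needs. Otherwise each tuple names the offending record's position in the list and the keys it lacks.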

Return Value

This function returns None. It produces side effects by printing formatted text output directly to the console (stdout). The output includes statistical summaries, percentage calculations, and example documents from each category.
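
Since the report only goes to stdout, a caller who needs the text as a value has to capture it. The sketch below shows one way to do that with the standard library's contextlib.redirect_stdout. The helper name capture_output is an assumption for illustration and does not exist in the source module.

```python
import io
from contextlib import redirect_stdout
from typing import Any, Callable


def capture_output(report_fn: Callable[..., Any], *args: Any, **kwargs: Any) -> str:
    """Run a print-based function and return everything it wrote to stdout."""
    buffer = io.StringIO()
    # While the context manager is active, print() writes into the buffer.
    with redirect_stdout(buffer):
        report_fn(*args, **kwargs)
    return buffer.getvalue()
```

With print_summary in scope, capture_output(print_summary, results) would return the whole report as a single string, which could then be written to a file or passed to a logger.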

Required Imports

from typing import Dict, List

Usage Example

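from typing import List, Dict

# Assumes enhanced_document_comparison.py is on the import path
from enhanced_document_comparison import print_summary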

# Sample results data structure
results = [
    {
        'status': 'PRESENT & SIGNED',
        'document_code': 'DOC001',
        'output_signed': True,
        'match_type': 'IDENTICAL',
        'wuxi2_filename': 'contract_signed.pdf',
        'output_filename': 'contract_output.pdf',
        'wuxi2_vicebio_signers': ['John Doe', 'Jane Smith'],
        'wuxi2_wuxi_signers': ['Wang Li']
    },
    {
        'status': 'PRESENT BUT UNSIGNED',
        'document_code': 'DOC002',
        'output_signed': False,
        'match_type': 'SIZE_MATCH',
        'wuxi2_filename': 'agreement.pdf',
        'output_filename': 'agreement_output.pdf',
        'wuxi2_vicebio_signers': [],
        'wuxi2_wuxi_signers': ['Chen Wei']
    },
    {
        'status': 'ABSENT',
        'document_code': 'DOC003',
        'output_signed': True,
        'match_type': '',
        'wuxi2_filename': '',
        'output_filename': 'missing_doc.pdf',
        'wuxi2_vicebio_signers': [],
        'wuxi2_wuxi_signers': []
    }
]

# Print the summary report
print_summary(results)
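
For the three sample records, each status covers one document out of three, so each Document Status line should report a count of 1 and 33.3%. The sketch below reproduces that arithmetic and the count formatting used in the report (width-3 integer followed by a one-decimal percentage). The helpers format_share and status_shares are hypothetical names introduced here, not functions from the source file.

```python
from collections import Counter
from typing import Dict, List


def format_share(count: int, total: int) -> str:
    """Format a count as the report does: width-3 integer plus percentage."""
    return f"{count:3d} ({count / total * 100:.1f}%)"


def status_shares(results: List[Dict]) -> Dict[str, str]:
    """Tally records by status and format each tally against the total."""
    counts = Counter(r["status"] for r in results)
    return {status: format_share(n, len(results)) for status, n in counts.items()}


sample = [
    {"status": "PRESENT & SIGNED"},
    {"status": "PRESENT BUT UNSIGNED"},
    {"status": "ABSENT"},
]
for status, share in status_shares(sample).items():
    print(f"{status}: {share}")
```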

Best Practices

  • Ensure all dictionaries in the results list contain all required keys to avoid KeyError exceptions
  • The function expects specific string values for 'status' and 'match_type' fields - use exact matches ('PRESENT & SIGNED', 'PRESENT BUT UNSIGNED', 'ABSENT', 'IDENTICAL', 'SIZE_MATCH', 'HIGH_SIMILARITY', 'FUZZY_MATCH')
  • The function divides by len(results) when computing percentages, so an empty results list raises ZeroDivisionError; check that the list is not empty before calling
  • Filenames are truncated to 60 characters in the output - consider this when reviewing examples
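  • The function displays up to 5 examples per category. The PRESENT BUT UNSIGNED and ABSENT sections append '... and X more' when more exist and are skipped entirely when empty; the PRESENT & SIGNED section always prints its header and never shows a remainder count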
  • This function is designed for console output and may not be suitable for logging or file-based reporting without modification
  • Separator lines are 80 characters wide and counts are formatted with a width of 3, so output may wrap in narrow console windows and columns may misalign once a count reaches 1000
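
The empty-list caveat above can be handled by the caller without modifying the function. The following is a minimal sketch of such a guard. The wrapper name summarize_if_nonempty and its fallback message are assumptions for illustration only.

```python
from typing import Callable, Dict, List


def summarize_if_nonempty(
    results: List[Dict],
    report_fn: Callable[[List[Dict]], None],
) -> bool:
    """Call report_fn only when there is at least one record.

    Returns True if the report ran and False if it was skipped.
    """
    if not results:
        # Skipping the call avoids dividing by a total of zero.
        print("No comparison results to summarize.")
        return False
    report_fn(results)
    return True
```

A call such as summarize_if_nonempty(results, print_summary) would then print the full report for a populated list and a single notice for an empty one.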

Similar Components

AI-powered semantic similarity - components with related functionality:

  • function print_summary 87.3% similar

    Prints a formatted summary report of document comparison results, including presence status, match quality statistics, and examples of absent and modified documents.

    From: /tf/active/vicechatdev/mailsearch/compare_documents.py
  • function compare_documents_v1 69.3% similar

    Compares two sets of PDF documents by matching document codes, detecting signatures, calculating content similarity, and generating detailed comparison results with signature information.

    From: /tf/active/vicechatdev/mailsearch/enhanced_document_comparison.py
  • function main_v57 66.6% similar

    Main execution function that orchestrates a document comparison workflow between two directories (mailsearch/output and wuxi2 repository), scanning for coded documents, comparing them, and generating results.

    From: /tf/active/vicechatdev/mailsearch/compare_documents.py
  • function main_v102 65.3% similar

    Main entry point function that orchestrates a document comparison workflow between two folders (mailsearch/output and wuxi2 repository), detecting signatures and generating comparison results.

    From: /tf/active/vicechatdev/mailsearch/enhanced_document_comparison.py
  • function main_v94 61.7% similar

    Entry point function that compares real versus uploaded documents using DocumentComparator and displays the comparison results with formatted output.

    From: /tf/active/vicechatdev/e-ink-llm/cloudtest/compare_documents.py