🔍 Code Extractor

function detect_signatures_in_pdf

Maturity: 54

Detects Vicebio DocuSign signatures in PDF files to determine document approval status, distinguishing between Vicebio staff signatures (DocuSign) and Wuxi staff signatures (Chinese E-Sign).

File: /tf/active/vicechatdev/mailsearch/enhanced_document_comparison.py
Lines: 50 - 127
Complexity: simple

Purpose

This function analyzes PDF files to identify signature types and signers for business workflow validation. It specifically checks for DocuSign markers indicating Vicebio approval (from Jean Smal or Koen Huygens) versus Chinese E-Sign signatures from Wuxi staff. The function returns a structured result indicating whether the document has been approved by Vicebio, with confidence levels based on the signature markers found. This is used in document processing workflows to route or validate signed contracts and agreements.

Source Code

def detect_signatures_in_pdf(pdf_path):
    """
    Detect if a PDF contains Vicebio signatures (DocuSign).
    
    Business logic:
    - SIGNED = Has DocuSign signatures from Vicebio staff (Jean Smal, Koen Huygens)
    - UNSIGNED = Only has Chinese E-Sign signatures from Wuxi staff, or no signatures
    
    Returns dict with:
    - has_signature: bool (True if a DocuSign marker or a Vicebio signer name is found)
    - confidence: 'HIGH', 'MEDIUM', 'NONE', or 'ERROR'
    - details: dict with specific markers found
    """
    signature_info = {
        'has_docusign': False,
        'has_esign': False,
        'has_digital_cert': False,
        'vicebio_signers': [],
        'wuxi_signers': [],
        'has_signature_text': False
    }
    
    try:
        with open(pdf_path, 'rb') as file:
            # Read raw PDF content
            raw_content = file.read()
            
            # DocuSign presence = Vicebio signatures present (fully signed)
            if b'DocuSign' in raw_content:
                signature_info['has_docusign'] = True
            
            # E-Sign = Chinese signing system (Wuxi signatures only)
            if b'E-Sign' in raw_content:
                signature_info['has_esign'] = True
            
            # Check for Vicebio staff signatures
            if b'Jean Smal' in raw_content:
                signature_info['vicebio_signers'].append('Jean Smal')
            
            if b'Koen Huygens' in raw_content:
                signature_info['vicebio_signers'].append('Koen Huygens')
            
            # Check for Wuxi staff signatures
            if b'zhou_zhijie' in raw_content:
                signature_info['wuxi_signers'].append('zhou_zhijie')
            
            # Digital certificate presence (both systems use this)
            if b'/SubFilter/adbe.pkcs7.detached' in raw_content or b'/SubFilter/ETSI.CAdES.detached' in raw_content:
                signature_info['has_digital_cert'] = True
            
            # "Digitally signed by" text
            if b'Digitally signed by' in raw_content:
                signature_info['has_signature_text'] = True
            
            # Determine if document has VICEBIO signatures (fully signed)
            # Only DocuSign indicates Vicebio approval
            if signature_info['has_docusign']:
                confidence = 'HIGH'
                has_signature = True
            elif signature_info['vicebio_signers']:
                # Has Vicebio signer names but no DocuSign marker
                confidence = 'MEDIUM'
                has_signature = True
            else:
                # Only has Chinese E-Sign or no signatures
                confidence = 'NONE'
                has_signature = False
                
    except Exception as e:
        print(f"Error detecting signatures in {pdf_path}: {e}")
        confidence = 'ERROR'
        has_signature = False
    
    return {
        'has_signature': has_signature,
        'confidence': confidence,
        'details': signature_info
    }

Parameters

Name Type Default Kind
pdf_path - - positional_or_keyword

Parameter Details

pdf_path: String or Path object representing the file system path to the PDF file to be analyzed. Must be a valid path to an existing PDF file that the function has read permissions for. Can be absolute or relative path.
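
The path requirements above can be checked before the call. The following is a minimal sketch, not part of the original module: `is_readable_pdf` is a hypothetical helper, and it assumes the file begins with the standard `%PDF-` header.

```python
from pathlib import Path


def is_readable_pdf(pdf_path):
    """Hypothetical pre-check: True if pdf_path is an existing, readable
    file whose first bytes are the PDF header."""
    path = Path(pdf_path)
    if not path.is_file():
        return False
    try:
        with path.open('rb') as handle:
            # Assumption: a well-formed PDF starts with "%PDF-"
            return handle.read(5) == b'%PDF-'
    except OSError:
        # Covers permission errors and similar read failures
        return False
```

A caller could skip or log files for which this returns False instead of relying on the 'ERROR' confidence value alone.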

Return Value

Returns a dictionary with three keys: 'has_signature' (bool, True if a DocuSign marker or a Vicebio signer name is found), 'confidence' (string, one of 'HIGH', 'MEDIUM', 'NONE', or 'ERROR'), and 'details' (dict of the individual markers: the 'has_docusign', 'has_esign', 'has_digital_cert', and 'has_signature_text' booleans plus the 'vicebio_signers' and 'wuxi_signers' lists). HIGH means a DocuSign marker is present; MEDIUM means Vicebio signer names were found without a DocuSign marker; NONE means only E-Sign markers or no markers at all; ERROR means an exception occurred while reading the file, in which case 'has_signature' is False.
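
One way to act on the returned dictionary is sketched below. This is an illustration only: `route_document` and its status strings are hypothetical, and sending MEDIUM results to manual review is an assumed policy, not something the source module prescribes.

```python
def route_document(result):
    """Hypothetical helper: map a detect_signatures_in_pdf() result
    to a workflow status string."""
    confidence = result.get('confidence')
    if confidence == 'ERROR':
        # The file could not be read; the result says nothing about signatures
        return 'RETRY'
    if result.get('has_signature') and confidence == 'HIGH':
        return 'SIGNED'
    if result.get('has_signature') and confidence == 'MEDIUM':
        # Signer names without a DocuSign marker: assumed to need a human check
        return 'REVIEW'
    return 'UNSIGNED'


# Hand-built result in the documented shape, used here instead of a real PDF
example = {
    'has_signature': True,
    'confidence': 'MEDIUM',
    'details': {'has_docusign': False, 'vicebio_signers': ['Jean Smal']},
}
print(route_document(example))
```

Keeping the routing separate from detection means the policy for MEDIUM results can change without touching the detector.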

Usage Example

# Analyze a PDF for Vicebio signatures
result = detect_signatures_in_pdf('/path/to/contract.pdf')

if result['has_signature']:
    print(f"Document is signed by Vicebio (confidence: {result['confidence']})")
    print(f"Vicebio signers: {result['details']['vicebio_signers']}")
    if result['details']['has_docusign']:
        print("DocuSign signature detected")
else:
    print(f"No Vicebio signature (confidence: {result['confidence']})")
    if result['details']['has_esign']:
        print("Only Chinese E-Sign signatures present")
    if result['details']['wuxi_signers']:
        print(f"Wuxi signers: {result['details']['wuxi_signers']}")

Best Practices

  • Always check the 'confidence' level in addition to 'has_signature' to understand the reliability of the detection
  • Handle the 'ERROR' confidence case appropriately in production code
  • The function reads the entire PDF into memory, so be cautious with very large files
  • This function performs byte-level pattern matching only: it detects the presence of signature markers and does not validate cryptographic signatures or their authenticity
  • The function is specific to Vicebio/Wuxi business logic and hardcodes specific signer names
  • Consider wrapping calls in try-except blocks even though the function has internal error handling
  • Binary pattern matching may produce false positives if the searched strings appear in PDF content rather than signature metadata
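
For the memory concern noted above, the same byte-marker search could be done over a memory-mapped file rather than a full read. This is a sketch under stated assumptions, not the module's implementation: `find_markers` is a hypothetical helper, and like the original it only reports marker presence.

```python
import mmap


def find_markers(pdf_path, markers):
    """Hypothetical helper: return the byte markers that occur in the file,
    searching a read-only memory map instead of loading the file into memory."""
    found = []
    with open(pdf_path, 'rb') as handle:
        try:
            mapped = mmap.mmap(handle.fileno(), 0, access=mmap.ACCESS_READ)
        except ValueError:
            # An empty file cannot be memory-mapped; nothing can match
            return found
        with mapped:
            for marker in markers:
                if mapped.find(marker) != -1:
                    found.append(marker)
    return found
```

A call such as `find_markers(path, [b'DocuSign', b'E-Sign', b'Jean Smal'])` would return the markers present, in the order they were requested. The false-positive caveat still applies, because a marker is matched wherever it appears in the raw bytes.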

Similar Components

AI-powered semantic similarity - components with related functionality:

  • function compare_documents_v1 63.5% similar

    Compares two sets of PDF documents by matching document codes, detecting signatures, calculating content similarity, and generating detailed comparison results with signature information.

    From: /tf/active/vicechatdev/mailsearch/enhanced_document_comparison.py
  • function main_v102 51.9% similar

    Main entry point function that orchestrates a document comparison workflow between two folders (mailsearch/output and wuxi2 repository), detecting signatures and generating comparison results.

    From: /tf/active/vicechatdev/mailsearch/enhanced_document_comparison.py
  • function process_document_signing 48.9% similar

    Initiates a document signing workflow through FileCloud's Signority integration, creating a signing request for specified signers and sending notification emails.

    From: /tf/active/vicechatdev/CDocs single class/controllers/filecloud_controller.py
  • function scan_wuxi2_folder_v1 48.8% similar

    Recursively scans a directory for PDF files, extracts document codes from filenames, and returns a dictionary mapping each unique document code to a list of file metadata dictionaries.

    From: /tf/active/vicechatdev/mailsearch/enhanced_document_comparison.py
  • function process_document_signing_v1 48.6% similar

    Initiates a document signing workflow through FileCloud's Signority integration, creating a signing request for specified signers and sending notification emails.

    From: /tf/active/vicechatdev/CDocs/controllers/filecloud_controller.py