function test_document_processing

Maturity: 42

A test function that validates document processing functionality by creating a test PDF file, processing it through a DocumentProcessor, and verifying the extraction results or error handling.

File: /tf/active/vicechatdev/contract_validity_analyzer/test_implementation.py
Lines: 77 - 122
Complexity: moderate

Purpose

This function serves as an integration test for document processing capabilities. It creates a temporary PDF file with sample contract content using reportlab, processes it through the DocumentProcessor class, and validates that either text extraction succeeds or errors are handled gracefully. It includes fallback logic for environments where reportlab is unavailable, testing error handling in those cases.

Source Code

def test_document_processing(config):
    """Test document processing."""
    print("\nTesting document processing...")
    
    try:
        processor = DocumentProcessor(config.get('document_processing', {}))
        
        # Create a simple test PDF using basic content
        import io
        try:
            from reportlab.lib.pagesizes import letter
            from reportlab.pdfgen import canvas
            
            # Create a simple PDF with contract content
            with tempfile.NamedTemporaryFile(suffix='.pdf', delete=False) as f:
                c = canvas.Canvas(f.name, pagesize=letter)
                c.drawString(100, 750, "CONTRACT AGREEMENT")
                c.drawString(100, 700, "This is a test contract between Company A and Company B.")
                c.drawString(100, 680, "The contract is valid from January 1, 2024 to December 31, 2024.")
                c.drawString(100, 660, "This agreement shall remain in effect throughout the specified period.")
                c.save()
                test_file = f.name
        except ImportError:
            # Fallback: create a simple text file and test extraction
            with tempfile.NamedTemporaryFile(mode='w', suffix='.pdf', delete=False) as f:
                # This will fail extraction but test the error handling
                f.write("This is not a real PDF but tests the error handling")
                test_file = f.name
        
        try:
            result = processor.process_document(test_file, os.path.basename(test_file))
            if result and result.get('success'):
                text = result.get('text', '')
                print(f"✓ Document processing works")
                print(f"  Extracted {len(text)} characters")
                return True
            else:
                print(f"✓ Document processing handles errors correctly")
                print(f"  Error: {result.get('error', 'Unknown error') if result else 'No result'}")
                return True  # Error handling is also success
        finally:
            os.unlink(test_file)
            
    except Exception as e:
        print(f"✗ Document processing failed: {e}")
        return False

Parameters

Name    Type  Default  Kind
config  -     -        positional_or_keyword

Parameter Details

config: A dictionary, or any object exposing a dict-style get() method, that holds document processing settings. The function calls config.get('document_processing', {}) and passes the result to the DocumentProcessor constructor, so a missing 'document_processing' key yields an empty settings dictionary rather than an error. The exact structure of the settings depends on the DocumentProcessor class requirements.
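The lookup described above can be illustrated in isolation. This is a minimal sketch that assumes config is a plain dict; the helper name document_processing_settings is hypothetical and does not appear in the source file.

```python
def document_processing_settings(config):
    """Mirror the lookup the test performs before building DocumentProcessor.

    Hypothetical helper for illustration only; the original function
    inlines this expression.
    """
    return config.get('document_processing', {})


# A config that carries the key returns its settings unchanged.
with_key = {'document_processing': {'max_file_size': 10485760}}
print(document_processing_settings(with_key))

# A config without the key falls back to an empty dict instead of raising.
print(document_processing_settings({}))
```

An object that lacks a get() method would raise AttributeError at this point, which the test's outer except block would report as a failure.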

Return Value

Returns a boolean: True if text extraction succeeds or if the processor reports a failure through its result (both outcomes count as a pass), False if any exception is raised during the test, including while constructing the DocumentProcessor. The function prints status messages to stdout indicating success (✓) or failure (✗), along with the extracted text length or the error message.
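The branching behind the return value can be sketched as a small standalone helper. The name classify_result is hypothetical; the result keys 'success', 'text', and 'error' are the ones the source code reads, and nothing else about the DocumentProcessor result format is assumed.

```python
def classify_result(result):
    """Classify a process_document() result the way the test does.

    Hypothetical helper for illustration; the test itself maps both
    outcomes to True and only prints which branch was taken.
    """
    if result and result.get('success'):
        # Successful extraction: report how many characters came back.
        return 'extracted', len(result.get('text', ''))
    # Handled failure: an explicit error, a missing error, or no result at all.
    error = result.get('error', 'Unknown error') if result else 'No result'
    return 'handled_error', error


print(classify_result({'success': True, 'text': 'CONTRACT AGREEMENT'}))
print(classify_result({'success': False, 'error': 'not a PDF'}))
print(classify_result(None))
```

Returning the category instead of a bare True is one way to let a caller tell extraction apart from handled failure.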

Dependencies

  • os
  • sys
  • tempfile
  • logging
  • pathlib
  • io
  • reportlab

Required Imports

import os
import tempfile
from utils.document_processor import DocumentProcessor

Conditional/Optional Imports

These imports occur inside the function body:

import io

Condition: Imported unconditionally at the start of the function; the code shown above never uses it

Required

from reportlab.lib.pagesizes import letter

Condition: Only if reportlab is installed; the function falls back to a plain-text file with a .pdf suffix if unavailable

Optional

from reportlab.pdfgen import canvas

Condition: Only if reportlab is installed; the function falls back to a plain-text file with a .pdf suffix if unavailable

Optional
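Whether the reportlab branch or the fallback branch will run can be checked ahead of time. The sketch below uses the standard library's importlib.util.find_spec; the helper names has_module and expected_test_mode are hypothetical, and the source function itself relies on try/except ImportError instead.

```python
import importlib.util


def has_module(name):
    """Return True if a top-level module can be located without importing it."""
    return importlib.util.find_spec(name) is not None


def expected_test_mode():
    """Predict which branch of the test would run in this environment."""
    # 'real_pdf': reportlab writes a genuine PDF, so extraction is exercised.
    # 'invalid_pdf': plain text in a .pdf file, so error handling is exercised.
    return 'real_pdf' if has_module('reportlab') else 'invalid_pdf'


print(expected_test_mode())
```

Logging the predicted mode alongside the test output makes it clearer which behavior a passing run actually covered.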

Usage Example

# Example usage in a test suite
# Assumes the module name matches the file listed above
from test_implementation import test_document_processing

# Settings passed through to DocumentProcessor
config_dict = {
    'document_processing': {
        'max_file_size': 10485760,  # 10 MiB
        'supported_formats': ['.pdf', '.docx', '.txt']
    }
}

# Run the test
result = test_document_processing(config_dict)
if result:
    print("Document processing test passed")
else:
    print("Document processing test failed")

Best Practices

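  • The function creates its temporary file with delete=False and removes it with os.unlink in a try-finally block, so the file is cleaned up whether processing succeeds or raises
  • When reportlab is not available, the function degrades gracefully: it writes plain text to a file with a .pdf suffix and exercises the processor's error handling instead of text extraction
  • The 'document_processing' key is optional because the function reads it with config.get('document_processing', {}); the config argument itself must support a dict-style get() method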
  • The function prints output directly to stdout, so it's designed for interactive testing rather than automated test suites
  • Consider using pytest fixtures or unittest setUp/tearDown methods for better integration with test frameworks
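  • The function returns True both for successful extraction and for errors reported by the processor, and False only for unexpected exceptions, so the return value alone does not show whether text was actually extracted; check the printed output for that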
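The suggestion to use unittest setUp/tearDown could look roughly like the sketch below. StubProcessor is a hypothetical stand-in, not the project's DocumentProcessor; it only imitates the result keys ('success', 'text', 'error') that the source code reads, and it treats any file starting with %PDF as valid purely for illustration.

```python
import os
import tempfile
import unittest


class StubProcessor:
    """Hypothetical stand-in for DocumentProcessor, for illustration only."""

    def process_document(self, path, filename):
        with open(path, 'rb') as fh:
            data = fh.read()
        if not data.startswith(b'%PDF'):
            return {'success': False, 'error': f'{filename} is not a PDF'}
        return {'success': True, 'text': data.decode('latin-1')}


class DocumentProcessingTest(unittest.TestCase):
    def setUp(self):
        # TemporaryDirectory deletes everything inside it on cleanup,
        # replacing the manual os.unlink in a finally block.
        self._tmp = tempfile.TemporaryDirectory()
        self.addCleanup(self._tmp.cleanup)
        self.processor = StubProcessor()

    def _write(self, name, data):
        path = os.path.join(self._tmp.name, name)
        with open(path, 'wb') as fh:
            fh.write(data)
        return path

    def test_invalid_pdf_reports_error(self):
        path = self._write('fake.pdf', b'This is not a real PDF')
        result = self.processor.process_document(path, 'fake.pdf')
        self.assertFalse(result['success'])
        self.assertIn('error', result)

    def test_pdf_header_is_accepted(self):
        path = self._write('ok.pdf', b'%PDF-1.4 stub')
        result = self.processor.process_document(path, 'ok.pdf')
        self.assertTrue(result['success'])


def run_suite():
    """Run the test case and return the unittest result object."""
    loader = unittest.defaultTestLoader
    suite = loader.loadTestsFromTestCase(DocumentProcessingTest)
    return unittest.TextTestRunner(verbosity=0).run(suite)
```

Calling run_suite() executes both cases. Splitting extraction and error handling into separate assertions means a broken extractor can no longer pass by landing in the error branch.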

Similar Components

AI-powered semantic similarity - components with related functionality:

  • function test_document_processor 85.4% similar

    A test function that validates the DocumentProcessor component's ability to extract text from PDF files with improved error handling and llmsherpa integration.

    From: /tf/active/vicechatdev/contract_validity_analyzer/test_improved_processor.py
  • function test_enhanced_pdf_processing 78.3% similar

    A comprehensive test function that validates PDF processing capabilities, including text extraction, cleaning, chunking, and table detection across multiple PDF processing libraries.

    From: /tf/active/vicechatdev/vice_ai/test_enhanced_pdf.py
  • function test_extraction_debugging 76.8% similar

    A test function that validates the extraction debugging functionality of a DocumentProcessor by creating test files, simulating document extraction, and verifying debug log creation.

    From: /tf/active/vicechatdev/vice_ai/test_extraction_debug.py
  • function test_local_document 73.3% similar

    Integration test function that validates end date extraction from a local PDF document using document processing and LLM-based analysis.

    From: /tf/active/vicechatdev/contract_validity_analyzer/test_local_document.py
  • class TestDocumentProcessor 72.7% similar

    A test subclass of DocumentProcessor that simulates llmsherpa PDF processing failures and triggers OCR fallback mechanisms for testing purposes.

    From: /tf/active/vicechatdev/contract_validity_analyzer/test_ocr_fallback.py