🔍 Code Extractor

function test_extraction_debugging

Maturity: 46

A script-style test function that exercises the extraction debugging functionality of DocumentProcessor: it creates a temporary test file, passes simulated extraction results to the debug logger, and reports whether a debug log was written.

File: /tf/active/vicechatdev/vice_ai/test_extraction_debug.py
Lines: 44 - 90
Complexity: moderate

Purpose

This function is a smoke test for the DocumentProcessor's debug logging. It checks whether the extracted directory exists, lists up to three existing JSON debug files, passes simulated extraction results to _save_extraction_debug_log, and reports the name and size of the most recently modified debug file. It prints its findings rather than asserting them, and it catches and prints any exception, so it is most useful for manual troubleshooting of document processing pipelines during development.

Source Code

def test_extraction_debugging():
    """Test the extraction debugging functionality"""
    print("🧪 Testing Extraction Debugging Functionality")
    print("=" * 50)
    
    # Initialize document processor
    processor = DocumentProcessor()
    
    # Check that extracted directory exists
    extracted_dir = Path(__file__).parent / "extracted"
    print(f"📁 Extracted directory: {extracted_dir}")
    print(f"📁 Directory exists: {extracted_dir.exists()}")
    
    # List any existing debug files
    if extracted_dir.exists():
        existing_files = list(extracted_dir.glob("*.json"))
        print(f"📄 Existing debug files: {len(existing_files)}")
        for file in existing_files[-3:]:  # Show up to 3 files (glob order is not sorted)
            print(f"   - {file.name}")
    
    # Test with a simple text file (this won't use advanced processing)
    test_file = create_test_file()
    print(f"\n🔍 Testing with file: {test_file}")
    
    try:
        # Plain text does not go through PDF/Word processing, so build a
        # simulated extraction result and call the debug logger directly
        result = {"text_chunks": ["Test chunk 1", "Test chunk 2"], "tables": []}
        processor._save_extraction_debug_log(test_file, result, "test_method")
        
        print("✅ Debug log creation test completed")
        
        # Check if new debug file was created
        new_files = list(extracted_dir.glob("*.json"))
        if new_files:
            latest_file = max(new_files, key=lambda f: f.stat().st_mtime)
            print(f"📄 Latest debug file: {latest_file.name}")
            print(f"📏 File size: {latest_file.stat().st_size} bytes")
        
    except Exception as e:
        print(f"❌ Error during test: {e}")
    finally:
        # Clean up test file
        try:
            Path(test_file).unlink()
        except OSError:
            pass
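
The check at the end of the try block picks the most recently modified JSON file, which may predate the test run. A minimal sketch of a stricter check, assuming debug logs are plain JSON files in a single directory: snapshot the directory before the call and inspect only the files that appear afterwards. The helper names snapshot_json_files and new_valid_logs are hypothetical and are not part of the original module.

```python
import json
import tempfile
from pathlib import Path


def snapshot_json_files(directory):
    """Return the set of *.json paths currently in `directory` (empty if it is missing)."""
    directory = Path(directory)
    if not directory.exists():
        return set()
    return set(directory.glob("*.json"))


def new_valid_logs(directory, before):
    """Return, sorted by path, the files created since `before` that parse as JSON."""
    created = snapshot_json_files(directory) - before
    valid = []
    for path in sorted(created):
        try:
            json.loads(path.read_text(encoding="utf-8"))
        except ValueError:
            # Covers json.JSONDecodeError and UnicodeDecodeError:
            # the file is not a readable JSON log, so skip it.
            continue
        valid.append(path)
    return valid
```

In the test, the snapshot would be taken before calling _save_extraction_debug_log and new_valid_logs called afterwards; an empty result means no readable log was written during the run.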

Return Value

This function does not return any value (implicitly returns None). It performs side effects including printing test results to console, creating temporary test files, generating debug logs, and cleaning up test artifacts.

Dependencies

  • tempfile
  • logging
  • pathlib

Required Imports

import tempfile
import logging
from pathlib import Path
from document_processor import DocumentProcessor

Usage Example

# Ensure DocumentProcessor and create_test_file are available
from document_processor import DocumentProcessor
from pathlib import Path
import tempfile
import logging

# Define create_test_file if not already available
def create_test_file():
    with tempfile.NamedTemporaryFile(mode='w', delete=False, suffix='.txt') as f:
        f.write('Test content')
        return f.name

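# Run the test (assumes test_extraction_debugging is defined in this module
# or has been imported from test_extraction_debug.py)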
test_extraction_debugging()

# Expected output (illustrative; paths, file names, counts, and sizes will vary):
# 🧪 Testing Extraction Debugging Functionality
# ==================================================
# 📁 Extracted directory: /path/to/extracted
# 📁 Directory exists: True
# 📄 Existing debug files: 1
#    - debug_log_20231201_120000.json
#
# 🔍 Testing with file: /tmp/tmpxyz123.txt
# ✅ Debug log creation test completed
# 📄 Latest debug file: debug_log_20231201_120100.json
# 📏 File size: 256 bytes
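
The create_test_file helper above leaves deletion to the caller, which is why the test needs its own finally block. One possible alternative, sketched here under the assumption that the test file only needs to exist for the duration of the test, is a context manager that creates and removes the file itself. The name temporary_test_file is hypothetical.

```python
import os
import tempfile
from contextlib import contextmanager
from pathlib import Path


@contextmanager
def temporary_test_file(content="Test content", suffix=".txt"):
    """Yield the Path of a temporary text file and delete it on exit."""
    fd, name = tempfile.mkstemp(suffix=suffix, text=True)
    path = Path(name)
    try:
        # Wrap the descriptor returned by mkstemp so it is closed after writing.
        with os.fdopen(fd, "w", encoding="utf-8") as handle:
            handle.write(content)
        yield path
    finally:
        # missing_ok requires Python 3.8+; it avoids an error if the
        # caller already removed the file.
        path.unlink(missing_ok=True)
```

A test body would then read `with temporary_test_file() as test_file: ...`, and the file is removed even if the body raises.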

Best Practices

  • This function should be run in a test environment, not in production code
  • Ensure the DocumentProcessor class has the _save_extraction_debug_log method implemented
  • The function includes cleanup logic in a finally block to remove test files, preventing test artifacts from accumulating
  • The function uses emoji indicators for visual clarity in test output, making it easy to identify test stages and results
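  • Consider wrapping this in a proper test framework (pytest, unittest) and replacing the printed checks with assertions; as written, the function catches every exception and never fails, so a CI/CD pipeline cannot detect a regression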
  • The function needs write access wherever create_test_file places the test file and wherever DocumentProcessor writes its debug logs; check both before running
  • Debug files are created in an 'extracted' directory relative to the test file location
  • The test uses simulated data rather than actual document processing, making it suitable for testing the logging mechanism independently
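
As a rough illustration of the framework suggestion above, the following unittest sketch asserts on the written log instead of printing. Because the real DocumentProcessor is not shown in this document, the example uses StubProcessor, a hypothetical stand-in whose constructor argument, file naming, and JSON layout are assumptions; only the call shape `_save_extraction_debug_log(file_path, result, method)` is taken from the source above.

```python
import json
import tempfile
import unittest
from pathlib import Path


class StubProcessor:
    """Hypothetical stand-in for DocumentProcessor; NOT the real implementation."""

    def __init__(self, extracted_dir):
        self.extracted_dir = Path(extracted_dir)

    def _save_extraction_debug_log(self, file_path, result, method):
        # Assumed behaviour: write one JSON file per call into extracted_dir.
        self.extracted_dir.mkdir(parents=True, exist_ok=True)
        log_path = self.extracted_dir / f"{Path(file_path).stem}_{method}.json"
        payload = {"source": str(file_path), "method": method, "result": result}
        log_path.write_text(json.dumps(payload, indent=2), encoding="utf-8")
        return log_path


class ExtractionDebugLogTest(unittest.TestCase):
    def test_debug_log_is_written(self):
        result = {"text_chunks": ["Test chunk 1", "Test chunk 2"], "tables": []}
        with tempfile.TemporaryDirectory() as tmp:
            extracted_dir = Path(tmp) / "extracted"
            processor = StubProcessor(extracted_dir)
            processor._save_extraction_debug_log("sample.txt", result, "test_method")

            # Exactly one log should exist, and it should round-trip the result.
            logs = list(extracted_dir.glob("*.json"))
            self.assertEqual(len(logs), 1)
            saved = json.loads(logs[0].read_text(encoding="utf-8"))
            self.assertEqual(saved["result"], result)
```

Saved in a test module, this can be run with `python -m unittest`. Swapping StubProcessor for the real class would require knowing where it writes its logs, which this document does not specify.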

Similar Components

AI-powered semantic similarity - components with related functionality:

  • function test_document_processing 76.8% similar

    A test function that validates document processing functionality by creating a test PDF file, processing it through a DocumentProcessor, and verifying the extraction results or error handling.

    From: /tf/active/vicechatdev/contract_validity_analyzer/test_implementation.py
  • function test_document_extractor 76.1% similar

    A test function that validates the DocumentExtractor class by testing file type support detection, text extraction from various document formats, and error handling.

    From: /tf/active/vicechatdev/leexi/test_document_extractor.py
  • function test_document_processor 75.3% similar

    A test function that validates the DocumentProcessor component's ability to extract text from PDF files with improved error handling and llmsherpa integration.

    From: /tf/active/vicechatdev/contract_validity_analyzer/test_improved_processor.py
  • function test_multiple_files 71.8% similar

    A test function that validates the extraction of text content from multiple document files using a DocumentExtractor instance, displaying extraction results and simulating combined content processing.

    From: /tf/active/vicechatdev/leexi/test_multiple_files.py
  • function test_enhanced_pdf_processing 66.8% similar

    A comprehensive test function that validates PDF processing capabilities, including text extraction, cleaning, chunking, and table detection across multiple PDF processing libraries.

    From: /tf/active/vicechatdev/vice_ai/test_enhanced_pdf.py