test_extraction_debugging - Code Extractor

function test_extraction_debugging

Maturity: 46

A test function that validates the extraction debugging functionality of a DocumentProcessor by creating test files, simulating document extraction, and verifying debug log creation.

File:
/tf/active/vicechatdev/vice_ai/test_extraction_debug.py

Lines:
44 - 90

Complexity:
moderate

Purpose

This function is a smoke test for the DocumentProcessor's debugging capabilities. It checks whether the 'extracted' directory next to the test file exists, lists up to three existing JSON debug files, creates a temporary test file, and calls _save_extraction_debug_log with simulated extraction results. It then reports the name and size of the most recently modified debug file. The function contains no assertions and catches any exception raised inside its try block, so results are reported through console output only. It is useful when developing and troubleshooting document processing pipelines.
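
The implementation of _save_extraction_debug_log is not shown on this page. As a rough illustration of the kind of JSON debug log the test exercises, here is a minimal sketch. The function name save_debug_log, the out_dir parameter, the payload keys, and the file naming scheme are all assumptions made for this example, not the real DocumentProcessor API.

```python
import json
import tempfile
from datetime import datetime
from pathlib import Path


def save_debug_log(source_path, result, method, out_dir):
    """Hypothetical stand-in for DocumentProcessor._save_extraction_debug_log.

    Writes one timestamped JSON file per call and returns its path.
    """
    out_dir = Path(out_dir)
    out_dir.mkdir(parents=True, exist_ok=True)

    # Microsecond timestamp keeps file names unique across rapid calls.
    stamp = datetime.now().strftime("%Y%m%d_%H%M%S_%f")
    log_path = out_dir / f"debug_log_{stamp}.json"

    # Assumed payload layout: a short summary plus the raw extraction result.
    payload = {
        "source_file": str(source_path),
        "method": method,
        "text_chunk_count": len(result.get("text_chunks", [])),
        "table_count": len(result.get("tables", [])),
        "result": result,
    }
    log_path.write_text(json.dumps(payload, indent=2), encoding="utf-8")
    return log_path


def demo_save_debug_log():
    """Write a log for the same simulated result the test uses and read it back."""
    with tempfile.TemporaryDirectory() as tmp:
        result = {"text_chunks": ["Test chunk 1", "Test chunk 2"], "tables": []}
        log_path = save_debug_log("example.txt", result, "test_method", tmp)
        return json.loads(log_path.read_text(encoding="utf-8"))
```

Calling demo_save_debug_log() returns the parsed payload, which makes it easy to see what a debug log for the simulated result could contain.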

Source Code

def test_extraction_debugging():
    """Test the extraction debugging functionality"""
    print("🧪 Testing Extraction Debugging Functionality")
    print("=" * 50)
    
    # Initialize document processor
    processor = DocumentProcessor()
    
    # Check that extracted directory exists
    extracted_dir = Path(__file__).parent / "extracted"
    print(f"📁 Extracted directory: {extracted_dir}")
    print(f"📁 Directory exists: {extracted_dir.exists()}")
    
    # List any existing debug files
    if extracted_dir.exists():
        existing_files = list(extracted_dir.glob("*.json"))
        print(f"📄 Existing debug files: {len(existing_files)}")
        for file in existing_files[-3:]:  # Show last 3 files
            print(f"   - {file.name}")
    
    # Test with a simple text file (this won't use advanced processing)
    test_file = create_test_file()
    print(f"\n🔍 Testing with file: {test_file}")
    
    try:
        # Since we don't have PDF/Word processing for plain text,
        # let's simulate by calling the processor anyway
        result = {"text_chunks": ["Test chunk 1", "Test chunk 2"], "tables": []}
        processor._save_extraction_debug_log(test_file, result, "test_method")
        
        print("✅ Debug log creation test completed")
        
        # Check if new debug file was created
        new_files = list(extracted_dir.glob("*.json"))
        if new_files:
            latest_file = max(new_files, key=lambda f: f.stat().st_mtime)
            print(f"📄 Latest debug file: {latest_file.name}")
            print(f"📏 File size: {latest_file.stat().st_size} bytes")
        
    except Exception as e:
        print(f"❌ Error during test: {e}")
    finally:
        # Clean up test file
        try:
            Path(test_file).unlink()
        except OSError:
            # Ignore a missing or locked file, but do not mask unrelated errors
            pass

Return Value

This function does not return any value (implicitly returns None). It performs side effects including printing test results to console, creating temporary test files, generating debug logs, and cleaning up test artifacts.

Dependencies

tempfile
logging
pathlib

Required Imports

import tempfile
import logging
from pathlib import Path
from document_processor import DocumentProcessor

Usage Example

# Ensure DocumentProcessor and create_test_file are available
from document_processor import DocumentProcessor
from pathlib import Path
import tempfile
import logging

# Define create_test_file in the same module as test_extraction_debugging if not already available
def create_test_file():
    with tempfile.NamedTemporaryFile(mode='w', delete=False, suffix='.txt') as f:
        f.write('Test content')
        return f.name

# Run the test
test_extraction_debugging()

# Expected output (paths, counts, and file names are illustrative):
# 🧪 Testing Extraction Debugging Functionality
# ==================================================
# 📁 Extracted directory: /path/to/extracted
# 📁 Directory exists: True
# 📄 Existing debug files: 1
#    - debug_log_20231201_120000.json
#
# 🔍 Testing with file: /tmp/tmpxyz123.txt
# ✅ Debug log creation test completed
# 📄 Latest debug file: debug_log_20231201_120100.json
# 📏 File size: 256 bytes
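
The test prints only the name and size of the newest debug file. To look at its contents, a small helper along the following lines could be used. It is a sketch: the name load_latest_debug_log is hypothetical, and it assumes only that the debug logs are valid JSON files in a single directory.

```python
import json
import os
import tempfile
from pathlib import Path


def load_latest_debug_log(directory):
    """Return (file name, parsed JSON) for the most recently modified *.json file.

    Returns None when the directory contains no JSON files.
    """
    json_files = list(Path(directory).glob("*.json"))
    if not json_files:
        return None
    # Same selection rule as the test: newest modification time wins.
    latest = max(json_files, key=lambda p: p.stat().st_mtime)
    return latest.name, json.loads(latest.read_text(encoding="utf-8"))


def demo_load_latest():
    """Create two logs with explicit modification times and load the newer one."""
    with tempfile.TemporaryDirectory() as tmp:
        older = Path(tmp) / "a.json"
        newer = Path(tmp) / "b.json"
        older.write_text('{"n": 1}', encoding="utf-8")
        newer.write_text('{"n": 2}', encoding="utf-8")
        # Set the timestamps explicitly so the ordering is deterministic.
        os.utime(older, (1000, 1000))
        os.utime(newer, (2000, 2000))
        return load_latest_debug_log(tmp)
```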

Best Practices

This function should be run in a test environment, not in production code
Ensure the DocumentProcessor class has the _save_extraction_debug_log method implemented
The function removes the temporary test file in a finally block, but it does not delete the debug log it generates, so JSON files accumulate in the 'extracted' directory across runs
The function uses emoji indicators for visual clarity in test output, making it easy to identify test stages and results
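Consider converting this into a proper pytest or unittest test with assertions for better integration with CI/CD pipelines; as written, the function only prints results and swallows exceptions, so a test runner would report success even if no log is written
The function needs write permission wherever create_test_file places the test file and wherever _save_extraction_debug_log writes its logs - ensure proper permissions before running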
Debug files are created in an 'extracted' directory relative to the test file location
The test uses simulated data rather than actual document processing, making it suitable for testing the logging mechanism independently

Similar Components

AI-powered semantic similarity - components with related functionality:

function test_document_processing 76.8% similar
A test function that validates document processing functionality by creating a test PDF file, processing it through a DocumentProcessor, and verifying the extraction results or error handling.
From: /tf/active/vicechatdev/contract_validity_analyzer/test_implementation.py

function test_document_extractor 76.1% similar
A test function that validates the DocumentExtractor class by testing file type support detection, text extraction from various document formats, and error handling.
From: /tf/active/vicechatdev/leexi/test_document_extractor.py

function test_document_processor 75.3% similar
A test function that validates the DocumentProcessor component's ability to extract text from PDF files with improved error handling and llmsherpa integration.
From: /tf/active/vicechatdev/contract_validity_analyzer/test_improved_processor.py

function test_multiple_files 71.8% similar
A test function that validates the extraction of text content from multiple document files using a DocumentExtractor instance, displaying extraction results and simulating combined content processing.
From: /tf/active/vicechatdev/leexi/test_multiple_files.py

function test_enhanced_pdf_processing 66.8% similar
A comprehensive test function that validates PDF processing capabilities, including text extraction, cleaning, chunking, and table detection across multiple PDF processing libraries.
From: /tf/active/vicechatdev/vice_ai/test_enhanced_pdf.py
