function test_extraction_debugging
A test function that validates the extraction debugging functionality of a DocumentProcessor by creating test files, simulating document extraction, and verifying debug log creation.
/tf/active/vicechatdev/vice_ai/test_extraction_debug.py
44 - 90
moderate
Purpose
This function is a print-based smoke test for the DocumentProcessor's extraction debugging. It reports whether the 'extracted' directory next to the test file exists, lists up to three of the JSON debug files already in it, calls _save_extraction_debug_log with simulated extraction results, and then reports the name and size of the most recently modified JSON debug file. It makes no assertions and catches any exception raised during the run, so success or failure has to be read from the console output. This is useful for development and troubleshooting of document processing pipelines.
Source Code
def test_extraction_debugging():
    """Test the extraction debugging functionality"""
    print("🧪 Testing Extraction Debugging Functionality")
    print("=" * 50)

    # Initialize document processor
    processor = DocumentProcessor()

    # Check that extracted directory exists
    extracted_dir = Path(__file__).parent / "extracted"
    print(f"Extracted directory: {extracted_dir}")
    print(f"Directory exists: {extracted_dir.exists()}")

    # List any existing debug files
    if extracted_dir.exists():
        existing_files = list(extracted_dir.glob("*.json"))
        print(f"Existing debug files: {len(existing_files)}")
        for file in existing_files[-3:]:  # Show last 3 files
            print(f" - {file.name}")

    # Test with a simple text file (this won't use advanced processing)
    test_file = create_test_file()
    print(f"\nTesting with file: {test_file}")

    try:
        # Since we don't have PDF/Word processing for plain text,
        # let's simulate by calling the processor anyway
        result = {"text_chunks": ["Test chunk 1", "Test chunk 2"], "tables": []}
        processor._save_extraction_debug_log(test_file, result, "test_method")
        print("✅ Debug log creation test completed")

        # Check if new debug file was created
        new_files = list(extracted_dir.glob("*.json"))
        if new_files:
            latest_file = max(new_files, key=lambda f: f.stat().st_mtime)
            print(f"Latest debug file: {latest_file.name}")
            print(f"File size: {latest_file.stat().st_size} bytes")
    except Exception as e:
        print(f"❌ Error during test: {e}")
    finally:
        # Clean up test file
        try:
            Path(test_file).unlink()
        except:
            pass
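The test above calls processor._save_extraction_debug_log(test_file, result, "test_method"), but the method itself is not shown on this page. As a rough sketch of what such a method might do, the stand-in below writes the result to a timestamped JSON file. Everything in it is an assumption made for illustration: the class name StubDocumentProcessor, the debug_log_<timestamp>.json naming, and the JSON fields are hypothetical and may differ from the real DocumentProcessor.

```python
import json
import tempfile
from datetime import datetime
from pathlib import Path


class StubDocumentProcessor:
    """Hypothetical stand-in; NOT the real DocumentProcessor."""

    def __init__(self, extracted_dir):
        # Directory that receives the debug logs (assumed layout).
        self.extracted_dir = Path(extracted_dir)
        self.extracted_dir.mkdir(parents=True, exist_ok=True)

    def _save_extraction_debug_log(self, file_path, result, method):
        # File name pattern and JSON fields are assumptions for illustration.
        stamp = datetime.now().strftime("%Y%m%d_%H%M%S_%f")
        log_path = self.extracted_dir / f"debug_log_{stamp}.json"
        payload = {
            "source_file": str(file_path),
            "method": method,
            "text_chunk_count": len(result.get("text_chunks", [])),
            "table_count": len(result.get("tables", [])),
            "result": result,
        }
        log_path.write_text(json.dumps(payload, indent=2), encoding="utf-8")
        return log_path


# Exercise the stub with the same simulated result the test uses.
with tempfile.TemporaryDirectory() as tmp:
    stub = StubDocumentProcessor(Path(tmp) / "extracted")
    result = {"text_chunks": ["Test chunk 1", "Test chunk 2"], "tables": []}
    log_path = stub._save_extraction_debug_log("sample.txt", result, "test_method")
    saved = json.loads(log_path.read_text(encoding="utf-8"))
    print(saved["method"], saved["text_chunk_count"])  # prints: test_method 2
```

A stub like this can also make it possible to exercise the logging path without installing the real document processing dependencies.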
Return Value
This function does not return any value (implicitly returns None). It performs side effects including printing test results to console, creating temporary test files, generating debug logs, and cleaning up test artifacts.
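Because the function returns None and catches its own exceptions, a caller that wants to detect failure programmatically has only the printed text to go on. One possible approach, sketched here with the standard library's contextlib.redirect_stdout, is to capture the output and look for the error line. The helper run_and_capture and the placeholder fake_test are hypothetical names; fake_test merely stands in for test_extraction_debugging so the sketch runs on its own.

```python
import contextlib
import io


def run_and_capture(func):
    """Call func() and return (return_value, printed_text)."""
    buffer = io.StringIO()
    with contextlib.redirect_stdout(buffer):
        value = func()
    return value, buffer.getvalue()


def fake_test():
    # Hypothetical placeholder for test_extraction_debugging.
    print("Debug log creation test completed")


value, output = run_and_capture(fake_test)

# The documented function prints "Error during test: ..." when it fails,
# so the presence of that text is the failure signal.
failed = "Error during test" in output
print(f"returned={value!r}, failed={failed}")
```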
Dependencies
- tempfile
- logging
- pathlib
Required Imports
import tempfile
import logging
from pathlib import Path
from document_processor import DocumentProcessor
Usage Example
# Ensure DocumentProcessor and create_test_file are available
from document_processor import DocumentProcessor
from pathlib import Path
import tempfile
import logging

# Define create_test_file in the same module as test_extraction_debugging
# if it is not already available
def create_test_file():
    with tempfile.NamedTemporaryFile(mode='w', delete=False, suffix='.txt') as f:
        f.write('Test content')
        return f.name

# Run the test
test_extraction_debugging()

# Expected output (illustrative; paths, counts, and file names will differ):
# 🧪 Testing Extraction Debugging Functionality
# ==================================================
# Extracted directory: /path/to/extracted
# Directory exists: True
# Existing debug files: 5
#  - debug_log_20231201_120000.json
#
# Testing with file: /tmp/tmpxyz123.txt
# ✅ Debug log creation test completed
# Latest debug file: debug_log_20231201_120100.json
# File size: 256 bytes
Best Practices
- This function should be run in a test environment, not in production code
- Ensure the DocumentProcessor class has the _save_extraction_debug_log method implemented
- The function removes its temporary test file in a finally block, but it does not delete the debug log it generates, so JSON files can accumulate in the 'extracted' directory across runs
- The function uses emoji indicators for visual clarity in test output, making it easy to identify test stages and results
- Consider wrapping this in a proper test framework (pytest, unittest) for better integration with CI/CD pipelines
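- The function needs write access to wherever DocumentProcessor saves its debug logs (the test looks for them in the 'extracted' directory) and to the location where create_test_file writes its file - ensure proper permissions before running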
- Debug files are created in an 'extracted' directory relative to the test file location
- The test uses simulated data rather than actual document processing, making it suitable for testing the logging mechanism independently
Similar Components
AI-powered semantic similarity - components with related functionality:
- function test_document_processing (76.8% similar)
- function test_document_extractor (76.1% similar)
- function test_document_processor (75.3% similar)
- function test_multiple_files (71.8% similar)
- function test_enhanced_pdf_processing (66.8% similar)