Code Extractor

function test_multiple_files

Maturity: 45

A test function that extracts text from multiple document files with a DocumentExtractor instance, prints the result of each extraction, and simulates combining the extracted content for downstream processing.

File: /tf/active/vicechatdev/leexi/test_multiple_files.py
Lines: 15-71
Complexity: moderate

Purpose

This function serves as a test harness for the DocumentExtractor class, specifically testing its ability to extract text from multiple files (markdown and text formats). It verifies file existence, extracts content from each file individually, displays extraction statistics, and simulates how the extracted content would be combined for LLM processing. The function provides detailed console output for debugging and validation purposes.

Source Code

def test_multiple_files():
    """Test the previous reports extraction with multiple files"""
    
    # Initialize extractor
    extractor = DocumentExtractor()
    
    print("Multiple Files Previous Reports Test")
    print("=" * 50)
    
    # Test files
    test_files = [
        "test_files/previous_report_1.md",
        "test_files/previous_report_2.txt"
    ]
    
    # Check if test files exist
    for file_path in test_files:
        if not os.path.exists(file_path):
            print(f"Test file not found: {file_path}")
            return
    
    print(f"Testing with {len(test_files)} files:")
    for file_path in test_files:
        print(f"  - {file_path}")
    print()
    
    # Test extraction from each file
    extracted_contents = []  # (file_path, content) pairs for successful extractions
    for file_path in test_files:
        print(f"Extracting from: {file_path}")
        try:
            content = extractor.extract_text(file_path)
            if content:
                print(f"✓ Successfully extracted {len(content)} characters")
                # Keep the path with its content so headers stay correct if a file fails
                extracted_contents.append((file_path, content))
                print(f"Preview: {content[:100]}...")
            else:
                print("✗ No content extracted")
        except Exception as e:
            print(f"✗ Error: {e}")
        print("-" * 40)
    
    # Simulate the combined extraction process
    if extracted_contents:
        print("\nSimulating combined extraction for LLM:")
        combined_content = []
        for file_path, content in extracted_contents:
            file_name = Path(file_path).name
            combined_content.append(f"=== {file_name} ===\n{content}\n")
        
        full_content = "\n".join(combined_content)
        print(f"Total combined content: {len(full_content)} characters")
        print(f"Combined preview:\n{full_content[:500]}...")
        
        print("\n✓ Multiple file extraction simulation successful!")
    else:
        print("\n✗ No content extracted from any files")

Return Value

This function returns no value (implicitly None). Its only effect is console output: it prints the test results, and it returns early with a "Test file not found" message if any test file is missing.
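
Because the function reports everything through print and returns None, a caller that wants to inspect the results has to capture standard output. The following is a minimal sketch of one way to do that with the standard library. The names noisy_test and run_and_capture are hypothetical, and noisy_test is only a stand-in so the example runs without the project's document_extractor module.

```python
import io
from contextlib import redirect_stdout


def noisy_test():
    """Hypothetical stand-in for test_multiple_files(): prints and returns None."""
    print("Multiple Files Previous Reports Test")
    print("=" * 50)


def run_and_capture(func):
    """Call func() and return (return_value, everything it printed)."""
    buffer = io.StringIO()
    with redirect_stdout(buffer):
        result = func()
    return result, buffer.getvalue()


# The return value is None; the useful information is in the captured text.
result, output = run_and_capture(noisy_test)
```

In a real session, test_multiple_files would presumably be passed to run_and_capture in place of the stand-in.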

Dependencies

  • os
  • sys
  • pathlib
  • document_extractor

Required Imports

import os
import sys
from pathlib import Path
from document_extractor import DocumentExtractor

Usage Example

# Ensure test files exist in the correct location
# test_files/previous_report_1.md
# test_files/previous_report_2.txt

import os
import sys
from pathlib import Path
from document_extractor import DocumentExtractor

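# Run the test (assumes test_multiple_files.py is importable from the working directory)
from test_multiple_files import test_multiple_files
test_multiple_files()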

# Expected output:
# Multiple Files Previous Reports Test
# ==================================================
# Testing with 2 files:
#   - test_files/previous_report_1.md
#   - test_files/previous_report_2.txt
# 
# Extracting from: test_files/previous_report_1.md
# ✓ Successfully extracted X characters
# ...
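
The usage example requires the two report files to exist before the test runs. As a rough sketch, placeholder files could be generated as shown here. The helper name create_sample_reports and the file contents are illustrative assumptions, not part of the original project.

```python
import tempfile
from pathlib import Path


def create_sample_reports(base_dir):
    """Write two small placeholder reports under base_dir/test_files."""
    target = Path(base_dir) / "test_files"
    target.mkdir(parents=True, exist_ok=True)
    samples = {
        "previous_report_1.md": "# Previous report 1\n\nPlaceholder markdown content.\n",
        "previous_report_2.txt": "Previous report 2\n\nPlaceholder plain-text content.\n",
    }
    paths = []
    for name, text in samples.items():
        path = target / name
        path.write_text(text, encoding="utf-8")
        paths.append(path)
    return paths


# Demonstrated in a throwaway directory; pass "." instead to create the files
# relative to the current working directory, where the test looks for them.
with tempfile.TemporaryDirectory() as tmp:
    created = create_sample_reports(tmp)
    created_names = [p.name for p in created]
    all_exist = all(p.exists() for p in created)
```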

Best Practices

  • Ensure test files exist before running the function to avoid early termination
  • The function expects specific file paths ('test_files/previous_report_1.md' and 'test_files/previous_report_2.txt') - modify the test_files list if using different paths
  • This is a test function and should not be used in production code; it's designed for development and validation purposes
  • The function prints directly to console - redirect stdout if you need to capture output programmatically
  • Error handling is basic (try-except with print) - consider enhancing for production use
  • The function returns early without raising when a file is missing, so a test runner would report the run as passed even though nothing was tested
  • Consider using a proper testing framework (pytest, unittest) instead of standalone test functions for better integration and reporting
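
Several of the points above (hard-coded paths, print-only error handling, silent early return) can be addressed together by moving the logic into a helper that raises on failure and by asserting on its result. The sketch below is one possible shape, not the project's implementation. PlainTextExtractor is a hypothetical stand-in for DocumentExtractor that only reads UTF-8 text, and combine_reports is an illustrative name.

```python
import tempfile
from pathlib import Path


class PlainTextExtractor:
    """Hypothetical stand-in for DocumentExtractor; only reads UTF-8 text."""

    def extract_text(self, file_path):
        return Path(file_path).read_text(encoding="utf-8")


def combine_reports(extractor, file_paths):
    """Extract each file and join the results under '=== name ===' headers.

    Raises instead of printing, so a test runner reports failures.
    """
    sections = []
    for file_path in file_paths:
        path = Path(file_path)
        if not path.exists():
            raise FileNotFoundError(f"Test file not found: {path}")
        content = extractor.extract_text(path)
        assert content, f"No content extracted from {path}"
        sections.append(f"=== {path.name} ===\n{content}\n")
    return "\n".join(sections)


def test_combine_reports():
    """Discoverable by pytest; can also be called directly."""
    with tempfile.TemporaryDirectory() as tmp:
        first = Path(tmp) / "previous_report_1.md"
        second = Path(tmp) / "previous_report_2.txt"
        first.write_text("# Report 1", encoding="utf-8")
        second.write_text("Report 2", encoding="utf-8")
        combined = combine_reports(PlainTextExtractor(), [first, second])
    assert combined == (
        "=== previous_report_1.md ===\n# Report 1\n\n"
        "=== previous_report_2.txt ===\nReport 2\n"
    )
```

Creating the files inside a temporary directory keeps the test independent of the working directory, and a missing file now surfaces as a FileNotFoundError rather than a printed message.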

Similar Components

AI-powered semantic similarity - components with related functionality:

  • function test_document_extractor 81.7% similar

    A test function that validates the DocumentExtractor class by testing file type support detection, text extraction from various document formats, and error handling.

    From: /tf/active/vicechatdev/leexi/test_document_extractor.py
  • function test_mixed_previous_reports 78.9% similar

    A test function that validates the DocumentExtractor's ability to extract text content from multiple file formats (TXT and Markdown) and combine them into a unified previous reports summary.

    From: /tf/active/vicechatdev/leexi/test_enhanced_reports.py
  • function test_extraction_debugging 71.8% similar

    A test function that validates the extraction debugging functionality of a DocumentProcessor by creating test files, simulating document extraction, and verifying debug log creation.

    From: /tf/active/vicechatdev/vice_ai/test_extraction_debug.py
  • function test_document_processor 70.3% similar

    A test function that validates the DocumentProcessor component's ability to extract text from PDF files with improved error handling and llmsherpa integration.

    From: /tf/active/vicechatdev/contract_validity_analyzer/test_improved_processor.py
  • function test_llm_extraction 66.4% similar

    A test function that validates LLM-based contract data extraction by processing a sample contract and verifying the extracted fields against expected values.

    From: /tf/active/vicechatdev/contract_validity_analyzer/test_extractor.py