test_multiple_files - Code Extractor

function test_multiple_files

Maturity: 45

A test function that validates the extraction of text content from multiple document files using a DocumentExtractor instance, displaying extraction results and simulating combined content processing.

File: /tf/active/vicechatdev/leexi/test_multiple_files.py

Lines: 15 - 71

Complexity: moderate

Purpose

This function serves as a test harness for the DocumentExtractor class, specifically testing its ability to extract text from multiple files (markdown and text formats). It verifies file existence, extracts content from each file individually, displays extraction statistics, and simulates how the extracted content would be combined for LLM processing. The function provides detailed console output for debugging and validation purposes.

Source Code

def test_multiple_files():
    """Test the previous reports extraction with multiple files"""
    
    # Initialize extractor
    extractor = DocumentExtractor()
    
    print("Multiple Files Previous Reports Test")
    print("=" * 50)
    
    # Test files
    test_files = [
        "test_files/previous_report_1.md",
        "test_files/previous_report_2.txt"
    ]
    
    # Check if test files exist
    for file_path in test_files:
        if not os.path.exists(file_path):
            print(f"Test file not found: {file_path}")
            return
    
    print(f"Testing with {len(test_files)} files:")
    for file_path in test_files:
        print(f"  - {file_path}")
    print()
    
    # Test extraction from each file, keeping each path paired with its content
    extracted_contents = []
    for file_path in test_files:
        print(f"Extracting from: {file_path}")
        try:
            content = extractor.extract_text(file_path)
            if content:
                print(f"✓ Successfully extracted {len(content)} characters")
                extracted_contents.append((file_path, content))
                print(f"Preview: {content[:100]}...")
            else:
                print("✗ No content extracted")
        except Exception as e:
            print(f"✗ Error: {str(e)}")
        print("-" * 40)
    
    # Simulate the combined extraction process
    if extracted_contents:
        print("\nSimulating combined extraction for LLM:")
        combined_content = []
        for file_path, content in extracted_contents:
            file_name = Path(file_path).name  # label stays correct even if an earlier file was skipped
            combined_content.append(f"=== {file_name} ===\n{content}\n")
        
        full_content = "\n".join(combined_content)
        print(f"Total combined content: {len(full_content)} characters")
        print(f"Combined preview:\n{full_content[:500]}...")
        
        print("\n✓ Multiple file extraction simulation successful!")
    else:
        print("\n✗ No content extracted from any files")

Return Value

This function returns None implicitly. Its only effects are console output: it prints the test results, and it returns early, before any extraction, if either test file is missing.
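
Because the function reports only through print, a caller that needs the results has to capture stdout. The following is a minimal sketch using the standard library's contextlib.redirect_stdout. Note that noisy_check and capture_output are hypothetical names introduced here; noisy_check stands in for test_multiple_files so the snippet runs without document_extractor installed.

```python
import io
from contextlib import redirect_stdout


def noisy_check():
    """Hypothetical stand-in for test_multiple_files: prints, returns None."""
    print("Multiple Files Previous Reports Test")
    print("✓ Successfully extracted 42 characters")


def capture_output(func):
    """Run func and return (return_value, everything it printed)."""
    buffer = io.StringIO()
    with redirect_stdout(buffer):
        result = func()
    return result, buffer.getvalue()


result, output = capture_output(noisy_check)

# With no return value to inspect, the markers in the printed text
# are the only signal a caller can check.
passed = "✓" in output and "✗" not in output
```

Scanning for the ✓ and ✗ markers is fragile, which is one reason to prefer returning a value or asserting instead.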

Dependencies

os
sys
pathlib
document_extractor

Required Imports

import os
import sys
from pathlib import Path
from document_extractor import DocumentExtractor

Usage Example

# Ensure test files exist in the correct location
# test_files/previous_report_1.md
# test_files/previous_report_2.txt

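import os
import sys
from pathlib import Path
from document_extractor import DocumentExtractor

# test_multiple_files must be defined in, or imported into, this scope.
# Assumption: the module is importable under its file name, e.g.
# from test_multiple_files import test_multiple_files

# Run the test
test_multiple_files()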

# Expected output:
# Multiple Files Previous Reports Test
# ==================================================
# Testing with 2 files:
#   - test_files/previous_report_1.md
#   - test_files/previous_report_2.txt
# 
# Extracting from: test_files/previous_report_1.md
# ✓ Successfully extracted X characters
# ...
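
The function returns early unless both files exist, so it can help to create them first. The sketch below writes two small placeholder files under test_files/. The helper name create_fixture_files and the file contents are made up for illustration; only the two file names come from the function itself.

```python
from pathlib import Path


def create_fixture_files(base_dir="."):
    """Write the two report files the test looks for; contents are placeholders."""
    fixture_dir = Path(base_dir) / "test_files"
    fixture_dir.mkdir(parents=True, exist_ok=True)

    samples = {
        "previous_report_1.md": "# Report 1\n\nPlaceholder summary for testing.\n",
        "previous_report_2.txt": "Report 2\nPlaceholder notes for testing.\n",
    }

    created = []
    for name, text in samples.items():
        path = fixture_dir / name
        path.write_text(text, encoding="utf-8")
        created.append(path)
    return created
```

Calling create_fixture_files() from the directory where the test is run places the files where the relative paths in test_files expect them.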

Best Practices

Ensure both test files exist before running the function; if either is missing it prints a message and returns without extracting anything
The file paths ('test_files/previous_report_1.md' and 'test_files/previous_report_2.txt') are hard-coded and relative to the current working directory - modify the test_files list if using different paths
This is a test function and should not be used in production code; it's designed for development and validation purposes
The function prints directly to console - redirect stdout if you need to capture output programmatically
Error handling is basic: any exception raised during extraction is caught and printed, then the loop continues with the next file, so an extraction failure never fails the run
A missing file is reported only as a printed message followed by an early return, not as a failure, which may not be ideal for comprehensive test suites
Consider using a proper testing framework (pytest, unittest) instead of standalone test functions for better integration and reporting; the function contains no assertions, so a test runner has nothing to check beyond an uncaught exception
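
As one possible shape for the framework-based approach suggested above, here is a sketch of pytest-style tests that assert on results instead of printing them. Everything named here is an assumption for illustration: PlainTextExtractor is a hypothetical stand-in that exposes the same extract_text(path) call the original uses on DocumentExtractor, and combine_reports is a hypothetical helper that mirrors the "=== name ===" combining step. The tmp_path argument is pytest's built-in temporary-directory fixture.

```python
from pathlib import Path


class PlainTextExtractor:
    """Hypothetical stand-in with the extract_text(path) call used by the test."""

    def extract_text(self, file_path):
        return Path(file_path).read_text(encoding="utf-8")


def combine_reports(extractor, file_paths):
    """Extract each file and join the results under '=== name ===' headers."""
    sections = []
    for file_path in file_paths:
        content = extractor.extract_text(file_path)
        if content:  # skip files that yield no text
            sections.append(f"=== {Path(file_path).name} ===\n{content}\n")
    return "\n".join(sections)


def test_combines_multiple_files(tmp_path):
    first = tmp_path / "previous_report_1.md"
    second = tmp_path / "previous_report_2.txt"
    first.write_text("# Report 1", encoding="utf-8")
    second.write_text("Report 2", encoding="utf-8")

    combined = combine_reports(PlainTextExtractor(), [first, second])

    assert "=== previous_report_1.md ===\n# Report 1" in combined
    assert "=== previous_report_2.txt ===\nReport 2" in combined


def test_skips_empty_files(tmp_path):
    empty = tmp_path / "empty.txt"
    empty.write_text("", encoding="utf-8")

    assert combine_reports(PlainTextExtractor(), [empty]) == ""
```

Swapping PlainTextExtractor for the real DocumentExtractor would exercise the actual extraction code, provided its extract_text accepts the same path argument.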

Similar Components

AI-powered semantic similarity - components with related functionality:

function test_document_extractor 81.7% similar

A test function that validates the DocumentExtractor class by testing file type support detection, text extraction from various document formats, and error handling.
From: /tf/active/vicechatdev/leexi/test_document_extractor.py

function test_mixed_previous_reports 78.9% similar

A test function that validates the DocumentExtractor's ability to extract text content from multiple file formats (TXT and Markdown) and combine them into a unified previous reports summary.
From: /tf/active/vicechatdev/leexi/test_enhanced_reports.py

function test_extraction_debugging 71.8% similar

A test function that validates the extraction debugging functionality of a DocumentProcessor by creating test files, simulating document extraction, and verifying debug log creation.
From: /tf/active/vicechatdev/vice_ai/test_extraction_debug.py

function test_document_processor 70.3% similar

A test function that validates the DocumentProcessor component's ability to extract text from PDF files with improved error handling and llmsherpa integration.
From: /tf/active/vicechatdev/contract_validity_analyzer/test_improved_processor.py

function test_llm_extraction 66.4% similar

A test function that validates LLM-based contract data extraction by processing a sample contract and verifying the extracted fields against expected values.
From: /tf/active/vicechatdev/contract_validity_analyzer/test_extractor.py
