function test_multiple_files
A test function that validates the extraction of text content from multiple document files using a DocumentExtractor instance, displaying extraction results and simulating combined content processing.
/tf/active/vicechatdev/leexi/test_multiple_files.py
15 - 71
moderate
Purpose
This function serves as a test harness for the DocumentExtractor class, specifically testing its ability to extract text from multiple files (markdown and text formats). It verifies file existence, extracts content from each file individually, displays extraction statistics, and simulates how the extracted content would be combined for LLM processing. The function provides detailed console output for debugging and validation purposes.
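The combining step described above can be sketched as a small pure function. This is a minimal illustration, not the module's actual code: the helper name `combine_extracted` and the sample contents are hypothetical, and only the `=== name ===` header format is taken from the source. Pairing each path with its content avoids relying on list positions when building the headers.

```python
from pathlib import Path


def combine_extracted(pairs):
    """Join (file_path, content) pairs into one string with a header per file."""
    sections = []
    for file_path, content in pairs:
        # Only the file name goes into the header, matching the test's format.
        sections.append(f"=== {Path(file_path).name} ===\n{content}\n")
    return "\n".join(sections)


# Placeholder contents for illustration, not real extraction output.
pairs = [
    ("test_files/previous_report_1.md", "# Report 1"),
    ("test_files/previous_report_2.txt", "Report 2"),
]
combined = combine_extracted(pairs)
print(combined)
```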
Source Code
def test_multiple_files():
    """Test the previous reports extraction with multiple files"""
    # Initialize extractor
    extractor = DocumentExtractor()
    print("Multiple Files Previous Reports Test")
    print("=" * 50)

    # Test files
    test_files = [
        "test_files/previous_report_1.md",
        "test_files/previous_report_2.txt"
    ]

    # Check if test files exist; stop at the first missing one
    for file_path in test_files:
        if not os.path.exists(file_path):
            print(f"Test file not found: {file_path}")
            return

    print(f"Testing with {len(test_files)} files:")
    for file_path in test_files:
        print(f" - {file_path}")
    print()

    # Test extraction from each file, keeping each path with its content
    # so the headers below stay correct even if one extraction fails
    extracted_contents = []
    for file_path in test_files:
        print(f"Extracting from: {file_path}")
        try:
            content = extractor.extract_text(file_path)
            if content:
                print(f"✓ Successfully extracted {len(content)} characters")
                extracted_contents.append((file_path, content))
                print(f"Preview: {content[:100]}...")
            else:
                print("✗ No content extracted")
        except Exception as e:
            print(f"✗ Error: {str(e)}")
        print("-" * 40)

    # Simulate the combined extraction process
    if extracted_contents:
        print("\nSimulating combined extraction for LLM:")
        combined_content = []
        for file_path, content in extracted_contents:
            file_name = Path(file_path).name
            combined_content.append(f"=== {file_name} ===\n{content}\n")
        full_content = "\n".join(combined_content)
        print(f"Total combined content: {len(full_content)} characters")
        print(f"Combined preview:\n{full_content[:500]}...")
        print("\n✓ Multiple file extraction simulation successful!")
    else:
        print("\n✗ No content extracted from any files")
Return Value
This function returns None implicitly. Its only effect is console output: it prints the test results and, if any test file is missing, prints a message and returns early without attempting extraction.
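Because the results only reach the console, a caller that wants to inspect them has to capture stdout. The sketch below uses the standard library's `contextlib.redirect_stdout`; `noisy_check` is a hypothetical stand-in that prints the same two header lines as the test function, used here so the example runs without `document_extractor` installed.

```python
import contextlib
import io


def noisy_check():
    """Hypothetical stand-in for test_multiple_files(): prints, returns None."""
    print("Multiple Files Previous Reports Test")
    print("=" * 50)


buffer = io.StringIO()
with contextlib.redirect_stdout(buffer):
    # Everything printed inside this block goes to the buffer instead of the console.
    result = noisy_check()

captured = buffer.getvalue()
```

The same pattern should apply to `test_multiple_files()` itself, with `captured` then holding the full report text.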
Dependencies
- os
- sys
- pathlib
- document_extractor
Required Imports
import os
import sys
from pathlib import Path
from document_extractor import DocumentExtractor
Usage Example
# Ensure test files exist in the correct location
# test_files/previous_report_1.md
# test_files/previous_report_2.txt
import os
import sys
from pathlib import Path
from document_extractor import DocumentExtractor
# Run the test
test_multiple_files()
# Expected output:
# Multiple Files Previous Reports Test
# ==================================================
# Testing with 2 files:
# - test_files/previous_report_1.md
# - test_files/previous_report_2.txt
#
# Extracting from: test_files/previous_report_1.md
# ✓ Successfully extracted X characters
# ...
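The usage example assumes the two test files already exist. One way to create them is sketched below; the helper name `create_sample_reports` and the file contents are placeholders invented for illustration, and only the directory and file names come from the source.

```python
import tempfile
from pathlib import Path


def create_sample_reports(base_dir):
    """Create the two files the test expects under base_dir/test_files."""
    target = Path(base_dir) / "test_files"
    target.mkdir(parents=True, exist_ok=True)
    # Placeholder contents; real previous reports would go here.
    samples = {
        "previous_report_1.md": "# Previous report 1\n\nPlaceholder text.\n",
        "previous_report_2.txt": "Previous report 2: placeholder text.\n",
    }
    paths = []
    for name, text in samples.items():
        path = target / name
        path.write_text(text, encoding="utf-8")
        paths.append(path)
    return paths


# Demonstrated in a temporary directory so no project files are touched.
with tempfile.TemporaryDirectory() as tmp:
    created = create_sample_reports(tmp)
    created_names = [p.name for p in created]
    all_exist = all(p.exists() for p in created)
```

Since the test function uses relative paths, calling `create_sample_reports(".")` from the directory where the test is run would place the files where it looks for them.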
Best Practices
- Ensure test files exist before running the function to avoid early termination
- The function expects specific file paths ('test_files/previous_report_1.md' and 'test_files/previous_report_2.txt') - modify the test_files list if using different paths
- This is a test function and should not be used in production code; it's designed for development and validation purposes
- The function prints directly to console - redirect stdout if you need to capture output programmatically
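- Extraction errors are caught and only printed, so a failing file never raises; re-raise or assert on the result if failures should be visible to a test runner
- The function returns early without raising if a test file is missing, so a test runner such as pytest would report the run as passed even though nothing was tested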
- Consider using a proper testing framework (pytest, unittest) instead of standalone test functions for better integration and reporting
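As a rough sketch of the last point, the same checks can be expressed with the standard library's `unittest`, using assertions instead of printed messages and temporary files instead of fixed paths. `StubExtractor` and `extract_all` are hypothetical names introduced here; the stub only reads files as text and assumes nothing about `DocumentExtractor` beyond the `extract_text(file_path)` call seen in the source.

```python
import tempfile
import unittest
from pathlib import Path


class StubExtractor:
    """Hypothetical stand-in for DocumentExtractor; reads the file as plain text."""

    def extract_text(self, file_path):
        return Path(file_path).read_text(encoding="utf-8")


def extract_all(extractor, file_paths):
    """Return (path, content) pairs for files that yield non-empty text."""
    results = []
    for file_path in file_paths:
        content = extractor.extract_text(file_path)
        if content:
            results.append((file_path, content))
    return results


class MultipleFilesTest(unittest.TestCase):
    def setUp(self):
        # Fresh temporary files per test, removed automatically afterwards.
        self._tmp = tempfile.TemporaryDirectory()
        self.addCleanup(self._tmp.cleanup)
        base = Path(self._tmp.name)
        self.files = [base / "previous_report_1.md", base / "previous_report_2.txt"]
        self.files[0].write_text("# Report 1\n", encoding="utf-8")
        self.files[1].write_text("", encoding="utf-8")  # deliberately empty

    def test_empty_file_is_skipped(self):
        results = extract_all(StubExtractor(), self.files)
        self.assertEqual([p.name for p, _ in results], ["previous_report_1.md"])

    def test_missing_file_raises(self):
        missing = Path(self._tmp.name) / "missing.md"
        with self.assertRaises(FileNotFoundError):
            extract_all(StubExtractor(), [missing])


# Run the suite explicitly so the example works when executed as a plain script.
suite = unittest.defaultTestLoader.loadTestsFromTestCase(MultipleFilesTest)
outcome = unittest.TextTestRunner(verbosity=0).run(suite)
```

In a real project the explicit runner lines would normally be dropped in favour of test discovery (for example `python -m unittest`), and the stub replaced by the real extractor.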
Similar Components
AI-powered semantic similarity - components with related functionality:
- function test_document_extractor 81.7% similar
- function test_mixed_previous_reports 78.9% similar
- function test_extraction_debugging 71.8% similar
- function test_document_processor 70.3% similar
- function test_llm_extraction 66.4% similar