function test_libreoffice_conversion
Tests LibreOffice's ability to convert a document file to PDF format using headless mode, with timeout protection and comprehensive error reporting.
/tf/active/vicechatdev/docchat/test_problematic_files.py
99 - 159
moderate
Purpose
This function checks LibreOffice conversion by attempting to convert a given document file (e.g., .docx, .pptx, .odt) to PDF. It creates a temporary output directory, runs LibreOffice in headless mode with a 60-second timeout, captures stdout and stderr, and reports success or failure along with diagnostic output. It is useful for testing document-processing pipelines, validating a LibreOffice installation, or debugging conversion failures on specific files.
Source Code
from pathlib import Path
import subprocess
import tempfile
import traceback


def test_libreoffice_conversion(file_path):
    """Test LibreOffice conversion"""
    print(f"\n{'='*80}")
    print(f"Testing LibreOffice Conversion: {Path(file_path).name}")
    print(f"{'='*80}")

    try:
        file_path_obj = Path(file_path)
        if not file_path_obj.exists():
            print("❌ File does not exist!")
            return False

        # Create a temp directory; it is deleted automatically when the block exits
        with tempfile.TemporaryDirectory() as temp_dir:
            temp_path = Path(temp_dir)
            # LibreOffice writes <input stem>.pdf into --outdir
            output_pdf = temp_path / f"{file_path_obj.stem}.pdf"
            print(f"Converting to PDF in {temp_dir}...")

            cmd = [
                'libreoffice',
                '--headless',
                '--convert-to', 'pdf',
                '--outdir', str(temp_path),
                str(file_path_obj)
            ]
            print(f"Command: {' '.join(cmd)}")

            result = subprocess.run(
                cmd,
                capture_output=True,
                text=True,
                timeout=60  # Use shorter timeout for testing
            )

            print(f"\nReturn code: {result.returncode}")
            if result.stdout:
                print(f"STDOUT:\n{result.stdout}")
            if result.stderr:
                print(f"STDERR:\n{result.stderr}")

            # Require both a zero exit status and the expected output file
            if result.returncode == 0 and output_pdf.exists():
                pdf_size = output_pdf.stat().st_size
                print(f"✓ Successfully converted to PDF ({pdf_size:,} bytes)")
                return True
            else:
                print("❌ Conversion failed")
                return False

    except subprocess.TimeoutExpired:
        print("❌ Conversion timed out after 60 seconds")
        return False
    except Exception as e:
        print(f"❌ Error: {type(e).__name__}: {e}")
        print("\nFull traceback:")
        traceback.print_exc()
        return False
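The 60-second limit and the libreoffice executable name are hard-coded in the source above. The following is a minimal sketch, not part of the source file, of a variant that takes them as parameters. The names `convert_to_pdf`, `timeout`, `binary`, `runner` and `fake_runner` are hypothetical; injecting `runner` (assumed to accept the same arguments as `subprocess.run`) lets the control flow be exercised without LibreOffice installed.

```python
import subprocess
import tempfile
from pathlib import Path


def convert_to_pdf(file_path, timeout=60, binary="libreoffice", runner=subprocess.run):
    """Hypothetical variant of test_libreoffice_conversion (not in the source file).

    Returns True when the command exits with 0 and <stem>.pdf appears in the
    temporary output directory, mirroring the original success test.
    """
    src = Path(file_path)
    if not src.exists():
        print(f"File does not exist: {src}")
        return False

    with tempfile.TemporaryDirectory() as temp_dir:
        outdir = Path(temp_dir)
        output_pdf = outdir / f"{src.stem}.pdf"
        cmd = [binary, "--headless", "--convert-to", "pdf",
               "--outdir", str(outdir), str(src)]
        try:
            result = runner(cmd, capture_output=True, text=True, timeout=timeout)
        except subprocess.TimeoutExpired:
            # Report the actual limit instead of a hard-coded 60
            print(f"Conversion timed out after {timeout} seconds")
            return False
        except FileNotFoundError:
            # subprocess.run raises this when `binary` cannot be found on PATH
            print(f"Executable not found: {binary}")
            return False
        return result.returncode == 0 and output_pdf.exists()


def fake_runner(cmd, **kwargs):
    """Stand-in for subprocess.run that pretends LibreOffice succeeded
    by writing a placeholder PDF into the --outdir directory."""
    outdir = Path(cmd[cmd.index("--outdir") + 1])
    src = Path(cmd[-1])
    (outdir / f"{src.stem}.pdf").write_bytes(b"%PDF-1.4\n")
    return subprocess.CompletedProcess(cmd, 0, stdout="", stderr="")
```

For a large deck one might call convert_to_pdf('/path/to/large.pptx', timeout=300); in unit tests, passing runner=fake_runner checks the success path without starting LibreOffice.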
Parameters
| Name | Type | Default | Kind |
|---|---|---|---|
| file_path | - | - | positional_or_keyword |
Parameter Details
file_path: Path to the document file to convert, as a string or pathlib.Path object (the value is wrapped in Path internally). The file must exist and should be in a format LibreOffice can open (e.g., .doc, .docx, .ppt, .pptx, .odt, .ods). If the file does not exist, the function prints an error and returns False without invoking LibreOffice.
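Because the argument is wrapped in `Path`, a string and a `Path` object behave identically, and the output file the function looks for is derived from `Path.stem`. A small worked example of that derivation; `expected_pdf` is a hypothetical helper, and it assumes, as the function itself does, that LibreOffice names its output `<stem>.pdf` inside the `--outdir` directory.

```python
from pathlib import Path


def expected_pdf(file_path, outdir):
    """Hypothetical helper: the PDF path test_libreoffice_conversion checks for."""
    return Path(outdir) / f"{Path(file_path).stem}.pdf"


print(expected_pdf("/data/report.docx", "/tmp/out").name)           # report.pdf
print(expected_pdf(Path("/data/slides.v2.pptx"), "/tmp/out").name)  # slides.v2.pdf
```

Note that `Path.stem` drops only the last suffix, so an input such as slides.v2.pptx is expected as slides.v2.pdf.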
Return Value
Returns a boolean: True if the conversion succeeded (LibreOffice exited with return code 0 and the expected <stem>.pdf file exists in the output directory; the PDF size is printed but not checked), False if the input file is missing, the conversion failed or timed out, or an exception was caught. The function prints detailed console output about the conversion process regardless of the outcome.
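The success rule combines two conditions rather than trusting the exit status alone. A minimal sketch of that rule as a standalone predicate; `conversion_succeeded` is a hypothetical name, and its `require_nonempty` option is an assumed stricter check that the original function does not perform.

```python
import tempfile
from pathlib import Path


def conversion_succeeded(returncode, output_pdf, require_nonempty=False):
    """Success rule used by test_libreoffice_conversion: exit status 0 AND the
    expected PDF exists. `require_nonempty` is a hypothetical stricter option
    (not in the original) that also rejects zero-byte output."""
    pdf = Path(output_pdf)
    ok = returncode == 0 and pdf.exists()
    if ok and require_nonempty:
        ok = pdf.stat().st_size > 0
    return ok


with tempfile.TemporaryDirectory() as d:
    pdf = Path(d) / "report.pdf"
    print(conversion_succeeded(0, pdf))        # False: exit 0 but no file written
    pdf.write_bytes(b"")
    print(conversion_succeeded(0, pdf))        # True: the original rule ignores size
    print(conversion_succeeded(0, pdf, True))  # False: zero-byte file rejected
    print(conversion_succeeded(1, pdf))        # False: non-zero exit status
```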
Dependencies
pathlib, subprocess, tempfile, traceback, pptx, python-docx
Required Imports
from pathlib import Path
import subprocess
import tempfile
import traceback
Conditional/Optional Imports
These imports are only needed under specific conditions:
import pptx
Condition: imported in source file but not used in this function
Optional
from docx import Document as DocxDocument
Condition: imported in source file but not used in this function
Optional
Usage Example
from pathlib import Path

# Assumes test_libreoffice_conversion (see Source Code) is already defined or imported,
# e.g. from test_problematic_files import test_libreoffice_conversion

# Test with a Word document
result = test_libreoffice_conversion('/path/to/document.docx')
if result:
    print('Conversion test passed')
else:
    print('Conversion test failed')

# Test with a PowerPoint file
result = test_libreoffice_conversion('/path/to/presentation.pptx')

# Test with a Path object
file = Path('/path/to/spreadsheet.ods')
result = test_libreoffice_conversion(file)
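The calls above check one file at a time. When several problematic files need checking in one run, a small driver can collect the results. This is a hedged sketch: `run_batch` is a hypothetical helper, and the directory and glob pattern in the commented call are placeholders.

```python
from pathlib import Path


def run_batch(paths, convert):
    """Run a conversion check over several files and split the results.

    `convert` is any callable that takes a path and returns True/False,
    for example test_libreoffice_conversion.
    """
    passed, failed = [], []
    for p in paths:
        (passed if convert(p) else failed).append(str(p))
    print(f"\n{len(passed)} passed, {len(failed)} failed")
    for name in failed:
        print(f"  FAILED: {name}")
    return passed, failed


# Real use (placeholder directory):
# passed, failed = run_batch(sorted(Path('/path/to/docs').glob('*.docx')),
#                            test_libreoffice_conversion)

# Demonstration with a stand-in checker that "passes" only .docx files
run_batch(["a.docx", "b.pptx"], lambda p: Path(p).suffix == ".docx")
```

Passing the checker in as a parameter keeps the driver testable without LibreOffice.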
Best Practices
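- Ensure LibreOffice is installed and the libreoffice command is on PATH before calling this function; otherwise subprocess.run raises FileNotFoundError, which the function reports as an error before returning False
- The 60-second timeout is hard-coded (in the subprocess.run call and in the timeout message); edit both when converting large or complex documents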
- The function creates temporary directories that are automatically cleaned up
- Check the console output for detailed diagnostic information when debugging failures
- The function validates file existence before attempting conversion
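- The function catches exceptions internally and returns False, but Path(file_path).name is evaluated before the try block, so a non-path argument such as None raises TypeError to the caller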
- The function prints extensive output; redirect stdout/stderr if running in automated tests
- Ensure the input file is not locked or in use by another process
- The function works with any LibreOffice-supported format (docx, pptx, odt, ods, etc.)
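Since the command list hard-codes libreoffice, the installation check in the first bullet amounts to that name resolving on PATH. A minimal pre-flight sketch using `shutil.which`; `find_libreoffice` is a hypothetical helper, and including `soffice` as a fallback candidate is an assumption (it is the name of LibreOffice's own launcher, which some installations expose instead of a libreoffice wrapper).

```python
import shutil


def find_libreoffice(candidates=("libreoffice", "soffice")):
    """Return the full path of the first candidate launcher found on PATH, or None."""
    for name in candidates:
        path = shutil.which(name)
        if path:
            return path
    return None


exe = find_libreoffice()
if exe is None:
    print("LibreOffice not found on PATH; test_libreoffice_conversion will fail with FileNotFoundError")
else:
    print(f"Using {exe}")
```

The returned path could replace the literal 'libreoffice' entry in cmd when the wrapper name is not available.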
Similar Components
AI-powered semantic similarity - components with related functionality:
- class DocumentConverter: 71.1% similar
- function main_v64: 64.1% similar
- class PDFConverter: 63.4% similar
- function test_document_processing: 62.8% similar
- class DocumentConverter_v1: 61.7% similar