function test_libreoffice_conversion

Maturity: 48

Tests LibreOffice's ability to convert a document file to PDF format using headless mode, with timeout protection and comprehensive error reporting.

File: /tf/active/vicechatdev/docchat/test_problematic_files.py
Lines: 99 - 159
Complexity: moderate

Purpose

This function validates LibreOffice conversion by attempting to convert a given document file (e.g., .docx, .pptx, .odt) to PDF. It creates a temporary directory for the output, runs the LibreOffice conversion command with a 60-second timeout, captures stdout and stderr, and reports success or failure along with the return code and captured output. It is useful for testing document processing pipelines, validating a LibreOffice installation, or debugging conversion issues.

Source Code

def test_libreoffice_conversion(file_path):
    """Test LibreOffice conversion"""
    print(f"\n{'='*80}")
    print(f"Testing LibreOffice Conversion: {Path(file_path).name}")
    print(f"{'='*80}")
    
    try:
        import subprocess
        import tempfile
        
        file_path_obj = Path(file_path)
        
        if not file_path_obj.exists():
            print(f"❌ File does not exist!")
            return False
            
        # Create temp directory
        with tempfile.TemporaryDirectory() as temp_dir:
            temp_path = Path(temp_dir)
            output_pdf = temp_path / f"{file_path_obj.stem}.pdf"
            
            print(f"Converting to PDF in {temp_dir}...")
            cmd = [
                'libreoffice',
                '--headless',
                '--convert-to', 'pdf',
                '--outdir', str(temp_path),
                str(file_path_obj)
            ]
            
            print(f"Command: {' '.join(cmd)}")
            
            result = subprocess.run(
                cmd,
                capture_output=True,
                text=True,
                timeout=60  # Use shorter timeout for testing
            )
            
            print(f"\nReturn code: {result.returncode}")
            if result.stdout:
                print(f"STDOUT:\n{result.stdout}")
            if result.stderr:
                print(f"STDERR:\n{result.stderr}")
                
            if result.returncode == 0 and output_pdf.exists():
                pdf_size = output_pdf.stat().st_size
                print(f"✓ Successfully converted to PDF ({pdf_size:,} bytes)")
                return True
            else:
                print(f"❌ Conversion failed")
                return False
                
    except subprocess.TimeoutExpired:
        print(f"❌ Conversion timed out after 60 seconds")
        return False
    except Exception as e:
        print(f"❌ Error: {type(e).__name__}: {e}")
        print(f"\nFull traceback:")
        traceback.print_exc()
        return False
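
The timeout in the function above is fixed at 60 seconds. A minimal sketch of a variant that takes the timeout as a parameter is shown below. The names `build_convert_command` and `convert_with_timeout` and the `executable` parameter are hypothetical and not part of the original module; the command-line flags are the same ones the source uses.

```python
import subprocess
from pathlib import Path


def build_convert_command(file_path, out_dir, executable="libreoffice"):
    """Build the headless conversion command used in the source above."""
    return [
        executable,
        "--headless",
        "--convert-to", "pdf",
        "--outdir", str(out_dir),
        str(file_path),
    ]


def convert_with_timeout(file_path, out_dir, timeout=60, executable="libreoffice"):
    """Hypothetical variant: return the PDF path on success, else None."""
    cmd = build_convert_command(file_path, out_dir, executable)
    try:
        result = subprocess.run(cmd, capture_output=True, text=True, timeout=timeout)
    except (subprocess.TimeoutExpired, FileNotFoundError):
        # Timed out, or the executable was not found on PATH.
        return None
    output_pdf = Path(out_dir) / f"{Path(file_path).stem}.pdf"
    if result.returncode == 0 and output_pdf.exists():
        return output_pdf
    return None
```

Returning the path (or None) instead of printing keeps the helper usable from other code, for example `convert_with_timeout("report.docx", "/tmp/out", timeout=180)`.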

Parameters

Name Type Default Kind
file_path - - positional_or_keyword

Parameter Details

file_path: Path to the document file to be converted. Can be a string or Path object. The file must exist and should be in a format supported by LibreOffice (e.g., .doc, .docx, .ppt, .pptx, .odt, .ods). The function will validate file existence before attempting conversion.

Return Value

Returns a boolean value: True if LibreOffice exits with return code 0 and the expected output PDF exists; False if the input file does not exist, the conversion fails or times out, or any other exception is caught. The size of the PDF is printed on success but is not checked. The function prints details of the conversion process to the console regardless of success or failure.
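
The source checks only that the output file exists. If a stricter check is wanted, one that also rejects an empty PDF, a small helper along these lines could be used. The name `pdf_looks_valid` is hypothetical and not part of the original module.

```python
from pathlib import Path


def pdf_looks_valid(returncode, output_pdf):
    """Return True only if the process exited cleanly and the PDF is non-empty.

    Hypothetical helper: the original function does not check the file size.
    """
    output_pdf = Path(output_pdf)
    return (
        returncode == 0
        and output_pdf.is_file()
        and output_pdf.stat().st_size > 0
    )
```

Inside the original function, the success condition could then read `pdf_looks_valid(result.returncode, output_pdf)`.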

Dependencies

  • pathlib
  • subprocess
  • tempfile
  • traceback
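  • pptx (python-pptx; imported by the source file, not used by this function)
  • python-docx (imported by the source file, not used by this function)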

Required Imports

from pathlib import Path
import subprocess
import tempfile
import traceback

Conditional/Optional Imports

These imports are only needed under specific conditions:

import pptx

Condition: imported in source file but not used in this function (optional)

from docx import Document as DocxDocument

Condition: imported in source file but not used in this function (optional)

Usage Example

from pathlib import Path
import subprocess
import tempfile
import traceback

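def test_libreoffice_conversion(file_path):
    # Replace this stub with the full body from the Source Code section.
    # As written, the stub returns None, so every check below reports failure.
    pass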

# Test with a Word document
result = test_libreoffice_conversion('/path/to/document.docx')
if result:
    print('Conversion test passed')
else:
    print('Conversion test failed')

# Test with a PowerPoint file
result = test_libreoffice_conversion('/path/to/presentation.pptx')

# Test with a Path object (Path is already imported above)
file = Path('/path/to/spreadsheet.ods')
result = test_libreoffice_conversion(file)

Best Practices

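  • Ensure LibreOffice is installed and the libreoffice executable is on PATH before calling this function; if it is missing, subprocess.run raises FileNotFoundError, which the generic handler reports before returning False
  • The 60-second timeout is hard-coded in the subprocess.run call; edit that value if converting large or complex documents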
  • The function creates temporary directories that are automatically cleaned up
  • Check the console output for detailed diagnostic information when debugging failures
  • The function validates file existence before attempting conversion
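  • The function catches exceptions raised inside its try block, but the header print calls Path(file_path) before that block, so an invalid argument such as None raises TypeError to the caller; wrap calls in try-except if using in production code
  • The function prints extensive output to stdout, and tracebacks go to stderr; redirect both if running in automated tests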
  • Ensure the input file is not locked or in use by another process
  • The function works with any LibreOffice-supported format (docx, pptx, odt, ods, etc.)
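
Two of the points above, checking that LibreOffice is available and keeping the printed output out of automated test logs, can be sketched with the standard library alone. The helper names `libreoffice_available` and `run_quietly` are hypothetical and not part of the original module.

```python
import contextlib
import io
import shutil


def libreoffice_available(executable="libreoffice"):
    """Return True if the executable can be found on PATH."""
    return shutil.which(executable) is not None


def run_quietly(func, *args, **kwargs):
    """Call func while capturing what it writes to stdout and stderr.

    Returns a (result, captured_text) tuple. Output written by child
    processes directly to the terminal is not captured.
    """
    buffer = io.StringIO()
    with contextlib.redirect_stdout(buffer), contextlib.redirect_stderr(buffer):
        result = func(*args, **kwargs)
    return result, buffer.getvalue()
```

A call such as `ok, log = run_quietly(test_libreoffice_conversion, '/path/to/document.docx')` would then return the boolean result together with the diagnostic text, which can be logged only when `ok` is False.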

Similar Components

AI-powered semantic similarity - components with related functionality:

  • class DocumentConverter 71.1% similar

    A class that converts various document formats (Word, Excel, PowerPoint, OpenDocument, Visio) to PDF using LibreOffice's headless conversion capabilities, with support for parallel processing and directory structure preservation.

    From: /tf/active/vicechatdev/pdfconverter.py
  • function main_v64 64.1% similar

    A test harness function that validates the ability to open and process PowerPoint and Word document files, with fallback to LibreOffice conversion for problematic files.

    From: /tf/active/vicechatdev/docchat/test_problematic_files.py
  • class PDFConverter 63.4% similar

    A class that converts various document formats (Word, PowerPoint, Excel, images) to PDF format using LibreOffice and ReportLab libraries.

    From: /tf/active/vicechatdev/msg_to_eml.py
  • function test_document_processing 62.8% similar

    A test function that validates document processing functionality by creating a test PDF file, processing it through a DocumentProcessor, and verifying the extraction results or error handling.

    From: /tf/active/vicechatdev/contract_validity_analyzer/test_implementation.py
  • class DocumentConverter_v1 61.7% similar

    A class that converts various document formats (Word, Excel, PowerPoint, images) to PDF format using LibreOffice, unoconv, or PIL.

    From: /tf/active/vicechatdev/document_auditor/src/document_converter.py