function merge_pdfs_v1

Maturity: 49

Merges multiple PDF files into a single output PDF file with robust error handling and fallback mechanisms.

File: /tf/active/vicechatdev/msg_to_eml.py
Lines: 412 - 474
Complexity: moderate

Purpose

This function combines multiple PDF files into one consolidated PDF document. It first filters out input paths that do not exist or that point to empty files. If exactly one valid file remains, that file is copied to the output path. Otherwise the function merges with PyMuPDF (fitz) and falls back to PyPDF2 only when PyMuPDF cannot be imported. A PDF that fails to open or append is logged as a warning and skipped, and the remaining files are still merged.
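The input filtering described above can be sketched in isolation. This is a minimal illustration, not part of the documented module: the helper name filter_valid_paths is hypothetical, and it only mirrors the existence and size checks that merge_pdfs performs inline.

```python
import os


def filter_valid_paths(input_paths):
    """Hypothetical helper: keep paths that exist and are non-empty.

    Mirrors the list comprehension at the top of merge_pdfs and
    preserves the original order of the inputs.
    """
    return [
        path
        for path in input_paths
        if os.path.exists(path) and os.path.getsize(path) > 0
    ]


if __name__ == "__main__":
    # Paths that are missing or empty are dropped from the result.
    print(filter_valid_paths(["document1.pdf", "document2.pdf"]))
```

Note that the check looks only at existence and size; it does not verify that a file is actually a well-formed PDF.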

Source Code

def merge_pdfs(input_paths, output_path):
    """Merge multiple PDF files with better error handling"""
    try:
        # Filter out non-existent files
        valid_paths = [path for path in input_paths if os.path.exists(path) and os.path.getsize(path) > 0]
        
        if not valid_paths:
            logger.error("No valid PDF files to merge")
            return None
            
        if len(valid_paths) == 1:
            # Just copy the single file
            shutil.copy2(valid_paths[0], output_path)
            return output_path
        
        # Try PyMuPDF first, as it's commonly used and more robust
        try:
            import fitz
            
            # Create output PDF
            output_pdf = fitz.open()
            
            # Add each input PDF
            for input_path in valid_paths:
                try:
                    with fitz.open(input_path) as pdf:  # close each source document after use
                        output_pdf.insert_pdf(pdf)
                except Exception as e:
                    logger.warning(f"Problem with PDF {input_path}: {str(e)}")
                    continue
            
            # Save merged PDF
            output_pdf.save(output_path)
            output_pdf.close()
            
            return output_path
            
        except ImportError:
            # Fall back to using PyPDF2
            try:
                from PyPDF2 import PdfMerger
                
                merger = PdfMerger()
                
                for input_path in valid_paths:
                    try:
                        merger.append(input_path)
                    except Exception as e:
                        logger.warning(f"Problem with PDF {input_path}: {str(e)}")
                        continue
                
                merger.write(output_path)
                merger.close()
                
                return output_path
            except ImportError:
                logger.error("No PDF merging library available. Install PyMuPDF or PyPDF2.")
                return None
            
    except Exception as e:
        logger.error(f"Error merging PDFs: {str(e)}")
        logger.error(traceback.format_exc())
        return None

Parameters

Name         Type  Default  Kind
input_paths  -     -        positional_or_keyword
output_path  -     -        positional_or_keyword

Parameter Details

input_paths: A list or iterable of file paths (strings) pointing to PDF files to be merged. The function will filter out non-existent files and empty files (size 0 bytes) automatically. Order in the list determines the order in the merged output.

output_path: A string representing the file path where the merged PDF should be saved. Should include the filename and .pdf extension. The target directory must already exist and be writable; the function does not create it.

Return Value

Returns output_path (string) if the merge succeeds, or None if it fails (no valid input files, no PDF library installed, or any other exception). A returned path means the output file was written. It does not guarantee that every input was included, because inputs that fail to open or append are skipped with a warning.
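Because failure is signalled by None rather than by an exception, a caller that prefers exceptions can wrap the call. The sketch below is one possible approach under that assumption; merge_or_raise is a hypothetical name, and the merge function is passed in as a parameter so the example stays self-contained.

```python
def merge_or_raise(merge_fn, input_paths, output_path):
    """Hypothetical wrapper: convert a None result into an exception.

    merge_fn is expected to behave like merge_pdfs, returning the
    output path on success and None on failure.
    """
    result = merge_fn(input_paths, output_path)
    if result is None:
        raise RuntimeError(f"PDF merge failed for output {output_path!r}")
    return result
```

A call such as merge_or_raise(merge_pdfs, input_files, output_file) would then either return the path or raise, while the details of the failure remain in the log output.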

Dependencies

  • os
  • shutil
  • traceback
  • fitz (PyMuPDF)
  • PyPDF2
  • logging

Required Imports

import os
import shutil
import traceback
import logging

Conditional/Optional Imports

These imports are only needed under specific conditions:

import fitz

Condition: Primary PDF merging library (PyMuPDF). Used first if available. Install with: pip install PyMuPDF

Optional

from PyPDF2 import PdfMerger

Condition: Fallback PDF merging library. Used only if PyMuPDF (fitz) is not available. Install with: pip install PyPDF2

Optional
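To find out ahead of time which of the two optional libraries is installed, the standard library can be queried without importing either package. This is a rough sketch: first_available_module is a hypothetical helper, and finding a module spec does not guarantee that the import itself will succeed.

```python
import importlib.util


def first_available_module(candidates=("fitz", "PyPDF2")):
    """Hypothetical helper: return the first importable module name, or None.

    The default order matches the order in which merge_pdfs tries the
    libraries: PyMuPDF (fitz) first, then PyPDF2.
    """
    for module_name in candidates:
        if importlib.util.find_spec(module_name) is not None:
            return module_name
    return None


if __name__ == "__main__":
    backend = first_available_module()
    print(backend or "No PDF merging library available")
```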

Usage Example

import os
import shutil
import traceback
import logging

# Setup logger
logger = logging.getLogger(__name__)
logging.basicConfig(level=logging.INFO)

# Install dependencies first:
# pip install PyMuPDF
# or
# pip install PyPDF2

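def merge_pdfs(input_paths, output_path):
    # Placeholder: paste the full function body from the Source Code section here.
    # With only `pass`, the call below returns None and reports a failure.
    pass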

# Example usage
input_files = ['document1.pdf', 'document2.pdf', 'document3.pdf']
output_file = 'merged_output.pdf'

result = merge_pdfs(input_files, output_file)

if result:
    print(f'Successfully merged PDFs to: {result}')
else:
    print('Failed to merge PDFs')

Best Practices

  • Ensure at least one PDF library (PyMuPDF or PyPDF2) is installed before calling this function
  • Always check the return value - None indicates failure, a path string indicates success
  • The function logs a warning for each PDF that fails to open or append and continues with the remaining files, so a returned path does not guarantee that every input was merged
  • Input files are validated automatically - non-existent or empty files are filtered out
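  • When only one valid input file remains, the function copies it to output_path instead of merging
  • PyMuPDF (fitz) is tried first; PyPDF2 is used only if importing fitz raises ImportError, not if a PyMuPDF merge fails at runtime
  • The function relies on a module-level logger; define it (for example, logger = logging.getLogger(__name__)) in the module that contains merge_pdfs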
  • The output directory must exist before calling this function
  • The function catches exceptions internally and returns None, so a try-except around the call rarely triggers; rely on the return value and the log output to detect failures
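Since the output directory must exist before the call, one way to satisfy that requirement is to create it up front. The helper below is a small sketch, not part of the documented module; ensure_parent_dir is a hypothetical name.

```python
import os


def ensure_parent_dir(output_path):
    """Hypothetical helper: create the parent directory of output_path.

    Returns the absolute path of the directory. Existing directories
    are left untouched.
    """
    parent = os.path.dirname(os.path.abspath(output_path))
    os.makedirs(parent, exist_ok=True)
    return parent
```

Calling ensure_parent_dir(output_file) immediately before merge_pdfs(input_files, output_file) avoids a failure caused only by a missing directory.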

Similar Components

AI-powered semantic similarity - components with related functionality:

  • function merge_pdfs 72.0% similar

    Merges multiple PDF files into a single consolidated PDF document by delegating to a PDFManipulator instance.

    From: /tf/active/vicechatdev/CDocs/utils/pdf_utils.py
  • class DocumentMerger 69.8% similar

    A class that merges PDF documents with audit trail pages, combining an original PDF with an audit page and updating metadata to reflect the audit process.

    From: /tf/active/vicechatdev/document_auditor/src/document_merger.py
  • function test_enhanced_pdf_processing 51.0% similar

    A comprehensive test function that validates PDF processing capabilities, including text extraction, cleaning, chunking, and table detection across multiple PDF processing libraries.

    From: /tf/active/vicechatdev/vice_ai/test_enhanced_pdf.py
  • function eml_to_pdf 50.2% similar

    Converts an .eml email file to PDF format, including the email body and all attachments merged into a single PDF document.

    From: /tf/active/vicechatdev/msg_to_eml.py
  • class PDFConverter_v1 50.0% similar

    A comprehensive document-to-PDF converter class that handles multiple file formats (Word, Excel, PowerPoint, images) with multiple conversion methods and automatic fallbacks for reliability.

    From: /tf/active/vicechatdev/CDocs/utils/pdf_utils.py