function merge_pdfs_v1
Merges multiple PDF files into a single output PDF file with robust error handling and fallback mechanisms.
/tf/active/vicechatdev/msg_to_eml.py
412 - 474
moderate
Purpose
This function combines multiple PDF files into one consolidated PDF document. It validates input files, filters out non-existent or empty files, and attempts to use PyMuPDF (fitz) as the primary merging library with PyPDF2 as a fallback. It handles edge cases like single file inputs (which are simply copied) and continues processing even if individual PDFs fail to merge.
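The validation step described above can be examined in isolation. The following is a minimal sketch, not code from the module: `partition_paths` is a hypothetical helper that applies the same existence and non-zero-size checks, and additionally reports which inputs would be dropped (the original function discards them silently).

```python
import os


def partition_paths(input_paths):
    """Split paths into (valid, skipped) using the same checks as merge_pdfs.

    Hypothetical helper for illustration; not part of msg_to_eml.py.
    """
    valid, skipped = [], []
    for path in input_paths:
        # A path is usable only if it exists and is not a zero-byte file.
        if os.path.exists(path) and os.path.getsize(path) > 0:
            valid.append(path)
        else:
            skipped.append(path)
    return valid, skipped
```

Calling this before `merge_pdfs` lets a caller log or reject the skipped inputs instead of discovering later that the merged file is shorter than expected.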
Source Code
def merge_pdfs(input_paths, output_path):
    """Merge multiple PDF files with better error handling"""
    try:
        # Filter out non-existent files
        valid_paths = [path for path in input_paths if os.path.exists(path) and os.path.getsize(path) > 0]
        if not valid_paths:
            logger.error("No valid PDF files to merge")
            return None
        if len(valid_paths) == 1:
            # Just copy the single file
            shutil.copy2(valid_paths[0], output_path)
            return output_path
        # Try PyMuPDF first, as it's commonly used and more robust
        try:
            import fitz
            # Create output PDF
            output_pdf = fitz.open()
            # Add each input PDF
            for input_path in valid_paths:
                try:
                    pdf = fitz.open(input_path)
                    output_pdf.insert_pdf(pdf)
                except Exception as e:
                    logger.warning(f"Problem with PDF {input_path}: {str(e)}")
                    continue
            # Save merged PDF
            output_pdf.save(output_path)
            output_pdf.close()
            return output_path
        except ImportError:
            # Fall back to using PyPDF2
            try:
                from PyPDF2 import PdfMerger
                merger = PdfMerger()
                for input_path in valid_paths:
                    try:
                        merger.append(input_path)
                    except Exception as e:
                        logger.warning(f"Problem with PDF {input_path}: {str(e)}")
                        continue
                merger.write(output_path)
                merger.close()
                return output_path
            except ImportError:
                logger.error("No PDF merging library available. Install PyMuPDF or PyPDF2.")
                return None
    except Exception as e:
        logger.error(f"Error merging PDFs: {str(e)}")
        logger.error(traceback.format_exc())
        return None
Parameters
| Name | Type | Default | Kind |
|---|---|---|---|
| input_paths | - | - | positional_or_keyword |
| output_path | - | - | positional_or_keyword |
Parameter Details
input_paths: A list or iterable of file paths (strings) pointing to PDF files to be merged. The function will filter out non-existent files and empty files (size 0 bytes) automatically. Order in the list determines the order in the merged output.
output_path: A string representing the file path where the merged PDF should be saved. Should include the filename and .pdf extension. The target directory must already exist and be writable; the function does not create it.
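Because the function does not create missing directories, a caller may want to prepare the destination first. The sketch below shows one way to do that; `ensure_output_dir` is a hypothetical helper name, not something the module provides.

```python
import os


def ensure_output_dir(output_path):
    """Create the parent directory of output_path if it is missing.

    Hypothetical helper; merge_pdfs itself does not create directories.
    """
    # abspath handles bare filenames such as "merged.pdf", whose dirname is "".
    parent = os.path.dirname(os.path.abspath(output_path))
    os.makedirs(parent, exist_ok=True)
    return parent
```

Calling `ensure_output_dir(output_file)` immediately before `merge_pdfs(input_files, output_file)` avoids a failure caused solely by a missing folder.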
Return Value
Returns the output_path (string) if the merge operation succeeds, or None if the operation fails (no valid input files, missing libraries, or other errors). A returned path confirms that an output file was written, but not that every input was included: PDFs that cannot be opened or appended are skipped with a logged warning.
Dependencies
- os
- shutil
- traceback
- fitz (PyMuPDF)
- PyPDF2
- logging
Required Imports
import os
import shutil
import traceback
import logging
Conditional/Optional Imports
These imports are only needed under specific conditions:
import fitz
Condition: Primary PDF merging library (PyMuPDF). Used first if available. Install with: pip install PyMuPDF
from PyPDF2 import PdfMerger
Condition: Fallback PDF merging library. Used only if PyMuPDF (fitz) is not available. Install with: pip install PyPDF2
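To find out ahead of time which of the two optional libraries the function would end up using, the import order can be probed without actually importing anything. This is a minimal sketch under that assumption; `available_pdf_backend` is a hypothetical name introduced here for illustration.

```python
import importlib.util


def available_pdf_backend():
    """Return "fitz", "PyPDF2", or None, in the order merge_pdfs tries them.

    Hypothetical helper for illustration only.
    """
    for module_name in ("fitz", "PyPDF2"):
        # find_spec returns None when a top-level module cannot be located.
        if importlib.util.find_spec(module_name) is not None:
            return module_name
    return None
```

A return value of None corresponds to the case where `merge_pdfs` logs "No PDF merging library available" and returns None.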
Usage Example
import os
import shutil
import traceback
import logging
# Setup logger
logger = logging.getLogger(__name__)
logging.basicConfig(level=logging.INFO)
# Install dependencies first:
# pip install PyMuPDF
# or
# pip install PyPDF2
def merge_pdfs(input_paths, output_path):
    # ... (function code here)
    pass
# Example usage
input_files = ['document1.pdf', 'document2.pdf', 'document3.pdf']
output_file = 'merged_output.pdf'
result = merge_pdfs(input_files, output_file)
if result:
    print(f'Successfully merged PDFs to: {result}')
else:
    print('Failed to merge PDFs')
Best Practices
- Ensure at least one PDF library (PyMuPDF or PyPDF2) is installed before calling this function
- Always check the return value - None indicates failure, a path string indicates success
- The function logs warnings for individual PDF failures but continues processing remaining files
- Input files are validated automatically - non-existent or empty files are filtered out
- For single file inputs, the function optimizes by copying instead of merging
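- PyMuPDF (fitz) is tried first; PyPDF2 is used only when importing fitz raises ImportError, not when PyMuPDF fails at runtime
- Ensure a module-level logger is defined in the module that contains merge_pdfs, since the function references logger as a global name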
- The output directory must exist before calling this function
- Consider wrapping calls in try-except blocks for additional error handling at the application level
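Since the function signals failure by returning None rather than raising, application code that prefers exceptions can adapt it with a thin wrapper. The sketch below is one possible approach, not part of the module: `merge_or_raise` is a hypothetical name, and `merge_func` stands for `merge_pdfs` or any callable with the same signature and return convention.

```python
def merge_or_raise(merge_func, input_paths, output_path):
    """Call merge_func and raise RuntimeError when it reports failure with None.

    Hypothetical wrapper; merge_func is expected to behave like merge_pdfs.
    """
    result = merge_func(input_paths, output_path)
    if result is None:
        raise RuntimeError(f"PDF merge failed for output {output_path!r}")
    return result
```

A call such as `merge_or_raise(merge_pdfs, input_files, output_file)` then fits into existing try-except handling at the application level, while the detailed cause remains in the log output written by `merge_pdfs`.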
Similar Components
AI-powered semantic similarity - components with related functionality:
- function merge_pdfs 72.0% similar
- class DocumentMerger 69.8% similar
- function test_enhanced_pdf_processing 51.0% similar
- function eml_to_pdf 50.2% similar
- class PDFConverter_v1 50.0% similar