function convert_to_pdf

Maturity: 60

Converts a document file to PDF format, automatically generating an output path if not specified.

File: /tf/active/vicechatdev/CDocs/utils/pdf_utils.py
Lines: 2102 - 2124
Complexity: simple

Purpose

This module-level function is a thin wrapper around PDFConverter.convert_to_pdf. When no output path is given, it derives one by replacing the input file's extension with .pdf, so the PDF lands in the same directory as the input. Each call creates a new PDFConverter instance and delegates the conversion to it, which means the supported formats and the error behavior are those of PDFConverter. The wrapper suits batch processing and document workflows where a one-line conversion call is enough.

Source Code

def convert_to_pdf(input_path: str, output_path: Optional[str] = None) -> str:
    """
    Convert a document to PDF format
    
    Parameters
    ----------
    input_path : str
        Path to the input document
    output_path : str, optional
        Path where the PDF will be saved. If not provided, generated from input path.
        
    Returns
    -------
    str
        Path to the generated PDF
    """
    if output_path is None:
        # Generate output path by replacing extension with .pdf
        base_path = os.path.splitext(input_path)[0]
        output_path = f"{base_path}.pdf"
    
    converter = PDFConverter()
    return converter.convert_to_pdf(input_path, output_path)

Parameters

Name         Type           Default  Kind
input_path   str            -        positional_or_keyword
output_path  Optional[str]  None     positional_or_keyword

Parameter Details

input_path: File system path to the source document, absolute or relative. The file should exist and be in a format that PDFConverter supports. This function performs no validation of its own, so the accepted extensions depend on the PDFConverter implementation; the formats used in the usage example (.docx, .txt, .html) are illustrative.

output_path: Optional path where the converted PDF should be saved. If None (default), the function strips the last extension from input_path with os.path.splitext and appends .pdf. The directory part of input_path is kept, so the PDF is written next to the source file. If provided, it should be a writable file system path; the function passes it to PDFConverter unchanged.
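
The default-path rule can be checked in isolation. The sketch below copies only the two lines of convert_to_pdf that derive the output path; derive_output_path is a hypothetical helper name used for illustration and is not part of pdf_utils.py. It shows that the directory is preserved, that only the last extension is replaced, and that an input already ending in .pdf maps to itself.

```python
import os


def derive_output_path(input_path: str) -> str:
    """Mirror the default-path logic of convert_to_pdf (hypothetical helper)."""
    # splitext removes only the final extension; the directory part is untouched
    base_path = os.path.splitext(input_path)[0]
    return f"{base_path}.pdf"


for path in ["reports/annual_report.docx", "archive.tar.gz", "README", "scan.pdf"]:
    print(f"{path} -> {derive_output_path(path)}")

# Prints:
# reports/annual_report.docx -> reports/annual_report.pdf
# archive.tar.gz -> archive.tar.pdf
# README -> README.pdf
# scan.pdf -> scan.pdf
```

The last case matters in practice: with a .pdf input and no explicit output_path, the derived output path is identical to the input path.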

Return Value

Type: str

Returns whatever PDFConverter.convert_to_pdf returns, which the docstring describes as the path to the generated PDF. That path is expected to be either the caller-provided output_path or the path derived from input_path. It can be used for further processing, verification, or logging.

Dependencies

  • os
  • typing
  • reportlab
  • fitz
  • pikepdf
  • PIL
  • docx2pdf
  • pandas
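
Of the modules listed above, convert_to_pdf itself uses only os and typing; the rest are third-party packages that this page attributes to the PDFConverter class. A quick way to see which of them are installed is sketched below. The helper name missing_modules is hypothetical, and the check only confirms that a module can be located, not that it imports cleanly or has a compatible version.

```python
import importlib.util

# Top-level module names taken from the dependency list above.
# Note: PIL is provided by the Pillow package.
THIRD_PARTY_MODULES = ["reportlab", "fitz", "pikepdf", "PIL", "docx2pdf", "pandas"]


def missing_modules(names):
    """Return the names that cannot be located on the current import path."""
    return [name for name in names if importlib.util.find_spec(name) is None]


missing = missing_modules(THIRD_PARTY_MODULES)
if missing:
    print("Missing modules:", ", ".join(missing))
else:
    print("All listed third-party modules were found.")
```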

Required Imports

import os
from typing import Optional

Conditional/Optional Imports

These imports are only needed under specific conditions:

from reportlab.lib import colors

Condition: Required by PDFConverter class for PDF styling and formatting operations

Required (conditional)
from reportlab.lib.pagesizes import letter, A4

Condition: Required by PDFConverter class for page size definitions

Required (conditional)
from reportlab.lib.styles import getSampleStyleSheet, ParagraphStyle

Condition: Required by PDFConverter class for text styling

Required (conditional)
from reportlab.lib.units import inch, cm

Condition: Required by PDFConverter class for measurement units

Required (conditional)
from reportlab.platypus import SimpleDocTemplate, Paragraph, Spacer, Table, TableStyle, Image, PageBreak, Flowable

Condition: Required by PDFConverter class for PDF document construction

Required (conditional)
from reportlab.pdfbase import pdfmetrics

Condition: Required by PDFConverter class for font metrics

Required (conditional)
from reportlab.pdfbase.ttfonts import TTFont

Condition: Required by PDFConverter class for custom font support

Required (conditional)
import fitz

Condition: Required by PDFConverter class for PDF manipulation (PyMuPDF library)

Required (conditional)
import pikepdf

Condition: Required by PDFConverter class for PDF processing operations

Required (conditional)
from PIL import Image as PILImage

Condition: Required by PDFConverter class for image processing

Required (conditional)
from docx2pdf import convert

Condition: Required by PDFConverter class for Word document conversion

Required (conditional)
import pandas as pd

Condition: Required by PDFConverter class for data table processing

Required (conditional)

Usage Example

# Assumption: the CDocs package is on the import path; adjust to your project layout
from CDocs.utils.pdf_utils import convert_to_pdf

# Basic usage with auto-generated output path
pdf_path = convert_to_pdf('document.docx')
print(f'PDF created at: {pdf_path}')

# Usage with explicit output path
pdf_path = convert_to_pdf(
    input_path='reports/annual_report.docx',
    output_path='output/annual_report_2024.pdf'
)
print(f'PDF saved to: {pdf_path}')

# Batch conversion example
input_files = ['doc1.docx', 'doc2.txt', 'doc3.html']
for file in input_files:
    try:
        result = convert_to_pdf(file)
        print(f'Successfully converted {file} to {result}')
    except Exception as e:
        print(f'Failed to convert {file}: {e}')

Best Practices

  • Always verify that the input file exists before calling this function to avoid errors
  • Ensure the output directory exists and has write permissions before conversion
  • Handle exceptions appropriately as the underlying PDFConverter may raise errors for unsupported formats or corrupted files
  • Consider validating the input file format before conversion to provide better error messages
  • When processing multiple files, implement proper error handling to prevent one failure from stopping the entire batch
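  • Be aware that an auto-generated output path can collide with an existing file, which may then be overwritten depending on PDFConverter; an input that already ends in .pdf produces an output path identical to the input path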
  • For production use, consider adding file existence checks and backup mechanisms
  • The function depends on PDFConverter class implementation - ensure it supports your required document formats
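
Several of the practices above can be combined into a small pre-flight wrapper. The following is a minimal sketch, not part of pdf_utils.py: safe_convert and its overwrite flag are hypothetical names, and the conversion function is passed in as an argument so the checks can be exercised without PDFConverter installed. It verifies that the input exists, rejects an output path that equals the input path, refuses to overwrite unless asked, and creates the output directory.

```python
import os
from typing import Callable, Optional


def safe_convert(
    convert: Callable[[str, Optional[str]], str],
    input_path: str,
    output_path: Optional[str] = None,
    overwrite: bool = False,
) -> str:
    """Run `convert` after basic pre-flight checks (hypothetical wrapper)."""
    if not os.path.isfile(input_path):
        raise FileNotFoundError(f"Input file not found: {input_path}")

    # Same default rule as convert_to_pdf, repeated so the target can be checked
    if output_path is None:
        output_path = f"{os.path.splitext(input_path)[0]}.pdf"

    if os.path.abspath(output_path) == os.path.abspath(input_path):
        raise ValueError(f"Output path equals input path: {output_path}")
    if os.path.exists(output_path) and not overwrite:
        raise FileExistsError(f"Refusing to overwrite: {output_path}")

    # Create the output directory when the path contains one
    out_dir = os.path.dirname(output_path)
    if out_dir:
        os.makedirs(out_dir, exist_ok=True)

    return convert(input_path, output_path)
```

With the real function available, a call would look like safe_convert(convert_to_pdf, 'document.docx'). Passing the converter as a parameter is a design choice made here for testability; wrapping convert_to_pdf directly would work equally well.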

Similar Components

AI-powered semantic similarity - components with related functionality:

  • function extract_text_from_pdf 63.1% similar

    Extracts all text content from a PDF document and returns it as a string.

    From: /tf/active/vicechatdev/CDocs/utils/pdf_utils.py
  • function convert_document_to_pdf 61.6% similar

    Converts a document version to PDF format with audit trail, signatures, watermarks, and PDF/A compliance options, then uploads the result to FileCloud storage.

    From: /tf/active/vicechatdev/CDocs/controllers/document_controller.py
  • function convert_document_to_pdf_v1 60.9% similar

    Converts a document version from an editable format (e.g., Word) to PDF without changing the document's status, uploading the result to FileCloud and updating the version record.

    From: /tf/active/vicechatdev/document_controller_backup.py
  • class PDFConverter 58.9% similar

    A class that converts various document formats (Word, PowerPoint, Excel, images) to PDF format using LibreOffice and ReportLab libraries.

    From: /tf/active/vicechatdev/msg_to_eml.py
  • function merge_pdfs 58.4% similar

    Merges multiple PDF files into a single consolidated PDF document by delegating to a PDFManipulator instance.

    From: /tf/active/vicechatdev/CDocs/utils/pdf_utils.py