function convert_to_pdf
Converts a document file to PDF format, automatically generating an output path if not specified.
/tf/active/vicechatdev/CDocs/utils/pdf_utils.py
2102 - 2124
simple
Purpose
This function provides a convenient wrapper for converting various document formats to PDF. It handles output path generation by replacing the input file's extension with .pdf when no output path is provided. The function delegates the actual conversion to a PDFConverter class instance, making it useful for batch processing, document workflows, or any scenario requiring PDF conversion.
Source Code
def convert_to_pdf(input_path: str, output_path: Optional[str] = None) -> str:
    """
    Convert a document to PDF format

    Parameters
    ----------
    input_path : str
        Path to the input document
    output_path : str, optional
        Path where the PDF will be saved. If not provided, generated from input path.

    Returns
    -------
    str
        Path to the generated PDF
    """
    if output_path is None:
        # Generate output path by replacing extension with .pdf
        base_path = os.path.splitext(input_path)[0]
        output_path = f"{base_path}.pdf"

    converter = PDFConverter()
    return converter.convert_to_pdf(input_path, output_path)
Parameters
| Name | Type | Default | Kind |
|---|---|---|---|
| input_path | str | - | positional_or_keyword |
| output_path | Optional[str] | None | positional_or_keyword |
Parameter Details
input_path: String representing the file system path to the source document that needs to be converted to PDF. Should be a valid path to an existing file in a format supported by PDFConverter (e.g., .docx, .txt, .html). The path can be absolute or relative.
output_path: Optional string specifying where the converted PDF should be saved. If None (default), the function generates an output path by stripping the final extension from input_path and appending .pdf, so the PDF is written next to the input file. If provided, it should be a valid file system path in a location with write permissions.
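To make the default-path behaviour concrete, here is a minimal sketch that reproduces only the path derivation from the source code above. The helper name derive_output_path is hypothetical and is not part of pdf_utils.py; it exists purely so the logic can be tried without a PDFConverter instance.

```python
import os
from typing import Optional


def derive_output_path(input_path: str, output_path: Optional[str] = None) -> str:
    """Hypothetical helper mirroring the default-path step of convert_to_pdf."""
    if output_path is None:
        # os.path.splitext splits off only the final extension
        base_path = os.path.splitext(input_path)[0]
        output_path = f"{base_path}.pdf"
    return output_path


# The directory part of the input path is preserved
derive_output_path("reports/annual_report.docx")  # 'reports/annual_report.pdf'
# Only the last extension is replaced
derive_output_path("archive.tar.gz")              # 'archive.tar.pdf'
# A path without an extension simply gains .pdf
derive_output_path("notes")                       # 'notes.pdf'
# An input that already ends in .pdf maps onto itself
derive_output_path("document.pdf")                # 'document.pdf'
# An explicit output path is returned unchanged
derive_output_path("a.docx", "out/b.pdf")         # 'out/b.pdf'
```

The last-but-one case is worth noting: when the input already has a .pdf extension, the generated output path is identical to the input path. How PDFConverter handles that situation is not shown in this section, so passing an explicit output_path is the safer choice there.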
Return Value
Type: str
Returns a string containing the file system path to the successfully generated PDF file. This will be either the user-provided output_path or the auto-generated path based on the input file name. The returned path can be used for further processing, verification, or logging purposes.
Dependencies
- os
- typing
- reportlab
- fitz
- pikepdf
- PIL
- docx2pdf
- pandas
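Since most of these dependencies are third-party packages used by PDFConverter rather than by convert_to_pdf itself, it can help to check which ones are importable before running a conversion. The following is a sketch using only the standard library; the constant THIRD_PARTY_MODULES and the function missing_modules are hypothetical names introduced for illustration.

```python
from importlib.util import find_spec
from typing import Iterable, List

# Import names taken from the dependency list (os and typing are stdlib).
THIRD_PARTY_MODULES = ("reportlab", "fitz", "pikepdf", "PIL", "docx2pdf", "pandas")


def missing_modules(names: Iterable[str] = THIRD_PARTY_MODULES) -> List[str]:
    """Return the top-level module names that cannot be located."""
    # find_spec returns None for a top-level module that is not installed,
    # and it does not import the module while looking it up.
    return [name for name in names if find_spec(name) is None]


if __name__ == "__main__":
    missing = missing_modules()
    if missing:
        print("Not installed:", ", ".join(missing))
    else:
        print("All listed third-party modules are available.")
```

Note that import names and installable package names can differ: fitz is provided by PyMuPDF and PIL by Pillow.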
Required Imports
import os
from typing import Optional
Conditional/Optional Imports
These imports are only needed under specific conditions:
from reportlab.lib import colors
Condition: Required by PDFConverter class for PDF styling and formatting operations
Required (conditional)
from reportlab.lib.pagesizes import letter, A4
Condition: Required by PDFConverter class for page size definitions
Required (conditional)
from reportlab.lib.styles import getSampleStyleSheet, ParagraphStyle
Condition: Required by PDFConverter class for text styling
Required (conditional)
from reportlab.lib.units import inch, cm
Condition: Required by PDFConverter class for measurement units
Required (conditional)
from reportlab.platypus import SimpleDocTemplate, Paragraph, Spacer, Table, TableStyle, Image, PageBreak, Flowable
Condition: Required by PDFConverter class for PDF document construction
Required (conditional)
from reportlab.pdfbase import pdfmetrics
Condition: Required by PDFConverter class for font metrics
Required (conditional)
from reportlab.pdfbase.ttfonts import TTFont
Condition: Required by PDFConverter class for custom font support
Required (conditional)
import fitz
Condition: Required by PDFConverter class for PDF manipulation (PyMuPDF library)
Required (conditional)
import pikepdf
Condition: Required by PDFConverter class for PDF processing operations
Required (conditional)
from PIL import Image as PILImage
Condition: Required by PDFConverter class for image processing
Required (conditional)
from docx2pdf import convert
Condition: Required by PDFConverter class for Word document conversion
Required (conditional)
import pandas as pd
Condition: Required by PDFConverter class for data table processing
Required (conditional)
Usage Example
import os
from typing import Optional

# Assuming PDFConverter class is available

# Basic usage with auto-generated output path
pdf_path = convert_to_pdf('document.docx')
print(f'PDF created at: {pdf_path}')

# Usage with explicit output path
pdf_path = convert_to_pdf(
    input_path='reports/annual_report.docx',
    output_path='output/annual_report_2024.pdf'
)
print(f'PDF saved to: {pdf_path}')

# Batch conversion example
input_files = ['doc1.docx', 'doc2.txt', 'doc3.html']
for file in input_files:
    try:
        result = convert_to_pdf(file)
        print(f'Successfully converted {file} to {result}')
    except Exception as e:
        print(f'Failed to convert {file}: {e}')
Best Practices
- Always verify that the input file exists before calling this function to avoid errors
- Ensure the output directory exists and has write permissions before conversion
- Handle exceptions appropriately as the underlying PDFConverter may raise errors for unsupported formats or corrupted files
- Consider validating the input file format before conversion to provide better error messages
- When processing multiple files, implement proper error handling to prevent one failure from stopping the entire batch
- Be aware that auto-generated output paths will overwrite existing files with the same name
- For production use, consider adding file existence checks and backup mechanisms
- The function depends on PDFConverter class implementation - ensure it supports your required document formats
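Several of the practices above (checking that the input exists, creating the output directory, guarding against overwrites) can be combined in a thin wrapper. The sketch below is one possible arrangement, not part of pdf_utils.py: safe_convert and its overwrite flag are hypothetical, and the conversion function is passed in as a parameter so the checks stay independent of how PDFConverter behaves.

```python
import os
from typing import Callable, Optional


def safe_convert(
    convert: Callable[[str, Optional[str]], str],
    input_path: str,
    output_path: Optional[str] = None,
    overwrite: bool = False,
) -> str:
    """Run pre-flight checks, then delegate to the supplied convert callable."""
    # Fail early with a clear message if the source document is missing
    if not os.path.isfile(input_path):
        raise FileNotFoundError(f"Input file not found: {input_path}")

    # Same default-path rule as convert_to_pdf: swap the final extension for .pdf
    if output_path is None:
        output_path = f"{os.path.splitext(input_path)[0]}.pdf"

    # Guard against writing the PDF on top of its own input
    if os.path.abspath(output_path) == os.path.abspath(input_path):
        raise ValueError("Output path would overwrite the input file")

    # Refuse to replace an existing file unless the caller opts in
    if os.path.exists(output_path) and not overwrite:
        raise FileExistsError(f"Refusing to overwrite: {output_path}")

    # Create the output directory if the path includes one
    out_dir = os.path.dirname(output_path)
    if out_dir:
        os.makedirs(out_dir, exist_ok=True)

    return convert(input_path, output_path)
```

With convert_to_pdf in scope, a call would look like safe_convert(convert_to_pdf, 'reports/annual_report.docx', 'output/annual_report_2024.pdf'). Raising specific exception types lets a batch loop distinguish a missing input from an overwrite conflict instead of catching a bare Exception.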
Similar Components
AI-powered semantic similarity - components with related functionality:
- function extract_text_from_pdf 63.1% similar
- function convert_document_to_pdf 61.6% similar
- function convert_document_to_pdf_v1 60.9% similar
- class PDFConverter 58.9% similar
- function merge_pdfs 58.4% similar