function export_to_pdf_v1
Converts a document object with sections and references into a formatted PDF file using ReportLab, supporting multiple heading levels, text content with markdown/HTML processing, and reference management.
/tf/active/vicechatdev/vice_ai/complex_app.py
2112 - 2243
complex
Purpose
This function generates a professionally formatted PDF document from a structured document object. It handles document metadata (title, author, creation date), processes multiple section types (headers and text), converts HTML/Markdown content to PDF-compatible format, manages hierarchical heading styles, and compiles references. The function is designed for document export functionality in web applications or content management systems.
Source Code
def export_to_pdf(document):
    """Export document to PDF format"""
    if not PDF_AVAILABLE:
        raise ImportError("reportlab not available")

    buffer = BytesIO()
    doc = SimpleDocTemplate(buffer, pagesize=A4)
    styles = getSampleStyleSheet()
    story = []

    # Custom styles
    title_style = ParagraphStyle(
        'CustomTitle',
        parent=styles['Title'],
        fontSize=24,
        spaceAfter=30,
        alignment=1  # Center
    )
    heading1_style = ParagraphStyle(
        'CustomHeading1',
        parent=styles['Heading1'],
        fontSize=18,
        spaceAfter=12,
        spaceBefore=20
    )
    heading2_style = ParagraphStyle(
        'CustomHeading2',
        parent=styles['Heading2'],
        fontSize=16,
        spaceAfter=10,
        spaceBefore=15
    )
    heading3_style = ParagraphStyle(
        'CustomHeading3',
        parent=styles['Heading3'],
        fontSize=14,
        spaceAfter=8,
        spaceBefore=12
    )

    # Add custom styles to styles dictionary for use in add_formatted_content_to_pdf
    styles.add(heading1_style)
    styles.add(heading2_style)
    styles.add(heading3_style)

    # Document title
    story.append(Paragraph(document.title, title_style))

    # Author and metadata
    if document.author:
        story.append(Paragraph(f"<b>Author:</b> {document.author}", styles['Normal']))
    story.append(Paragraph(f"<i>Created: {document.created_at.strftime('%Y-%m-%d %H:%M')}</i>", styles['Normal']))
    story.append(Spacer(1, 20))

    # Add sections
    for section in document.sections:
        if section.type == 'header':
            # Choose heading style based on level
            if section.level == 1:
                style = heading1_style
            elif section.level == 2:
                style = heading2_style
            else:
                style = heading3_style
            story.append(Paragraph(section.title, style))

        elif section.type == 'text':
            # Add section title if present
            if section.title:
                story.append(Paragraph(section.title, heading3_style))

            # Add content
            if section.content:
                # Check if content is HTML or Markdown and process accordingly
                content_to_process = section.content

                # If content looks like HTML, convert to Markdown first
                if '<' in content_to_process and '>' in content_to_process:
                    # Content appears to be HTML, convert to Markdown
                    content_to_process = html_to_markdown(content_to_process)

                try:
                    # Process markdown content for proper formatting
                    elements = process_markdown_content(content_to_process)
                    add_formatted_content_to_pdf(story, elements, styles)
                except Exception as e:
                    logger.warning(f"Error processing content for section {section.id}: {e}")
                    # Fallback to simple paragraph splitting with basic formatting
                    # Clean HTML tags if present for fallback
                    clean_content = clean_html_tags(content_to_process)
                    paragraphs = clean_content.split('\n\n')
                    for para_text in paragraphs:
                        if para_text.strip():
                            story.append(Paragraph(para_text.strip(), styles['Normal']))
                            story.append(Spacer(1, 6))

            # Add section references
            if section.references:
                story.append(Paragraph("<b>References for this section:</b>", styles['Normal']))
                for i, ref in enumerate(section.references, 1):
                    ref_text = f"[{i}] {ref.get('title', 'Untitled Reference')}"
                    story.append(Paragraph(ref_text, styles['Normal']))

            story.append(Spacer(1, 12))

    # Add global references
    all_references = document.get_all_references()
    if all_references:
        story.append(PageBreak())
        story.append(Paragraph("References", heading1_style))

        unique_refs = {}
        ref_counter = 1
        for ref in all_references:
            ref_key = ref.get('title', 'Untitled') + ref.get('source', '')
            if ref_key not in unique_refs:
                unique_refs[ref_key] = ref_counter
                ref_text = f"[{ref_counter}] {ref.get('title', 'Untitled Reference')}"
                if ref.get('source'):
                    ref_text += f" ({ref['source']})"
                story.append(Paragraph(ref_text, styles['Normal']))
                ref_counter += 1

    # Build PDF
    doc.build(story)
    buffer.seek(0)
    return buffer.getvalue()
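The HTML check in the listing above is a plain substring test, so any content containing both `<` and `>` is routed through `html_to_markdown`, including Markdown that only uses comparison operators. The sketch below isolates that check and contrasts it with a stricter tag-shaped test. `looks_like_html_strict` and its regular expression are hypothetical and not part of the original module; they are one possible tightening, not a recommendation from the source.

```python
import re


def looks_like_html(content: str) -> bool:
    """Same test the source applies: both angle brackets appear somewhere."""
    return '<' in content and '>' in content


# Hypothetical stricter check: require something shaped like an opening,
# closing, or self-closing tag, e.g. <p>, </div>, <br/>, <a href="...">.
_TAG_RE = re.compile(r'</?[A-Za-z][A-Za-z0-9]*(\s[^<>]*)?/?>')


def looks_like_html_strict(content: str) -> bool:
    return _TAG_RE.search(content) is not None


html_sample = "<p>Hello <b>world</b></p>"
markdown_sample = "Use x < 10 and y > 2 in the filter."

for sample in (html_sample, markdown_sample):
    print(looks_like_html(sample), looks_like_html_strict(sample))
```

With these two samples the substring test treats both strings as HTML, while the stricter test accepts only the first.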
Parameters
| Name | Type | Default | Kind |
|---|---|---|---|
| document | - | - | positional_or_keyword |
Parameter Details
document: A document object with the attributes 'title' (string), 'author' (string or None), 'created_at' (datetime object), and 'sections' (list of section objects), plus a 'get_all_references()' method that returns a list of reference dictionaries with 'title' and 'source' keys. Each section object should have 'type' ('header' or 'text'), 'level' (1-3 for headers; any value other than 1 or 2 uses the level-3 heading style), 'title' (string), 'content' (string with HTML or Markdown), 'references' (list of reference dictionaries), and 'id' (read only when a content-processing failure is logged).
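Because the function reads these attributes directly, a missing one surfaces as an AttributeError partway through the export. A minimal pre-flight check might look like the sketch below. `missing_attributes` and the two name tuples are hypothetical helpers, and the check is deliberately conservative: it asks for every attribute on every section, although header sections only need 'type', 'level', and 'title'.

```python
from datetime import datetime
from types import SimpleNamespace

# Names the export code reads; the tuples themselves are illustrative.
REQUIRED_DOCUMENT_ATTRS = ("title", "author", "created_at", "sections", "get_all_references")
REQUIRED_SECTION_ATTRS = ("id", "type", "level", "title", "content", "references")


def missing_attributes(document) -> list:
    """Return the names the export code would read but the object lacks."""
    missing = [name for name in REQUIRED_DOCUMENT_ATTRS if not hasattr(document, name)]
    for index, section in enumerate(getattr(document, "sections", None) or []):
        for name in REQUIRED_SECTION_ATTRS:
            if not hasattr(section, name):
                missing.append(f"sections[{index}].{name}")
    return missing


# Stand-in objects built with SimpleNamespace purely for demonstration.
section = SimpleNamespace(id=1, type="text", level=None, title="Overview",
                          content="Body text", references=[])
complete_doc = SimpleNamespace(title="My Report", author=None,
                               created_at=datetime(2024, 1, 1),
                               sections=[section],
                               get_all_references=lambda: [])
incomplete_doc = SimpleNamespace(title="Draft", sections=[SimpleNamespace(type="header")])

print(missing_attributes(complete_doc))
print(missing_attributes(incomplete_doc))
```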
Return Value
Returns bytes representing the complete PDF file content. This binary data can be written directly to a file, sent as an HTTP response, or stored in a BytesIO buffer. The PDF includes formatted title, metadata, all document sections with appropriate styling, and a references section if applicable.
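As a sketch of handling the returned bytes, the helper below checks for the `%PDF-` marker that PDF files begin with and then writes the data to disk. `save_pdf_bytes` is a hypothetical name, and the payload is a stand-in rather than real output from this function.

```python
import tempfile
from pathlib import Path


def save_pdf_bytes(pdf_bytes: bytes, path) -> int:
    """Write PDF bytes to path and return the number of bytes written."""
    if not pdf_bytes.startswith(b"%PDF-"):
        raise ValueError("data does not start with a PDF header")
    return Path(path).write_bytes(pdf_bytes)


# Stand-in payload; a real call would pass the bytes returned by export_to_pdf.
fake_pdf = b"%PDF-1.4\n%%EOF\n"

with tempfile.TemporaryDirectory() as tmp:
    target = Path(tmp) / "output.pdf"
    written = save_pdf_bytes(fake_pdf, target)
    round_trip = target.read_bytes()
```

The header check is only a cheap sanity test; it does not validate the rest of the file.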
Dependencies
- reportlab
- flask
Required Imports
from io import BytesIO
from reportlab.lib.pagesizes import A4
from reportlab.platypus import SimpleDocTemplate, Paragraph, Spacer, PageBreak
from reportlab.lib.styles import getSampleStyleSheet, ParagraphStyle
Conditional/Optional Imports
These imports are only needed under specific conditions:
- from reportlab.lib.pagesizes import A4 (required, conditional). Condition: ReportLab must be installed and PDF_AVAILABLE flag must be True
- html_to_markdown function (optional). Condition: Required if document sections contain HTML content that needs conversion
- process_markdown_content function (optional). Condition: Required for processing markdown content into PDF elements
- add_formatted_content_to_pdf function (optional). Condition: Required for adding processed markdown elements to PDF story
- clean_html_tags function (optional). Condition: Used as fallback when markdown processing fails
- logger object (optional). Condition: Required for logging warnings during content processing errors
Usage Example
from io import BytesIO
from datetime import datetime
from reportlab.lib.pagesizes import A4
from reportlab.platypus import SimpleDocTemplate, Paragraph, Spacer, PageBreak
from reportlab.lib.styles import getSampleStyleSheet, ParagraphStyle

# Assume PDF_AVAILABLE = True and helper functions are defined

class Section:
    def __init__(self, type, level, title, content, references):
        self.type = type
        self.level = level
        self.title = title
        self.content = content
        self.references = references
        self.id = 1

class Document:
    def __init__(self, title, author, created_at, sections):
        self.title = title
        self.author = author
        self.created_at = created_at
        self.sections = sections

    def get_all_references(self):
        refs = []
        for section in self.sections:
            if section.references:
                refs.extend(section.references)
        return refs

# Create document
sections = [
    Section('header', 1, 'Introduction', '', []),
    Section('text', None, 'Overview', 'This is the introduction text.', [{'title': 'Reference 1', 'source': 'Source A'}])
]
doc = Document('My Report', 'John Doe', datetime.now(), sections)

# Export to PDF
pdf_bytes = export_to_pdf(doc)

# Save to file
with open('output.pdf', 'wb') as f:
    f.write(pdf_bytes)
Best Practices
- Ensure the PDF_AVAILABLE flag is properly set before calling this function to avoid ImportError
- The document object must have all required attributes (title, author, created_at, sections) and the get_all_references() method
- Section content can be HTML or Markdown; the function attempts to detect and convert appropriately
- Helper functions (html_to_markdown, process_markdown_content, add_formatted_content_to_pdf, clean_html_tags) must be defined in the same module or imported
- The function includes error handling for content processing failures with fallback to simple paragraph formatting
- References are deduplicated in the final references section based on title and source combination
- The page size is hard-coded to A4; change the pagesize argument passed to SimpleDocTemplate inside the function if a different size is needed
- The returned bytes can be directly written to a file or sent as an HTTP response with appropriate content-type header (application/pdf)
- Consider memory usage for large documents as the entire PDF is built in memory before returning
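The deduplication rule mentioned above can be pulled out of the PDF-building code and exercised on its own. The sketch below mirrors the key and label construction from the source listing; `number_unique_references` is a hypothetical name used only for illustration.

```python
def number_unique_references(references):
    """Return (number, text) pairs, skipping repeats of the same title + source."""
    seen = set()
    numbered = []
    for ref in references:
        # Same key as the source: title and source concatenated as strings.
        key = ref.get('title', 'Untitled') + ref.get('source', '')
        if key in seen:
            continue
        seen.add(key)
        text = ref.get('title', 'Untitled Reference')
        if ref.get('source'):
            text += f" ({ref['source']})"
        numbered.append((len(numbered) + 1, text))
    return numbered


refs = [
    {'title': 'Reference 1', 'source': 'Source A'},
    {'title': 'Reference 1', 'source': 'Source A'},  # duplicate, dropped
    {'title': 'Reference 2'},
]
for number, text in number_unique_references(refs):
    print(f"[{number}] {text}")
```

Because the key is a plain concatenation, a reference titled 'AB' with source 'C' and one titled 'A' with source 'BC' collapse into a single entry; a `(title, source)` tuple key would keep them apart.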
Similar Components
AI-powered semantic similarity - components with related functionality:
- function export_to_pdf (85.3% similar)
- function add_formatted_content_to_pdf_v1 (73.2% similar)
- function html_to_pdf (72.4% similar)
- function add_formatted_content_to_pdf (72.0% similar)
- function export_to_docx_v1 (70.8% similar)