function export_to_docx
Exports a document with text and data sections to Microsoft Word DOCX format, preserving formatting, structure, and metadata.
File: /tf/active/vicechatdev/vice_ai/new_app.py
Lines: 2955 - 3062
Complexity: complex
Purpose
This function generates a DOCX file from a document object containing text sections (with Quill Delta/HTML content) and optional data sections. It handles document metadata (title, author, creation date, description), processes sections in order, converts rich text content from Quill Delta format through HTML to Markdown for proper formatting, and creates a structured Word document with headings, paragraphs, and data visualizations. The function is designed for document export functionality in a content management or document generation system.
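For readers unfamiliar with the input format: a Quill Delta is a JSON object whose `ops` list holds `insert` operations with optional `attributes`. The sketch below is a deliberately minimal, hypothetical stand-in for the first conversion step. `delta_to_simple_html` is not the module's `convert_quill_delta_to_html` (whose implementation is not shown on this page); it handles only bold and italic runs and newline-separated paragraphs.

```python
import html
import json


def delta_to_simple_html(delta):
    """Hypothetical, minimal Delta -> HTML conversion (bold/italic only)."""
    # Accept either a JSON string or an already-parsed dict.
    if isinstance(delta, str):
        delta = json.loads(delta)

    paragraphs = [[]]  # each paragraph is a list of HTML fragments
    for op in delta.get("ops", []):
        text = op.get("insert")
        if not isinstance(text, str):
            continue  # embeds (images, etc.) are skipped in this sketch
        attrs = op.get("attributes", {})
        # Every newline inside an insert ends the current paragraph.
        for i, piece in enumerate(text.split("\n")):
            if i > 0:
                paragraphs.append([])
            if not piece:
                continue
            fragment = html.escape(piece)
            if attrs.get("bold"):
                fragment = f"<strong>{fragment}</strong>"
            if attrs.get("italic"):
                fragment = f"<em>{fragment}</em>"
            paragraphs[-1].append(fragment)

    # Empty paragraphs are dropped.
    return "".join(f"<p>{''.join(p)}</p>" for p in paragraphs if p)


sample = {"ops": [
    {"insert": "Hello "},
    {"insert": "World", "attributes": {"bold": True}},
    {"insert": "\nSecond paragraph\n"},
]}
print(delta_to_simple_html(sample))
```

The real pipeline then passes the HTML through `html_to_markdown` and `process_markdown_content`; those steps are not reproduced here.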
Source Code
def export_to_docx(document, text_sections, data_sections=None):
    """Export document to DOCX format"""
    if not DOCX_AVAILABLE:
        raise ImportError("python-docx not available")

    if data_sections is None:
        data_sections = []

    from docx import Document as DocxDocument  # Rename to avoid conflict

    doc = DocxDocument()

    # Set document title
    title = doc.add_heading(document.title, 0)
    title.alignment = WD_ALIGN_PARAGRAPH.CENTER

    # Add author and metadata
    if document.owner:
        author_para = doc.add_paragraph()
        author_para.add_run(f"Author: {document.owner}").bold = True
        author_para.alignment = WD_ALIGN_PARAGRAPH.CENTER

    date_para = doc.add_paragraph()
    date_para.add_run(f"Created: {document.created_at.strftime('%Y-%m-%d %H:%M')}").italic = True
    date_para.alignment = WD_ALIGN_PARAGRAPH.CENTER

    if document.description:
        desc_para = doc.add_paragraph()
        desc_para.add_run(f"Description: {document.description}").italic = True
        desc_para.alignment = WD_ALIGN_PARAGRAPH.CENTER

    doc.add_paragraph()  # Empty line

    # Combine text and data sections in document order using document.sections
    # Create lookup dicts for fast access
    text_sections_dict = {section.id: section for section in text_sections}
    data_sections_dict = {section.id: section for section in data_sections}

    # Sort document sections by position and process in order
    ordered_doc_sections = sorted(document.sections, key=lambda ds: ds.position)
    logger.info(f"DOCX Export: Processing {len(ordered_doc_sections)} sections (text + data)...")

    # Add sections in document order
    for doc_section in ordered_doc_sections:
        section = None
        is_data_section = False

        if doc_section.section_type == SectionType.TEXT and doc_section.section_id in text_sections_dict:
            section = text_sections_dict[doc_section.section_id]
            is_data_section = False
        elif doc_section.section_type == SectionType.DATA and doc_section.section_id in data_sections_dict:
            section = data_sections_dict[doc_section.section_id]
            is_data_section = True

        if not section:
            continue

        # Process data sections
        if is_data_section:
            add_data_section_to_docx(doc, section)
            continue

        # Process text sections
        logger.info(f"DOCX: Processing text section: {section.section_type.value} - '{section.title}'")

        if section.section_type.value == 'header':
            # Add header
            level = min(getattr(section, 'level', 1), 9)  # Word supports up to 9 heading levels
            logger.info(f"DOCX: Adding header level {level}: '{section.title}'")
            heading = doc.add_heading(section.title, level)
        elif section.section_type.value in ['text', 'content']:
            # Add text content
            logger.info(f"DOCX: Adding text section: '{section.title}'")
            if section.title and section.title.strip():
                doc.add_heading(section.title, 3)

            if section.current_content:
                # First convert from Quill Delta format to HTML
                content_to_process = section.current_content
                logger.info(f"DOCX: Raw content type: {type(content_to_process)}, first 100 chars: {str(content_to_process)[:100]}")

                # Convert Quill Delta to HTML first
                html_content = convert_quill_delta_to_html(content_to_process)

                # Then convert HTML to Markdown for processing
                markdown_content = html_to_markdown(html_content)

                # Process markdown content for proper formatting
                try:
                    elements = process_markdown_content(markdown_content)
                    add_formatted_content_to_word(doc, elements)
                except Exception as e:
                    logger.warning(f"Error processing content for section {section.id}: {e}")
                    # Fallback to simple paragraph splitting
                    clean_content = clean_html_tags(html_content)
                    paragraphs = clean_content.split('\n\n')
                    for para_text in paragraphs:
                        if para_text.strip():
                            para = doc.add_paragraph()
                            para.add_run(para_text.strip())

    # Save to BytesIO buffer
    buffer = BytesIO()
    doc.save(buffer)
    buffer.seek(0)
    return buffer.getvalue()
Parameters
| Name | Type | Default | Kind |
|---|---|---|---|
| document | - | - | positional_or_keyword |
| text_sections | - | - | positional_or_keyword |
| data_sections | - | None | positional_or_keyword |
Parameter Details
document: A Document model object containing document metadata (title, owner, created_at, description) and a sections attribute with ordered DocumentSection objects. Each DocumentSection has section_type (TEXT or DATA), section_id, and position for ordering.
text_sections: A list or iterable of TextSection model objects. Each TextSection should have attributes: id, section_type (with .value property), title, current_content (in Quill Delta format), and optionally level (for headers). These represent text-based content sections like headers, paragraphs, and formatted text.
data_sections: Optional list or iterable of DataSection model objects representing data visualizations, tables, or charts to be embedded in the document. Defaults to empty list if None. Each DataSection should have an id and be processable by add_data_section_to_docx function.
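The ordering logic these parameters feed (lookup dictionaries keyed by `id`, then a sort on `position`) can be sketched in isolation. The dataclasses and the `SectionType` enum below are simplified stand-ins for the real models, which are not shown on this page; only the attribute names come from the source, and `resolve_sections` is a hypothetical helper name.

```python
from dataclasses import dataclass
from enum import Enum


class SectionType(Enum):  # stand-in for the real enum
    TEXT = "text"
    DATA = "data"


@dataclass
class DocumentSection:  # stand-in: one entry of document.sections
    section_id: str
    section_type: SectionType
    position: int


@dataclass
class Section:  # stand-in for TextSection / DataSection
    id: str
    title: str


def resolve_sections(doc_sections, text_sections, data_sections=None):
    """Yield (section, is_data_section) pairs in document order."""
    text_by_id = {s.id: s for s in text_sections}
    data_by_id = {s.id: s for s in (data_sections or [])}
    for ds in sorted(doc_sections, key=lambda d: d.position):
        if ds.section_type == SectionType.TEXT and ds.section_id in text_by_id:
            yield text_by_id[ds.section_id], False
        elif ds.section_type == SectionType.DATA and ds.section_id in data_by_id:
            yield data_by_id[ds.section_id], True
        # References with no matching section are skipped, as in the source.


layout = [
    DocumentSection("d1", SectionType.DATA, 2),
    DocumentSection("t1", SectionType.TEXT, 1),
    DocumentSection("missing", SectionType.TEXT, 3),
]
ordered = list(resolve_sections(layout, [Section("t1", "Intro")], [Section("d1", "Sales chart")]))
print([(s.title, is_data) for s, is_data in ordered])
```

Note that a section passed in `text_sections` or `data_sections` but never referenced from `document.sections` does not appear in the export at all.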
Return Value
Returns a bytes object containing the complete DOCX file. This binary data can be written to a file, sent as an HTTP response, or stored in a database. The file contains the centered title, the metadata lines (author, creation date, description), and every resolved section in position order.
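Because a DOCX file is a ZIP package, the returned bytes can be sanity-checked without python-docx. The helper below is a hypothetical addition (`looks_like_docx` is not part of the module); it only verifies that the payload is a ZIP archive containing `word/document.xml`, the main document part of a WordprocessingML package. It does not validate the XML itself.

```python
import zipfile
from io import BytesIO


def looks_like_docx(data: bytes) -> bool:
    """Hypothetical check: is `data` a ZIP archive with a main document part?"""
    buffer = BytesIO(data)
    if not zipfile.is_zipfile(buffer):
        return False
    buffer.seek(0)
    with zipfile.ZipFile(buffer) as archive:
        return "word/document.xml" in archive.namelist()


# Demonstration with a hand-built stand-in archive (not a complete DOCX).
demo = BytesIO()
with zipfile.ZipFile(demo, "w") as archive:
    archive.writestr("word/document.xml", "<w:document/>")

print(looks_like_docx(demo.getvalue()))
print(looks_like_docx(b"not a docx"))
```

A check like this could run on the result of `export_to_docx` before the bytes are written to disk or streamed to a client.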
Dependencies
- python-docx
- io
- logging
Required Imports
from io import BytesIO
import logging
from docx.enum.text import WD_ALIGN_PARAGRAPH
Conditional/Optional Imports
These imports are only needed under specific conditions:
from docx import Document as DocxDocument
Condition: only if DOCX_AVAILABLE is True (checked at runtime)
Required (conditional)
Usage Example
from io import BytesIO

from flask import send_file
from models import Document, TextSection, DataSection


# Hypothetical view function; it wraps the example so that `return` is valid.
def download_docx(doc_id):
    # Load the document and its sections from the database
    document = Document.query.get(doc_id)
    text_sections = TextSection.query.filter_by(document_id=doc_id).all()
    data_sections = DataSection.query.filter_by(document_id=doc_id).all()

    # Export to DOCX
    try:
        docx_bytes = export_to_docx(document, text_sections, data_sections)
    except ImportError:
        return 'python-docx library not installed', 500
    except Exception as e:
        return f'Export failed: {e}', 500

    # Option 1: save to file
    with open('output.docx', 'wb') as f:
        f.write(docx_bytes)

    # Option 2: send as HTTP response
    return send_file(
        BytesIO(docx_bytes),
        as_attachment=True,
        download_name=f'{document.title}.docx',
        mimetype='application/vnd.openxmlformats-officedocument.wordprocessingml.document',
    )
Best Practices
- Ensure DOCX_AVAILABLE is checked before calling this function to avoid ImportError
- The document object must have a valid sections attribute with DocumentSection objects that have position values for proper ordering
- TextSection.current_content should be in Quill Delta format (JSON structure) for proper conversion
- Handle the returned bytes appropriately - either write to file or stream to HTTP response
- The function uses multiple helper functions (convert_quill_delta_to_html, html_to_markdown, etc.) that must be available in the module scope
- Data sections require add_data_section_to_docx function to be implemented for proper rendering
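- Header levels are capped at 9, Word's maximum heading level. The code applies no lower bound, so section.level must not be negative (python-docx rejects levels outside 0-9, and level 0 produces a Title rather than a heading)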
- The function includes fallback error handling for content processing failures
- Logger should be configured to capture processing information and warnings
- Consider memory usage when exporting large documents with many sections or embedded data
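The source caps the header level with `min(getattr(section, 'level', 1), 9)`, which bounds it from above only. A sketch of a two-sided clamp, together with the blank-line paragraph split used by the fallback path, might look like the following. `clamp_heading_level` and `split_paragraphs` are hypothetical helper names, not functions from the module.

```python
def clamp_heading_level(level, lowest=1, highest=9):
    """Force a heading level into the lowest..highest range."""
    try:
        level = int(level)
    except (TypeError, ValueError):
        return lowest  # missing or non-numeric level: use the top level
    return max(lowest, min(level, highest))


def split_paragraphs(text):
    """Mirror the source's fallback: split on blank lines, drop empty chunks."""
    return [chunk.strip() for chunk in text.split("\n\n") if chunk.strip()]


print([clamp_heading_level(v) for v in (0, 3, 12, None)])
print(split_paragraphs("First paragraph.\n\n\n\nSecond paragraph.\n"))
```

Whether a missing or invalid level should fall back to 1 is a design choice; the source itself defaults to 1 only when the attribute is absent.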
Similar Components
AI-powered semantic similarity - components with related functionality:
- function export_to_docx_v1 90.0% similar
- function export_to_pdf 75.9% similar
- function export_to_pdf_v1 67.7% similar
- function add_data_section_to_docx 66.3% similar
- function export_document 65.9% similar