🔍 Code Extractor

function export_to_docx

Maturity: 50

Exports a document with text and data sections to Microsoft Word DOCX format, preserving formatting, structure, and metadata.

File: /tf/active/vicechatdev/vice_ai/new_app.py
Lines: 2955 - 3062
Complexity: complex

Purpose

This function builds a DOCX file from a document object, its text sections (Quill Delta or HTML content), and optional data sections. It writes the document metadata (title, author, creation date, description) as a centered block at the top, then processes sections in order of position. Rich text is converted from Quill Delta to HTML and then to Markdown before being rendered as Word headings and paragraphs; data sections are delegated to add_data_section_to_docx. The result is returned as bytes, which suits export features in a content management or document generation system.
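
The first conversion step (Quill Delta to HTML) is handled by convert_quill_delta_to_html, whose implementation is not shown on this page. The sketch below is a hypothetical, heavily simplified stand-in that only illustrates the idea: it assumes a Delta is a dict (or JSON string) with an "ops" list, and it handles only plain text, bold, and italic. The name delta_to_html_sketch is an assumption for illustration and is not part of the module.

```python
import html
import json


def delta_to_html_sketch(delta):
    """Hypothetical, simplified stand-in for convert_quill_delta_to_html."""
    if isinstance(delta, str):
        try:
            delta = json.loads(delta)
        except json.JSONDecodeError:
            # Not JSON: assume the content is already HTML or plain text.
            return delta
    if not isinstance(delta, dict):
        return str(delta)

    parts = []
    for op in delta.get("ops", []):
        text = op.get("insert", "")
        if not isinstance(text, str):
            continue  # skip embeds such as images in this sketch
        attrs = op.get("attributes") or {}
        chunk = html.escape(text)
        if attrs.get("bold"):
            chunk = f"<strong>{chunk}</strong>"
        if attrs.get("italic"):
            chunk = f"<em>{chunk}</em>"
        parts.append(chunk)

    # Each non-empty line of the joined text becomes one paragraph.
    lines = [line for line in "".join(parts).split("\n") if line.strip()]
    return "".join(f"<p>{line}</p>" for line in lines)
```

A real converter would also need to handle lists, headers, links, and line-level attributes, which this sketch deliberately ignores.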

Source Code

def export_to_docx(document, text_sections, data_sections=None):
    """Export document to DOCX format"""
    if not DOCX_AVAILABLE:
        raise ImportError("python-docx not available")
    
    if data_sections is None:
        data_sections = []
    
    from docx import Document as DocxDocument  # Rename to avoid conflict
    
    doc = DocxDocument()
    
    # Set document title
    title = doc.add_heading(document.title, 0)
    title.alignment = WD_ALIGN_PARAGRAPH.CENTER
    
    # Add author and metadata
    if document.owner:
        author_para = doc.add_paragraph()
        author_para.add_run(f"Author: {document.owner}").bold = True
        author_para.alignment = WD_ALIGN_PARAGRAPH.CENTER
    
    date_para = doc.add_paragraph()
    date_para.add_run(f"Created: {document.created_at.strftime('%Y-%m-%d %H:%M')}").italic = True
    date_para.alignment = WD_ALIGN_PARAGRAPH.CENTER
    
    if document.description:
        desc_para = doc.add_paragraph()
        desc_para.add_run(f"Description: {document.description}").italic = True
        desc_para.alignment = WD_ALIGN_PARAGRAPH.CENTER
    
    doc.add_paragraph()  # Empty line
    
    # Combine text and data sections in document order using document.sections
    # Create lookup dicts for fast access
    text_sections_dict = {section.id: section for section in text_sections}
    data_sections_dict = {section.id: section for section in data_sections}
    
    # Sort document sections by position and process in order
    ordered_doc_sections = sorted(document.sections, key=lambda ds: ds.position)
    
    logger.info(f"DOCX Export: Processing {len(ordered_doc_sections)} sections (text + data)...")
    
    # Add sections in document order
    for doc_section in ordered_doc_sections:
        section = None
        is_data_section = False
        
        if doc_section.section_type == SectionType.TEXT and doc_section.section_id in text_sections_dict:
            section = text_sections_dict[doc_section.section_id]
            is_data_section = False
        elif doc_section.section_type == SectionType.DATA and doc_section.section_id in data_sections_dict:
            section = data_sections_dict[doc_section.section_id]
            is_data_section = True
        
        if not section:
            continue
        
        # Process data sections
        if is_data_section:
            add_data_section_to_docx(doc, section)
            continue
        
        # Process text sections
        logger.info(f"DOCX: Processing text section: {section.section_type.value} - '{section.title}'")
        
        if section.section_type.value == 'header':
            # Add header
            level = min(getattr(section, 'level', 1), 9)  # Word supports up to 9 heading levels
            logger.info(f"DOCX: Adding header level {level}: '{section.title}'")
            heading = doc.add_heading(section.title, level)
            
        elif section.section_type.value in ['text', 'content']:
            # Add text content
            logger.info(f"DOCX: Adding text section: '{section.title}'")
            if section.title and section.title.strip():
                doc.add_heading(section.title, 3)
            
            if section.current_content:
                # First convert from Quill Delta format to HTML
                content_to_process = section.current_content
                logger.info(f"DOCX: Raw content type: {type(content_to_process)}, first 100 chars: {str(content_to_process)[:100]}")
                
                # Convert Quill Delta to HTML first
                html_content = convert_quill_delta_to_html(content_to_process)
                
                # Then convert HTML to Markdown for processing
                markdown_content = html_to_markdown(html_content)
                
                # Process markdown content for proper formatting
                try:
                    elements = process_markdown_content(markdown_content)
                    add_formatted_content_to_word(doc, elements)
                except Exception as e:
                    logger.warning(f"Error processing content for section {section.id}: {e}")
                    # Fallback to simple paragraph splitting
                    clean_content = clean_html_tags(html_content)
                    paragraphs = clean_content.split('\n\n')
                    for para_text in paragraphs:
                        if para_text.strip():
                            para = doc.add_paragraph()
                            para.add_run(para_text.strip())
    
    # Save to BytesIO buffer
    buffer = BytesIO()
    doc.save(buffer)
    buffer.seek(0)
    return buffer.getvalue()
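
When Markdown processing fails, the source above falls back to stripping tags and splitting on blank lines. The helper clean_html_tags is not shown on this page, so the sketch below uses a hypothetical regex-based stand-in (clean_html_tags_sketch) to illustrate roughly what that branch does; the real helper may behave differently.

```python
import html
import re


def clean_html_tags_sketch(html_text):
    """Hypothetical stand-in for clean_html_tags: strip tags, decode entities."""
    # Turn block-level closing tags into paragraph breaks and <br> into newlines
    # before removing the remaining markup.
    text = re.sub(r"</p\s*>|</div\s*>|</h[1-6]\s*>", "\n\n", html_text, flags=re.IGNORECASE)
    text = re.sub(r"<br\s*/?>", "\n", text, flags=re.IGNORECASE)
    text = re.sub(r"<[^>]+>", "", text)
    return html.unescape(text)


def fallback_paragraphs(html_text):
    """Mirror the fallback branch: split on blank lines and drop empty chunks."""
    clean = clean_html_tags_sketch(html_text)
    return [chunk.strip() for chunk in clean.split("\n\n") if chunk.strip()]
```

Each string returned by fallback_paragraphs corresponds to one doc.add_paragraph() call in the fallback loop.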

Parameters

Name            Type   Default   Kind
document        -      -         positional_or_keyword
text_sections   -      -         positional_or_keyword
data_sections   -      None      positional_or_keyword

Parameter Details

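document: A Document model object containing document metadata (title, owner, created_at, description) and a sections attribute holding DocumentSection objects. The function reads title and created_at unconditionally, so both must be set (created_at must support strftime); owner and description are written only when present. Each DocumentSection has section_type (TEXT or DATA), section_id, and position, and sections are sorted by position before export.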

text_sections: A list or iterable of TextSection model objects. Each TextSection should have the attributes id, section_type (with a .value property), title, current_content (in Quill Delta format), and optionally level (for headers). Only sections whose section_type.value is 'header', 'text', or 'content' are rendered; header sections contribute their title only, and any other type is skipped silently.

data_sections: Optional list or iterable of DataSection model objects representing data visualizations, tables, or charts to be embedded in the document. Defaults to an empty list when None. Each DataSection must have an id and be processable by the add_data_section_to_docx function.
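
The way these three parameters fit together can be sketched with plain Python objects. The classes below are illustrative stand-ins (assumptions, not the project's real models); the point is the pattern the function uses: build id-keyed lookup dicts, sort the document's section references by position, and skip any reference that has no matching section.

```python
from dataclasses import dataclass
from enum import Enum


class SectionType(Enum):  # assumed shape of the real enum
    TEXT = "text"
    DATA = "data"


@dataclass
class DocSectionRef:  # hypothetical stand-in for DocumentSection
    section_id: str
    section_type: SectionType
    position: int


def resolve_sections(refs, text_by_id, data_by_id):
    """Return (type, section) pairs in position order, skipping unmatched refs."""
    lookups = {SectionType.TEXT: text_by_id, SectionType.DATA: data_by_id}
    resolved = []
    for ref in sorted(refs, key=lambda r: r.position):
        section = lookups.get(ref.section_type, {}).get(ref.section_id)
        if section is None:
            continue  # same effect as the `if not section: continue` guard
        resolved.append((ref.section_type, section))
    return resolved
```

A reference whose section was not passed in (for example, a text section filtered out by the caller) is dropped from the export rather than raising an error.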

Return Value

Returns a bytes object containing the complete DOCX file data. This binary data can be written to a file, sent as an HTTP response, or stored in a database. The DOCX contains the formatted document: title, metadata, and all sections in position order with appropriate styling.
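
Because a DOCX file is a ZIP archive, the returned bytes can be sanity-checked without opening Word. The helper below is a minimal sketch using only the standard library; looks_like_docx is a hypothetical name, and the check covers only the presence of two standard package parts, not the validity of their contents.

```python
import zipfile
from io import BytesIO


def looks_like_docx(data):
    """Return True if data is a ZIP archive containing the main DOCX parts."""
    if not isinstance(data, (bytes, bytearray)):
        return False
    buffer = BytesIO(data)
    if not zipfile.is_zipfile(buffer):
        return False
    buffer.seek(0)
    with zipfile.ZipFile(buffer) as archive:
        names = set(archive.namelist())
    # Every WordprocessingML package carries these two parts.
    return "[Content_Types].xml" in names and "word/document.xml" in names
```

A check like this may be useful in tests or before caching an export, for example `assert looks_like_docx(export_to_docx(document, text_sections))`.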

Dependencies

  • python-docx
  • io (standard library)
  • logging (standard library)

Required Imports

from io import BytesIO
import logging
from docx.enum.text import WD_ALIGN_PARAGRAPH

Conditional/Optional Imports

These imports are only needed under specific conditions:

from docx import Document as DocxDocument

Condition: required only when DOCX_AVAILABLE is True. The import runs inside the function body, after that flag is checked at runtime.
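
This page does not show how DOCX_AVAILABLE is defined. A common pattern, sketched here as an assumption about the module rather than a copy of it, is to attempt the import once at module load and record the outcome in a flag; require_docx is a hypothetical helper name.

```python
try:
    from docx import Document as DocxDocument  # provided by python-docx
    from docx.enum.text import WD_ALIGN_PARAGRAPH
    DOCX_AVAILABLE = True
except ImportError:
    # python-docx is not installed: keep the names defined so the module loads.
    DocxDocument = None
    WD_ALIGN_PARAGRAPH = None
    DOCX_AVAILABLE = False


def require_docx():
    """Raise the same error export_to_docx raises when python-docx is missing."""
    if not DOCX_AVAILABLE:
        raise ImportError("python-docx not available")
```

With a flag like this, callers can hide the DOCX export option up front instead of catching ImportError after the fact.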

Usage Example

from io import BytesIO

from flask import send_file
from models import Document, TextSection, DataSection

# export_to_docx is defined in new_app.py; import it to match your project layout.

DOCX_MIMETYPE = 'application/vnd.openxmlformats-officedocument.wordprocessingml.document'


def build_docx(doc_id):
    """Load the document and its sections from the database and export them."""
    document = Document.query.get(doc_id)
    text_sections = TextSection.query.filter_by(document_id=doc_id).all()
    data_sections = DataSection.query.filter_by(document_id=doc_id).all()
    return document, export_to_docx(document, text_sections, data_sections)


def save_docx(doc_id, path='output.docx'):
    """Option 1: write the exported bytes to a file."""
    _, docx_bytes = build_docx(doc_id)
    with open(path, 'wb') as f:
        f.write(docx_bytes)


def download_docx(doc_id):
    """Option 2: return the export from a Flask view as a file download."""
    try:
        document, docx_bytes = build_docx(doc_id)
    except ImportError:
        return 'python-docx library not installed', 500
    except Exception as e:
        return f'Export failed: {e}', 500

    return send_file(
        BytesIO(docx_bytes),
        as_attachment=True,
        download_name=f'{document.title}.docx',
        mimetype=DOCX_MIMETYPE,
    )
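
The usage example passes document.title straight into download_name. Titles can contain characters such as slashes or colons that are awkward in filenames, so it may be worth normalizing them first. The helper below is a minimal standard-library sketch; safe_docx_filename is a hypothetical name and is not part of the module.

```python
import re


def safe_docx_filename(title, default="document"):
    """Build a conservative .docx filename from a free-form document title."""
    # Keep word characters, whitespace, dots, and hyphens; drop everything else.
    cleaned = re.sub(r"[^\w\s.-]", "", title or "").strip()
    # Collapse runs of whitespace into single underscores.
    cleaned = re.sub(r"\s+", "_", cleaned)
    # Avoid names made only of dots/underscores (e.g. from "../..").
    cleaned = cleaned.strip("._")
    return f"{cleaned or default}.docx"
```

In a Flask application, werkzeug.utils.secure_filename is an existing alternative that serves a similar purpose.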

Best Practices

  • Ensure DOCX_AVAILABLE is checked before calling this function to avoid ImportError
  • The document object must have a valid sections attribute with DocumentSection objects that have position values for proper ordering
  • TextSection.current_content should be in Quill Delta format (JSON structure) for proper conversion
  • Handle the returned bytes appropriately - either write to file or stream to HTTP response
  • The function uses multiple helper functions (convert_quill_delta_to_html, html_to_markdown, etc.) that must be available in the module scope
  • Data sections require add_data_section_to_docx function to be implemented for proper rendering
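  • Header levels are capped at 9, Word's maximum heading level; the code applies no lower bound, so a level of 0 renders in the Title style and a negative level is rejected by python-docx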
  • The function includes fallback error handling for content processing failures
  • Logger should be configured to capture processing information and warnings
  • Consider memory usage when exporting large documents with many sections or embedded data
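
On the heading-level point: the source applies only min(level, 9). python-docx's add_heading accepts levels 0 through 9, where 0 selects the Title style, and raises ValueError for anything else. If section data may contain missing or out-of-range levels, a two-sided clamp like the sketch below could be applied before calling add_heading; clamp_heading_level is a hypothetical helper, not part of the module.

```python
def clamp_heading_level(raw_level, default=1):
    """Coerce a section's level into the 1-9 range used for Word headings."""
    try:
        level = int(raw_level)
    except (TypeError, ValueError):
        # Missing or non-numeric levels fall back to the default.
        level = default
    return max(1, min(level, 9))
```

For example, `doc.add_heading(section.title, clamp_heading_level(getattr(section, 'level', 1)))` would keep section headers from being rendered in the Title style.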

Similar Components

AI-powered semantic similarity - components with related functionality:

  • function export_to_docx_v1 90.0% similar

    Exports a document object to Microsoft Word DOCX format, converting sections, content, and references into a formatted Word document with proper styling and structure.

    From: /tf/active/vicechatdev/vice_ai/complex_app.py
  • function export_to_pdf 75.9% similar

    Exports a document with text and data sections to a PDF file using ReportLab, handling custom styling, section ordering, and content formatting including Quill Delta to HTML/Markdown conversion.

    From: /tf/active/vicechatdev/vice_ai/new_app.py
  • function export_to_pdf_v1 67.7% similar

    Converts a document object with sections and references into a formatted PDF file using ReportLab, supporting multiple heading levels, text content with markdown/HTML processing, and reference management.

    From: /tf/active/vicechatdev/vice_ai/complex_app.py
  • function add_data_section_to_docx 66.3% similar

    Adds a data analysis section to a Word document, including analysis metadata, statistical conclusions, and embedded visualizations from saved content or legacy analysis history.

    From: /tf/active/vicechatdev/vice_ai/new_app.py
  • function export_document 65.9% similar

    Flask route handler that exports a document in either DOCX or PDF format, verifying user ownership and document access before generating the export file.

    From: /tf/active/vicechatdev/vice_ai/new_app.py