export_to_docx_v1 - Code Extractor

function export_to_docx_v1

Maturity: 48

Exports a document object to Microsoft Word DOCX format, converting sections, content, and references into a formatted Word document with proper styling and structure.

File:
/tf/active/vicechatdev/vice_ai/complex_app.py

Lines:
2017 - 2110

Complexity:
complex

Purpose

This function takes a document object containing title, author, sections, and references, and generates a properly formatted DOCX file. It handles document metadata, hierarchical sections with headers, text content (including HTML and Markdown conversion), and reference management. The function supports content formatting, heading levels, and creates a comprehensive bibliography section. It's designed for document export functionality in web applications or content management systems.

Source Code

def export_to_docx(document):
    """Export document to DOCX format"""
    if not DOCX_AVAILABLE:
        raise ImportError("python-docx not available")
    
    doc = Document()
    
    # Set document title
    title = doc.add_heading(document.title, 0)
    title.alignment = WD_ALIGN_PARAGRAPH.CENTER
    
    # Add author and metadata
    if document.author:
        author_para = doc.add_paragraph()
        author_para.add_run(f"Author: {document.author}").bold = True
        author_para.alignment = WD_ALIGN_PARAGRAPH.CENTER
    
    date_para = doc.add_paragraph()
    date_para.add_run(f"Created: {document.created_at.strftime('%Y-%m-%d %H:%M')}").italic = True
    date_para.alignment = WD_ALIGN_PARAGRAPH.CENTER
    
    doc.add_paragraph()  # Empty line
    
    # Add sections
    for section in document.sections:
        if section.type == 'header':
            # Add header
            level = min(section.level, 9)  # Word supports up to 9 heading levels
            heading = doc.add_heading(section.title, level)
            
        elif section.type == 'text':
            # Add text content
            if section.title:
                doc.add_heading(section.title, 3)
            
            if section.content:
                # Check if content is HTML or Markdown and process accordingly
                content_to_process = section.content
                
                # If content looks like HTML, convert to Markdown first
                if '<' in content_to_process and '>' in content_to_process:
                    # Content appears to be HTML, convert to Markdown
                    content_to_process = html_to_markdown(content_to_process)
                
                # Process markdown content for proper formatting
                try:
                    elements = process_markdown_content(content_to_process)
                    add_formatted_content_to_word(doc, elements)
                except Exception as e:
                    logger.warning(f"Error processing content for section {section.id}: {e}")
                    # Fallback to simple paragraph splitting
                    # Clean HTML tags if present for fallback
                    clean_content = clean_html_tags(content_to_process)
                    paragraphs = clean_content.split('\n\n')
                    for para_text in paragraphs:
                        if para_text.strip():
                            para = doc.add_paragraph()
                            para.add_run(para_text.strip())
            
            # Add section references if any
            if section.references:
                ref_heading = doc.add_heading("References for this section:", 4)
                for i, ref in enumerate(section.references, 1):
                    ref_para = doc.add_paragraph()
                    ref_para.add_run(f"[{i}] ").bold = True
                    ref_para.add_run(ref.get('title', 'Untitled Reference'))
                
                doc.add_paragraph()  # Empty line after references
    
    # Add global references
    all_references = document.get_all_references()
    if all_references:
        doc.add_page_break()
        doc.add_heading("References", 1)
        
        unique_refs = {}
        ref_counter = 1
        
        for ref in all_references:
            ref_key = ref.get('title', 'Untitled') + ref.get('source', '')
            if ref_key not in unique_refs:
                unique_refs[ref_key] = ref_counter
                ref_para = doc.add_paragraph()
                ref_para.add_run(f"[{ref_counter}] ").bold = True
                ref_para.add_run(ref.get('title', 'Untitled Reference'))
                if ref.get('source'):
                    ref_para.add_run(f" ({ref['source']})")
                ref_counter += 1
    
    # Save to BytesIO
    buffer = BytesIO()
    doc.save(buffer)
    buffer.seek(0)
    return buffer.getvalue()
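
The HTML check in the source above is a plain substring test, so Markdown that merely contains both characters (for example a comparison such as "a < b and c > d") is also passed through html_to_markdown. The following is a minimal sketch of a stricter check, assuming a tag-shaped regular expression is acceptable. `looks_like_html_loose` mirrors the source; `looks_like_html_strict` and `_TAG_RE` are hypothetical names that do not appear in complex_app.py.

```python
import re

# Hypothetical pattern: an opening or closing tag such as <p>, </div>, <br/>.
_TAG_RE = re.compile(r"</?[A-Za-z][A-Za-z0-9]*(\s[^<>]*)?/?>")


def looks_like_html_loose(text: str) -> bool:
    """Mirror of the check in export_to_docx: both characters are present."""
    return "<" in text and ">" in text


def looks_like_html_strict(text: str) -> bool:
    """Illustrative alternative: require something shaped like a tag."""
    return _TAG_RE.search(text) is not None
```

The stricter version is still a heuristic: text such as "x<y and y>z" has the shape of a tag and would match. It only narrows the set of false positives.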

Parameters

Name	Type	Default	Kind
`document`	-	-	positional_or_keyword

Parameter Details

document: A document object with the attributes 'title' (string), 'author' (string, optional; the author line is skipped when empty), 'created_at' (datetime object), and 'sections' (list of section objects), plus a 'get_all_references()' method that returns a list of reference dictionaries. Each section object should have 'type' (string: 'header' or 'text'; other types are skipped), 'title' (string), 'content' (string, may contain HTML or Markdown), 'level' (integer, used for header sections), 'id' (identifier, used only in log messages), and 'references' (list of reference dictionaries, rendered for text sections only). Reference dictionaries are read with .get(), so both the 'title' and 'source' keys are optional.
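
The expected shape of the document object can be written down as a structural type. This is a sketch only: `SectionLike` and `DocumentLike` are hypothetical names, and the source does not declare any such types. The attribute names are taken from the function body.

```python
from datetime import datetime
from typing import Any, Dict, List, Optional, Protocol, runtime_checkable


class SectionLike(Protocol):
    """Attributes that export_to_docx reads from each section."""
    type: str            # 'header' or 'text'
    title: str
    content: str         # may contain HTML or Markdown
    level: int           # used for 'header' sections
    id: Any              # used only in log messages
    references: List[Dict[str, str]]


@runtime_checkable
class DocumentLike(Protocol):
    """Attributes and method that export_to_docx reads from the document."""
    title: str
    author: Optional[str]
    created_at: datetime
    sections: List[SectionLike]

    def get_all_references(self) -> List[Dict[str, str]]: ...
```

With `runtime_checkable`, `isinstance(obj, DocumentLike)` checks only that the members exist, not their types, so it is a quick pre-flight check rather than validation.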

Return Value

Returns bytes representing the complete DOCX file content. The bytes can be written to a file, sent as an HTTP response, or stored in memory. If python-docx is not available, the function raises ImportError instead of returning a value.
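
Because a DOCX file is a ZIP package, the returned bytes can be sanity-checked without python-docx. This is a minimal sketch; `looks_like_docx` is a hypothetical helper, and the part names are the conventional ones for a Word document rather than something the source verifies.

```python
import zipfile
from io import BytesIO

# Parts a Word package conventionally contains (an assumption of this sketch).
_REQUIRED_PARTS = ("[Content_Types].xml", "word/document.xml")


def looks_like_docx(data: bytes) -> bool:
    """Return True if `data` is a ZIP archive containing the core DOCX parts."""
    buffer = BytesIO(data)
    if not zipfile.is_zipfile(buffer):
        return False
    buffer.seek(0)
    with zipfile.ZipFile(buffer) as archive:
        names = set(archive.namelist())
    return all(part in names for part in _REQUIRED_PARTS)
```

A check like this could run in a test before the bytes are written to disk or streamed to a client.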

Dependencies

python-docx
io

Required Imports

from docx import Document
from docx.enum.text import WD_ALIGN_PARAGRAPH
from io import BytesIO

Conditional/Optional Imports

The names below come from the surrounding module rather than from a library. They must be defined in the same module or imported separately:

html_to_markdown, process_markdown_content, add_formatted_content_to_word, clean_html_tags: helper functions that convert HTML/Markdown content to Word format.
logger: logger used to report content-processing failures.
DOCX_AVAILABLE: flag checked at the start of the function.
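
The source does not show how DOCX_AVAILABLE or logger are defined. A common pattern is a guarded import at module level. It is shown here as an assumption, not as the actual code in complex_app.py.

```python
import logging

# Assumed setup: a module-level logger, as referenced by export_to_docx.
logger = logging.getLogger(__name__)

try:
    from docx import Document
    from docx.enum.text import WD_ALIGN_PARAGRAPH
    DOCX_AVAILABLE = True
except ImportError:
    # python-docx is not installed; export_to_docx would raise ImportError.
    Document = None
    WD_ALIGN_PARAGRAPH = None
    DOCX_AVAILABLE = False
```

With this guard the module still imports when python-docx is missing, and the failure surfaces only when an export is requested.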

Usage Example

from docx import Document
from docx.enum.text import WD_ALIGN_PARAGRAPH
from io import BytesIO
from datetime import datetime

# Assumes this code runs in the module that defines export_to_docx, its helper
# functions, and logger; DOCX_AVAILABLE is a global of that module.
DOCX_AVAILABLE = True

class MockSection:
    def __init__(self, type, title, content='', level=1, references=None):
        self.type = type
        self.title = title
        self.content = content
        self.level = level
        self.id = 'section-1'
        self.references = references or []

class MockDocument:
    def __init__(self):
        self.title = 'Sample Report'
        self.author = 'John Doe'
        self.created_at = datetime.now()
        self.sections = [
            MockSection('header', 'Introduction', level=1),
            MockSection('text', 'Overview', 'This is the overview content.', references=[{'title': 'Reference 1', 'source': 'Source A'}])
        ]
    
    def get_all_references(self):
        return [{'title': 'Reference 1', 'source': 'Source A'}]

doc = MockDocument()
docx_bytes = export_to_docx(doc)

# Save to file
with open('output.docx', 'wb') as f:
    f.write(docx_bytes)

# Or send as HTTP response in Flask
# return send_file(BytesIO(docx_bytes), mimetype='application/vnd.openxmlformats-officedocument.wordprocessingml.document', as_attachment=True, download_name='document.docx')

Best Practices

The function raises ImportError when DOCX_AVAILABLE is false; set the flag from a guarded import of python-docx and handle the exception in callers
The document object must implement all required attributes and methods (title, author, created_at, sections, get_all_references())
Helper functions (html_to_markdown, process_markdown_content, add_formatted_content_to_word, clean_html_tags) must be defined or imported
Handle the returned bytes appropriately - either write to file or stream to HTTP response
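Word supports up to 9 heading levels; the function caps section levels at 9 but applies no lower bound, so a level of 0 produces the Title style and a negative level makes python-docx raise ValueError
Only Markdown processing is wrapped in error handling, falling back to plain paragraphs if formatting fails; html_to_markdown runs outside the try block, so its exceptions propagate to the caller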
Consider memory usage when processing large documents as the entire DOCX is built in memory
created_at is formatted with '%Y-%m-%d %H:%M' and no timezone suffix; convert the datetime to the intended timezone before export if that matters
The function deduplicates references in the bibliography section based on title and source combination
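
The deduplication key in the source is the title and source concatenated with no separator, so two different references can collapse into one when their concatenations coincide. The sketch below contrasts that key with a tuple key. `concat_key` mirrors the source; `tuple_key` and `dedupe` are hypothetical helpers for illustration.

```python
from typing import Callable, Dict, Hashable, Iterable, List, Tuple

Reference = Dict[str, str]


def concat_key(ref: Reference) -> str:
    """Key as built in export_to_docx: title and source joined directly."""
    return ref.get("title", "Untitled") + ref.get("source", "")


def tuple_key(ref: Reference) -> Tuple[str, str]:
    """Illustrative alternative: keep title and source as separate fields."""
    return (ref.get("title", "Untitled"), ref.get("source", ""))


def dedupe(refs: Iterable[Reference],
           key: Callable[[Reference], Hashable]) -> List[Reference]:
    """Keep the first reference seen for each key, preserving order."""
    seen = set()
    unique = []
    for ref in refs:
        k = key(ref)
        if k not in seen:
            seen.add(k)
            unique.append(ref)
    return unique
```

For example, {'title': 'AI', 'source': 'Review'} and {'title': 'A', 'source': 'IReview'} share the concatenated key 'AIReview' but have distinct tuple keys.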

Similar Components

AI-powered semantic similarity - components with related functionality:

function export_to_docx 90.0% similar
Exports a document with text and data sections to Microsoft Word DOCX format, preserving formatting, structure, and metadata.
From: /tf/active/vicechatdev/vice_ai/new_app.py

function export_to_pdf_v1 70.8% similar
Converts a document object with sections and references into a formatted PDF file using ReportLab, supporting multiple heading levels, text content with markdown/HTML processing, and reference management.
From: /tf/active/vicechatdev/vice_ai/complex_app.py

function add_formatted_content_to_word 67.2% similar
Converts processed markdown elements into formatted content within a Word document, handling headers, paragraphs, lists, tables, and code blocks with appropriate styling.
From: /tf/active/vicechatdev/vice_ai/new_app.py

function create_enhanced_word_document_v1 67.1% similar
Converts markdown content into a formatted Microsoft Word document with proper styling, table of contents, warranty sections, and reference handling for Project Victoria warranty disclosures.
From: /tf/active/vicechatdev/enhanced_word_converter_fixed.py

function add_formatted_content_to_word_v1 66.9% similar
Converts processed markdown elements into formatted content within a Microsoft Word document, handling headers, paragraphs, lists, tables, and code blocks with appropriate styling.
From: /tf/active/vicechatdev/vice_ai/complex_app.py
