
function add_formatted_content_to_word

Maturity: 47

Converts processed markdown elements into formatted content within a Word document, handling headers, paragraphs, lists, tables, and code blocks with appropriate styling.

File: /tf/active/vicechatdev/vice_ai/new_app.py
Lines: 3924 - 3951
Complexity: moderate

Purpose

This function is the final step of a markdown-to-Word conversion: it iterates over a list of already-parsed markdown elements and adds each one to a python-docx Document object. It handles headers (levels above 6 are capped at 6), paragraphs, bulleted and numbered list items, tables, and code blocks (elements of type 'code_block_start', rendered in Courier New). Inline formatting is delegated to add_inline_formatting_to_paragraph and table rendering to add_table_to_word; list items use Word's built-in 'List Bullet' and 'List Number' styles for a consistent appearance.

Source Code

def add_formatted_content_to_word(doc, elements):
    """Add processed markdown elements to Word document with proper formatting"""
    for element in elements:
        if element['type'] == 'header':
            level = min(element['level'], 6)  # Word supports up to 6 heading levels
            heading = doc.add_heading(element['content'], level)
            
        elif element['type'] == 'paragraph':
            para = doc.add_paragraph()
            add_inline_formatting_to_paragraph(para, element['content'])
            
        elif element['type'] in ['list_item', 'bullet']:
            para = doc.add_paragraph(style='List Bullet')
            add_inline_formatting_to_paragraph(para, element['content'])
            
        elif element['type'] in ['numbered_list_item', 'numbered']:
            para = doc.add_paragraph(style='List Number')
            add_inline_formatting_to_paragraph(para, element['content'])
            
        elif element['type'] == 'table':
            logger.info(f"Adding table element with content: {element['content']}")
            add_table_to_word(doc, element['content'])
            
        elif element['type'] == 'code_block_start':
            # Handle code blocks (simplified for now)
            para = doc.add_paragraph(style='Normal')
            run = para.add_run(element['content'])
            run.font.name = 'Courier New'

Parameters

Name      Type  Default  Kind
doc       -     -        positional_or_keyword
elements  -     -        positional_or_keyword

Parameter Details

doc: A python-docx Document object representing the Word document to which formatted content will be added. This should be an instance of docx.Document that has been initialized before calling this function.

elements: A list of dictionaries, each representing one parsed markdown element. Every element must have a 'type' key (one of 'header', 'paragraph', 'list_item', 'bullet', 'numbered_list_item', 'numbered', 'table', 'code_block_start') and a 'content' key holding the text or data. Header elements must also include a 'level' key (an integer; values above 6 are reduced to 6). Elements with any other 'type' are skipped without an error.
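The level handling described above can be sketched on its own. The source applies only an upper cap, min(element['level'], 6); the lower bound in this sketch is an added assumption, not something the original function does. The names MAX_HEADING_LEVEL and cap_heading_level are hypothetical. For context, python-docx's add_heading accepts levels 0 through 9 (0 produces the Title style) and raises ValueError outside that range.

```python
MAX_HEADING_LEVEL = 6  # same cap the documented function applies


def cap_heading_level(level, minimum=0):
    """Clamp a heading level into [minimum, MAX_HEADING_LEVEL].

    The upper cap mirrors the source code. The lower bound is an
    assumption added here so that negative levels never reach
    add_heading, which would reject them.
    """
    return max(minimum, min(level, MAX_HEADING_LEVEL))
```

Passing minimum=1 would additionally keep level 0 from being rendered with the Title style, if that is not wanted.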

Return Value

This function does not return any value (implicitly returns None). It modifies the 'doc' parameter in-place by adding formatted content to the Word document object.

Dependencies

  • python-docx
  • logging

Required Imports

from docx import Document
import logging

Usage Example

from docx import Document
import logging

logger = logging.getLogger(__name__)

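# Assumes add_formatted_content_to_word (see Source Code) is defined in this module.
# Simplified stand-ins for the two helper functions it calls:
def add_inline_formatting_to_paragraph(para, content):
    para.add_run(content)  # plain text only; no inline markdown handling

def add_table_to_word(doc, table_data):
    # Minimal stand-in: table_data is a list of rows, one cell per value
    table = doc.add_table(rows=len(table_data), cols=len(table_data[0]))
    for r, row in enumerate(table_data):
        for c, value in enumerate(row):
            table.cell(r, c).text = str(value)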

# Create a Word document
doc = Document()

# Define markdown elements
elements = [
    {'type': 'header', 'level': 1, 'content': 'Main Title'},
    {'type': 'paragraph', 'content': 'This is a paragraph with text.'},
    {'type': 'bullet', 'content': 'First bullet point'},
    {'type': 'numbered', 'content': 'First numbered item'},
    {'type': 'code_block_start', 'content': 'print("Hello World")'},
    {'type': 'table', 'content': [["Header1", "Header2"], ["Data1", "Data2"]]}
]

# Add formatted content to document
add_formatted_content_to_word(doc, elements)

# Save the document
doc.save('output.docx')

Best Practices

  • Ensure the 'doc' parameter is a valid python-docx Document object before calling this function
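  • Validate that every element has the required 'type' and 'content' keys, and that header elements also have 'level', to avoid KeyError exceptions
  • Header levels above 6 are automatically capped at 6; this cap is a choice made in this function, since python-docx's add_heading itself accepts levels 0 through 9
  • The function calls two helper functions (add_inline_formatting_to_paragraph and add_table_to_word) that must be defined in, or imported into, the module where this function lives
  • Consider adding error handling for malformed elements to make the function more robust; as written, an element with an unrecognized 'type' is silently skipped
  • A module-level logger (for example logging.getLogger(__name__)) must exist before the function processes a table element, because that branch calls logger.info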
  • For code blocks, the function uses 'Courier New' font but does not apply background color or other typical code formatting - consider enhancing this for better code presentation
  • The function modifies the document in-place, so ensure you have a reference to the document object for saving after processing
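
The validation and error-handling suggestions above could take the form of a check that runs before the conversion. The following is a minimal sketch, not part of the original module: SUPPORTED_TYPES and find_element_problems are hypothetical names, and the set of types is taken from the branches in the source code.

```python
# Element types that add_formatted_content_to_word has a branch for.
SUPPORTED_TYPES = {
    'header', 'paragraph', 'list_item', 'bullet',
    'numbered_list_item', 'numbered', 'table', 'code_block_start',
}


def find_element_problems(elements):
    """Return (index, message) pairs for elements that would fail or be skipped."""
    problems = []
    for index, element in enumerate(elements):
        if not isinstance(element, dict):
            problems.append((index, 'element is not a dict'))
            continue
        element_type = element.get('type')
        if element_type is None:
            # The converter would raise KeyError on element['type'].
            problems.append((index, "missing 'type' key"))
            continue
        if element_type not in SUPPORTED_TYPES:
            # The converter has no matching branch and would skip it silently.
            problems.append((index, f'unsupported type: {element_type!r}'))
            continue
        if 'content' not in element:
            problems.append((index, "missing 'content' key"))
        if element_type == 'header' and not isinstance(element.get('level'), int):
            problems.append((index, "header needs an integer 'level'"))
    return problems
```

A caller might log the returned problems and pass only the clean elements on, or abort the export if the list is non-empty; which policy fits depends on how the elements are produced.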

Similar Components

AI-powered semantic similarity - components with related functionality:

  • function add_formatted_content_to_word_v1 95.0% similar

    Converts processed markdown elements into formatted content within a Microsoft Word document, handling headers, paragraphs, lists, tables, and code blocks with appropriate styling.

    From: /tf/active/vicechatdev/vice_ai/complex_app.py
  • function add_inline_formatting_to_paragraph 78.4% similar

    Parses markdown-formatted text and applies inline formatting (bold, italic, code) to a Microsoft Word paragraph object using the python-docx library.

    From: /tf/active/vicechatdev/vice_ai/complex_app.py
  • function add_inline_formatting_to_paragraph_v1 76.3% similar

    Parses markdown-formatted text and adds it to a Word document paragraph, converting markdown links [text](url) into clickable hyperlinks while delegating other markdown formatting to a helper function.

    From: /tf/active/vicechatdev/vice_ai/new_app.py
  • function add_markdown_formatting_to_paragraph 73.9% similar

    Parses markdown-formatted text and applies corresponding formatting (bold, italic, code) to runs within a python-docx paragraph object.

    From: /tf/active/vicechatdev/vice_ai/new_app.py
  • function add_formatted_content_to_pdf 73.1% similar

    Processes markdown elements and adds them to a PDF document story with appropriate formatting, handling headers, paragraphs, lists, and tables.

    From: /tf/active/vicechatdev/vice_ai/complex_app.py