
function simple_markdown_to_html

Maturity: 48

Converts a subset of Markdown syntax to clean HTML, supporting headers, bold text, unordered lists, and paragraphs.

File:
/tf/active/vicechatdev/vice_ai/new_app.py
Lines:
2542 - 2605
Complexity:
moderate

Purpose

This function provides a lightweight Markdown-to-HTML converter for displaying formatted text in data sections of a document management system. It handles headers (capped at h6), bold text (**text**), unordered lists (- or * prefix), and paragraphs. Paragraph lines that begin with image syntax (![) are dropped, because plots are handled separately. The function keeps the HTML well-formed by closing any open list before a header, paragraph, or blank line, and it converts each empty line to a <br/>.
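
The list handling described above amounts to a single boolean flag that records whether a <ul> is currently open. The following is a minimal sketch of that idea, reduced to list items, paragraphs, and blank lines. The helper name render_blocks is hypothetical and is not part of new_app.py.

```python
def render_blocks(lines):
    """Wrap consecutive '- ' or '* ' lines in a single <ul> element."""
    html_lines = []
    in_list = False  # True while a <ul> is open
    for line in lines:
        stripped = line.strip()
        is_item = stripped.startswith(('- ', '* '))
        if is_item and not in_list:
            # First item of a new list: open the <ul>.
            html_lines.append('<ul>')
            in_list = True
        elif not is_item and in_list:
            # Any non-item line (including a blank one) ends the list.
            html_lines.append('</ul>')
            in_list = False
        if is_item:
            html_lines.append(f'<li>{stripped[2:].strip()}</li>')
        elif stripped:
            html_lines.append(f'<p>{stripped}</p>')
        else:
            html_lines.append('<br/>')
    if in_list:
        # Input ended while a list was still open.
        html_lines.append('</ul>')
    return html_lines


print('\n'.join(render_blocks(['- a', '* b', '', 'text'])))
```

The final check after the loop matters: without it, input that ends on a list item would leave the <ul> unclosed.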

Source Code

def simple_markdown_to_html(markdown_text):
    """
    Convert markdown to clean HTML for display in data sections
    Handles: headers (#, ##, ###), bold (**text**), lists, and paragraphs
    """
    if not markdown_text:
        return ""
    
    lines = markdown_text.split('\n')
    html_lines = []
    in_list = False
    
    for line in lines:
        stripped = line.strip()
        
        if not stripped:
            if in_list:
                html_lines.append('</ul>')
                in_list = False
            html_lines.append('<br/>')
            continue
        
        # Headers - check for ## pattern at start
        if stripped.startswith('#'):
            if in_list:
                html_lines.append('</ul>')
                in_list = False
            # Count consecutive # at start
            level = 0
            for char in stripped:
                if char == '#':
                    level += 1
                else:
                    break
            # Cap at h6
            level = min(level, 6)
            text = stripped[level:].strip()
            # Handle bold within headers
            text = re.sub(r'\*\*([^*]+)\*\*', r'<strong>\1</strong>', text)
            html_lines.append(f'<h{level}>{text}</h{level}>')
        # Lists
        elif stripped.startswith('- ') or stripped.startswith('* '):
            if not in_list:
                html_lines.append('<ul>')
                in_list = True
            text = stripped[2:].strip()
            # Handle bold in lists
            text = re.sub(r'\*\*([^*]+)\*\*', r'<strong>\1</strong>', text)
            html_lines.append(f'<li>{text}</li>')
        # Regular paragraphs
        else:
            if in_list:
                html_lines.append('</ul>')
                in_list = False
            # Handle bold text
            text = re.sub(r'\*\*([^*]+)\*\*', r'<strong>\1</strong>', stripped)
            # Skip image markdown for now (plots are handled separately)
            if not text.startswith('!['):
                html_lines.append(f'<p>{text}</p>')
    
    if in_list:
        html_lines.append('</ul>')
    
    return '\n'.join(html_lines)

Parameters

Name           Type  Default  Kind
markdown_text  -     -        positional_or_keyword

Parameter Details

markdown_text: A string containing Markdown-formatted text. Can be None or an empty string. Supports headers (# through ######), bold text (**text**), unordered lists (- or * prefix), and regular paragraphs. Image markdown (![...]) is dropped only when it starts a paragraph line; inside a header or list item it passes through unchanged.

Return Value

Returns a string containing HTML markup. If input is None or empty, returns an empty string. Otherwise returns newline-separated HTML elements including <h1>-<h6> for headers, <strong> for bold text, <ul>/<li> for lists, <p> for paragraphs, and <br/> for empty lines. All HTML is properly nested with lists closed before starting new block elements.
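
The header level in the returned <h1>-<h6> tags comes from counting leading # characters and capping the count at 6. The sketch below isolates those steps from the header branch of the source so the edge cases are easy to see. The helper name parse_header is hypothetical, and it counts with lstrip rather than the source's character loop, which gives the same result.

```python
def parse_header(stripped):
    """Return (level, text) using the same steps as the header branch."""
    # Count the leading '#' characters.
    level = len(stripped) - len(stripped.lstrip('#'))
    # Cap at 6, the deepest heading level HTML defines.
    level = min(level, 6)
    # Only `level` characters are sliced off, so any '#' past the cap
    # stays in the heading text.
    text = stripped[level:].strip()
    return level, text


for sample in ['## Subtitle', '#tag', '####### Deep']:
    print(sample, '->', parse_header(sample))
```

Two behaviors follow from this logic: a line such as #tag is treated as a level-1 header even though no space follows the #, and a line with seven # characters yields an h6 whose text still starts with the leftover #.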

Dependencies

  • re

Required Imports

import re

Usage Example

import re

def simple_markdown_to_html(markdown_text):
    if not markdown_text:
        return ""
    lines = markdown_text.split('\n')
    html_lines = []
    in_list = False
    for line in lines:
        stripped = line.strip()
        if not stripped:
            if in_list:
                html_lines.append('</ul>')
                in_list = False
            html_lines.append('<br/>')
            continue
        if stripped.startswith('#'):
            if in_list:
                html_lines.append('</ul>')
                in_list = False
            level = 0
            for char in stripped:
                if char == '#':
                    level += 1
                else:
                    break
            level = min(level, 6)
            text = stripped[level:].strip()
            text = re.sub(r'\*\*([^*]+)\*\*', r'<strong>\1</strong>', text)
            html_lines.append(f'<h{level}>{text}</h{level}>')
        elif stripped.startswith('- ') or stripped.startswith('* '):
            if not in_list:
                html_lines.append('<ul>')
                in_list = True
            text = stripped[2:].strip()
            text = re.sub(r'\*\*([^*]+)\*\*', r'<strong>\1</strong>', text)
            html_lines.append(f'<li>{text}</li>')
        else:
            if in_list:
                html_lines.append('</ul>')
                in_list = False
            text = re.sub(r'\*\*([^*]+)\*\*', r'<strong>\1</strong>', stripped)
            if not text.startswith('!['):
                html_lines.append(f'<p>{text}</p>')
    if in_list:
        html_lines.append('</ul>')
    return '\n'.join(html_lines)

# Example usage
markdown = """# Main Title
## Subtitle
This is a **bold** statement.

- First item
- Second **bold** item
- Third item

Regular paragraph text."""

html_output = simple_markdown_to_html(markdown)
print(html_output)

Best Practices

  • Input validation: The function safely handles None and empty string inputs by returning an empty string
  • The function does not escape HTML entities in the input text, so ensure markdown_text is from a trusted source or pre-sanitize it to prevent XSS vulnerabilities
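  • Image markdown syntax (![...]) is skipped only for paragraph lines that begin with it, as images are handled separately in the application context; image syntax inside a header or list item is emitted unchanged
  • The function caps header levels at h6 (HTML standard maximum) even if more # symbols are provided; the extra # characters are not removed and remain at the start of the heading text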
  • List state is properly managed to ensure closing </ul> tags are added when transitioning between block types
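  • The bold pattern (\*\*([^*]+)\*\*) matches only runs that contain no asterisk, so text such as **a *b* c** is left unconverted; because input is processed line by line, bold spans also cannot cross a line break
  • Output elements are joined with newlines for readability; the newlines between block elements do not affect how the HTML renders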
  • Consider using a full-featured Markdown library like markdown2 or mistune for production use with more complex Markdown syntax
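
The escaping and bold-pattern points above can be illustrated together. This is a minimal sketch, assuming that escaping before conversion is acceptable for the application; render_inline is a hypothetical helper, not a function in new_app.py. It relies on html.escape from the standard library, which leaves the asterisks used for bold untouched.

```python
import html
import re

# Same bold pattern as the source: the captured run may not contain '*'.
BOLD = re.compile(r'\*\*([^*]+)\*\*')


def render_inline(text):
    """Escape HTML special characters, then apply the bold substitution."""
    # quote=False escapes &, < and > but leaves quotation marks alone.
    escaped = html.escape(text, quote=False)
    return BOLD.sub(r'<strong>\1</strong>', escaped)


# Raw HTML in the input is neutralized, while bold still converts.
print(render_inline('**bold** <script>alert(1)</script>'))

# An asterisk inside the bold span prevents any match, so the text is unchanged.
print(render_inline('**a *b* c**'))
```

Escaping must happen before the substitution; doing it afterwards would also escape the <strong> tags the function just produced.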

Similar Components

AI-powered semantic similarity - components with related functionality:

  • function basic_markdown_to_html 88.1% similar

    Converts basic Markdown syntax to HTML without using external Markdown libraries, handling headers, lists, code blocks, and inline formatting.

    From: /tf/active/vicechatdev/vice_ai/complex_app.py
  • function html_to_markdown_v1 87.1% similar

    Converts HTML markup to Markdown syntax, handling headers, code blocks, text formatting, links, lists, and paragraphs with proper spacing.

    From: /tf/active/vicechatdev/vice_ai/new_app.py
  • function html_to_markdown 83.4% similar

    Converts HTML text back to Markdown format using regex-based pattern matching and replacement, handling headers, code blocks, formatting, links, lists, and HTML entities.

    From: /tf/active/vicechatdev/vice_ai/complex_app.py
  • function convert_markdown_to_html_v1 78.3% similar

    Converts basic Markdown syntax to HTML markup compatible with ReportLab PDF generation, including support for clickable links, bold, italic, and inline code formatting.

    From: /tf/active/vicechatdev/vice_ai/new_app.py
  • function convert_markdown_to_html 77.6% similar

    Converts basic markdown formatting (bold, italic, code) to HTML markup suitable for PDF generation using ReportLab.

    From: /tf/active/vicechatdev/vice_ai/complex_app.py