function basic_markdown_to_html

Maturity: 47

Converts basic Markdown syntax to HTML without using external Markdown libraries, handling headers, lists, code blocks, and inline formatting.

File: /tf/active/vicechatdev/vice_ai/complex_app.py
Lines: 1821 - 1914
Complexity: moderate

Purpose

This function provides a lightweight Markdown-to-HTML converter for applications that need basic Markdown rendering without adding external dependencies. It supports common Markdown features: headers (h1-h3), ordered and unordered lists, code blocks (fenced with ```), and inline formatting, which is delegated to the format_inline_markdown helper. The function processes text line by line, keeping state for multi-line structures such as lists and code blocks, and escapes HTML in the content of code blocks.
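
The line-by-line state tracking described above can be illustrated with a reduced sketch that handles only fenced code blocks. The name convert_fences is hypothetical and is not part of complex_app.py; the full function below tracks list state in the same way.

```python
import html

def convert_fences(text):
    """Toy converter (hypothetical): handles only fenced code blocks."""
    out = []
    in_code_block = False  # the single piece of state carried between lines
    for line in text.split('\n'):
        if line.strip().startswith('```'):
            # A fence line toggles the state and emits the matching tags.
            out.append('</code></pre>' if in_code_block else '<pre><code>')
            in_code_block = not in_code_block
        elif in_code_block:
            # Inside a block, lines are escaped and otherwise left unchanged.
            out.append(html.escape(line))
        else:
            out.append(line)
    if in_code_block:
        # An unterminated block is closed at the end of the input.
        out.append('</code></pre>')
    return '\n'.join(out)

print(convert_fences("intro\n```\n<b>x</b>\n```"))
```

Because the flag is the only thing carried from one line to the next, each line can be handled without looking ahead or behind.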

Source Code

def basic_markdown_to_html(text):
    """Basic Markdown to HTML conversion without external libraries"""
    if not text:
        return ""
    
    # Split into lines for processing
    lines = text.split('\n')
    result_lines = []
    in_list = False
    in_code_block = False
    list_type = None  # 'ul' or 'ol'
    
    i = 0
    while i < len(lines):
        line = lines[i]
        stripped = line.strip()
        
        # Handle code blocks
        if stripped.startswith('```'):
            # Close any open list before starting code block
            if in_list:
                result_lines.append(f'</{list_type}>')
                in_list = False
                list_type = None
            
            if not in_code_block:
                # Start code block
                in_code_block = True
                result_lines.append('<pre><code>')
            else:
                # End code block
                in_code_block = False
                result_lines.append('</code></pre>')
            i += 1
            continue
        
        # If we're in a code block, just add the line as-is (escaped)
        if in_code_block:
            result_lines.append(html.escape(line))
            i += 1
            continue
        
        # Close any open list if this line doesn't continue it
        if in_list and not (stripped.startswith('- ') or stripped.startswith('* ') or re.match(r'^\d+\. ', stripped)) and stripped:
            result_lines.append(f'</{list_type}>')
            in_list = False
            list_type = None
        
        # Headers (process before other formatting)
        if stripped.startswith('### '):
            result_lines.append(f'<h3>{stripped[4:]}</h3>')
        elif stripped.startswith('## '):
            result_lines.append(f'<h2>{stripped[3:]}</h2>')
        elif stripped.startswith('# '):
            result_lines.append(f'<h1>{stripped[2:]}</h1>')
        # Unordered list
        elif stripped.startswith('- ') or stripped.startswith('* '):
            if not in_list or list_type != 'ul':
                if in_list:
                    result_lines.append(f'</{list_type}>')
                result_lines.append('<ul>')
                in_list = True
                list_type = 'ul'
            content = format_inline_markdown(stripped[2:])
            result_lines.append(f'<li>{content}</li>')
        # Ordered list
        elif re.match(r'^\d+\. ', stripped):
            if not in_list or list_type != 'ol':
                if in_list:
                    result_lines.append(f'</{list_type}>')
                result_lines.append('<ol>')
                in_list = True
                list_type = 'ol'
            content = re.sub(r'^\d+\. ', '', stripped)
            content = format_inline_markdown(content)
            result_lines.append(f'<li>{content}</li>')
        # Empty line
        elif not stripped:
            if not in_list:
                result_lines.append('')
        # Regular paragraph
        else:
            content = format_inline_markdown(stripped)
            result_lines.append(f'<p>{content}</p>')
        
        i += 1
    
    # Close any open structures
    if in_code_block:
        result_lines.append('</code></pre>')
    if in_list:
        result_lines.append(f'</{list_type}>')
    
    return '\n'.join(result_lines)

Parameters

Name    Type    Default    Kind
text    -       -          positional_or_keyword

Parameter Details

text: A string containing Markdown-formatted text to be converted to HTML. If it is None or an empty string, the function returns an empty string. Supports headers (#, ##, ###), unordered lists (-, *), ordered lists (1., 2., etc.), code blocks (```), and inline formatting (processed by the format_inline_markdown helper function).
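
As a rough illustration of how each line of text is recognised, the sketch below applies the same prefix tests as the source to a single line and returns a label. The helper classify_line is hypothetical and exists only for this example; in the real function, lines inside an open code block skip these tests entirely.

```python
import re

def classify_line(line):
    """Label one line using the prefix tests from the source (hypothetical helper)."""
    stripped = line.strip()  # leading and trailing whitespace is ignored
    if stripped.startswith('```'):
        return 'fence'
    if stripped.startswith('### '):
        return 'h3'
    if stripped.startswith('## '):
        return 'h2'
    if stripped.startswith('# '):
        return 'h1'
    if stripped.startswith('- ') or stripped.startswith('* '):
        return 'ul'
    if re.match(r'^\d+\. ', stripped):
        return 'ol'
    if not stripped:
        return 'blank'
    return 'p'

for sample in ['# Title', '#### Deep', '  * item', '12. step', '1.step', '```python', '']:
    print(repr(sample), '->', classify_line(sample))
```

Every marker must be followed by a space, so '#### Deep' and '1.step' both fall through to the paragraph case.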

Return Value

Returns a string containing the HTML representation of the input Markdown text. The HTML includes semantic tags like <h1>-<h3>, <ul>, <ol>, <li>, <p>, <pre>, and <code>. Lines are joined with newline characters. Returns an empty string if input is None or empty. Code block content is HTML-escaped for safety.
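
For reference, this is what the standard-library html.escape call used on code block lines does to a sample string. The sample text is made up for illustration.

```python
import html

# html.escape replaces &, < and >, and with the default quote=True
# it also replaces double and single quote characters.
line = '<a href="x">Tom & Jerry</a>'
escaped = html.escape(line)
print(escaped)
```

The escaped text is displayed literally by a browser instead of being interpreted as markup.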

Dependencies

  • html
  • re

Required Imports

import html
import re

Usage Example

import html
import re

# Note: You must define format_inline_markdown function first
def format_inline_markdown(text):
    # Simple implementation for example
    text = re.sub(r'\*\*(.+?)\*\*', r'<strong>\1</strong>', text)
    text = re.sub(r'\*(.+?)\*', r'<em>\1</em>', text)
    return text

markdown_text = '''# Main Title
## Subtitle
This is a paragraph.

- Item 1
- Item 2

1. First
2. Second

```
code example
```
'''

# Assumes basic_markdown_to_html (see Source Code above) is defined or imported in this scope
html_output = basic_markdown_to_html(markdown_text)
print(html_output)

Best Practices

  • Ensure the format_inline_markdown helper function is defined before calling this function, as it's a required dependency
  • Input text should use standard Markdown syntax; non-standard syntax may not be processed correctly
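  • Nested lists are not supported: leading whitespace is stripped before matching, so indented items become items of the current list. An open list is closed before a code block starts
  • Code block content is escaped with html.escape, but header text is inserted into the <h1>-<h3> tags as-is, and list and paragraph text is only as safe as format_inline_markdown makes it; do not rely on this function alone to sanitize untrusted input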
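  • Empty lines do not close an open list: blank lines are skipped while a list is open, and the list closes at the next non-empty line that is not a list item, or when the list type changes. Two lists of the same type separated only by blank lines are therefore merged into one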
  • Only supports headers up to level 3 (###); deeper headers will be treated as regular paragraphs
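  • The function makes a single pass over the lines and keeps only a few state flags, so running time grows roughly linearly with the number of lines; the whole output is still built in memory before it is returned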
  • Ordered list numbering in the output HTML is automatic and doesn't preserve the original Markdown numbers
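
The way the source treats blank lines around lists can be checked with a reduced sketch that handles only unordered lists and paragraphs. The name convert_unordered_list is hypothetical, and inline formatting is left out to keep the example short.

```python
def convert_unordered_list(text):
    """Reduced sketch (hypothetical): unordered lists and paragraphs only."""
    out = []
    in_list = False
    for line in text.split('\n'):
        stripped = line.strip()
        is_item = stripped.startswith('- ') or stripped.startswith('* ')
        # As in the source, only a non-empty line that is not an item closes the list.
        if in_list and stripped and not is_item:
            out.append('</ul>')
            in_list = False
        if is_item:
            if not in_list:
                out.append('<ul>')
                in_list = True
            out.append(f'<li>{stripped[2:]}</li>')
        elif not stripped:
            # Blank lines are kept outside lists and dropped inside them.
            if not in_list:
                out.append('')
        else:
            out.append(f'<p>{stripped}</p>')
    if in_list:
        out.append('</ul>')
    return '\n'.join(out)

print(convert_unordered_list("- a\n\n- b\ntext"))
```

In this input the blank line between the two items is dropped and both items end up in one list, which is closed only when the paragraph line is reached.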

Similar Components

AI-powered semantic similarity - components with related functionality:

  • function simple_markdown_to_html 88.1% similar

    Converts a subset of Markdown syntax to clean HTML, supporting headers, bold text, unordered lists, and paragraphs.

    From: /tf/active/vicechatdev/vice_ai/new_app.py
  • function html_to_markdown_v1 85.0% similar

    Converts HTML markup to Markdown syntax, handling headers, code blocks, text formatting, links, lists, and paragraphs with proper spacing.

    From: /tf/active/vicechatdev/vice_ai/new_app.py
  • function html_to_markdown 83.2% similar

    Converts HTML text back to Markdown format using regex-based pattern matching and replacement, handling headers, code blocks, formatting, links, lists, and HTML entities.

    From: /tf/active/vicechatdev/vice_ai/complex_app.py
  • function markdown_to_html 78.4% similar

    Converts Markdown formatted text to HTML using the python-markdown library with multiple extensions, falling back to basic conversion if the library is unavailable.

    From: /tf/active/vicechatdev/vice_ai/complex_app.py
  • function format_inline_markdown 75.2% similar

    Converts inline Markdown syntax (bold, italic, code, links) to HTML tags while escaping HTML entities for safe rendering.

    From: /tf/active/vicechatdev/vice_ai/complex_app.py