function test_markdown_link_parsing

Maturity: 42

A test function that exercises markdown link parsing: it extracts the link text and URL from a markdown link in the format produced by the Quill editor, then percent-encodes the special characters in the URL.

File: /tf/active/vicechatdev/test_complex_hyperlink.py
Lines: 50-80
Complexity: simple

Purpose

This function is a print-based test for parsing markdown links whose URLs contain special characters (&, commas, '+' and %20 as space encodings, and a URL fragment). It splits the markdown link syntax with a regular expression, extracts the link text and the URL, and percent-encodes everything after the host while leaving the delimiters /?&=:#% unchanged, so the query-string and fragment structure is preserved. The function makes no assertions; each intermediate result is printed for manual inspection.
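The splitting step can be illustrated in isolation. The sketch below reuses the function's regular expression on a simplified, hypothetical link (the `example.com` URL is not from the source) to show what `re.split` returns when the pattern contains two capture groups.

```python
import re

# Same pattern as the function under test: group 1 is the link text, group 2 the URL.
LINK_PATTERN = r'\[([^\]]+)\]\(([^)]+)\)'

# Hypothetical, simplified input; not the FileCloud URL used by the test.
sample = "See [report.xlsx](https://example.com/files/report.xlsx) for details."

# With capture groups, re.split returns: text before, group 1, group 2, text after.
parts = re.split(LINK_PATTERN, sample)
print(parts)
# ['See ', 'report.xlsx', 'https://example.com/files/report.xlsx', ' for details.']

# When nothing matches, the input comes back as a single-element list,
# which is why the function checks len(link_parts) >= 3 before indexing.
no_match = re.split(LINK_PATTERN, "no link here")
print(no_match)
# ['no link here']
```

When the link spans the whole input, as in the test, the first and last elements are empty strings and the list has four entries.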

Source Code

def test_markdown_link_parsing():
    """Test markdown link parsing with complex URLs"""
    print("\nTesting markdown link parsing...")
    
    # Test the exact format that would come from Quill editor
    markdown_text = "[3.5.1 Cost model for WBPK022&K024,K034_20240624.xlsx](https://filecloud.vicebio.com/ui/core/index.html?filter=3.5.1+Cost+model+for+WBPK022&K024,K034_20240624.xlsx#expl-tabl./SHARED/vicebio_shares/Wuxi/3%20WO-CO%20&%20invoice%20plan/3.5%20Cost%20Model/)"
    
    print(f"Input markdown: {markdown_text}")
    
    import re
    # Test URL extraction
    link_parts = re.split(r'\[([^\]]+)\]\(([^)]+)\)', markdown_text)
    print(f"Parsed parts: {link_parts}")
    
    if len(link_parts) >= 3:
        text = link_parts[1]
        url = link_parts[2] 
        print(f"Extracted text: '{text}'")
        print(f"Extracted URL: '{url}'")
        
        # Test URL encoding
        import urllib.parse
        if '://' in url:
            scheme_and_domain, path_part = url.split('://', 1)
            if '/' in path_part:
                domain, path = path_part.split('/', 1)
                encoded_path = urllib.parse.quote(path, safe='/?&=:#%')
                clean_url = f"{scheme_and_domain}://{domain}/{encoded_path}"
                print(f"Cleaned URL: '{clean_url}'")
    
    print("✅ URL parsing test completed")

Return Value

This function does not return any value (implicitly returns None). It prints test results and status messages to stdout, including the input markdown, parsed parts, extracted text and URL, and the cleaned/encoded URL.
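If a caller needs the values rather than printed output, the same steps could be wrapped in a function that returns them. The following is a minimal sketch, not part of the source file: `parse_markdown_link` is a hypothetical name, the sample URL is invented, and `re.search` is swapped in for `re.split` because only the first link is of interest.

```python
import re
import urllib.parse
from typing import Optional, Tuple

LINK_PATTERN = re.compile(r'\[([^\]]+)\]\(([^)]+)\)')

def parse_markdown_link(markdown_text: str) -> Optional[Tuple[str, str, str]]:
    """Hypothetical helper: return (text, url, clean_url) for the first link, or None."""
    match = LINK_PATTERN.search(markdown_text)
    if match is None:
        return None
    text, url = match.group(1), match.group(2)
    clean_url = url  # fall back to the raw URL when it cannot be split
    if '://' in url:
        scheme, rest = url.split('://', 1)
        if '/' in rest:
            host, path = rest.split('/', 1)
            # Same safe set as the function under test.
            encoded = urllib.parse.quote(path, safe='/?&=:#%')
            clean_url = f"{scheme}://{host}/{encoded}"
    return text, url, clean_url

result = parse_markdown_link(
    "[my file.txt](https://example.com/docs/my file.txt?x=1#top)"
)
print(result)
```

Returning a tuple lets a test compare against expected values with `assert` instead of relying on a reader to check stdout.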

Dependencies

  • re
  • urllib.parse

Required Imports

import re
import urllib.parse

Usage Example

import re
import urllib.parse

def test_markdown_link_parsing():
    """Test markdown link parsing with complex URLs"""
    print("\nTesting markdown link parsing...")
    
    markdown_text = "[3.5.1 Cost model for WBPK022&K024,K034_20240624.xlsx](https://filecloud.vicebio.com/ui/core/index.html?filter=3.5.1+Cost+model+for+WBPK022&K024,K034_20240624.xlsx#expl-tabl./SHARED/vicebio_shares/Wuxi/3%20WO-CO%20&%20invoice%20plan/3.5%20Cost%20Model/)"
    
    print(f"Input markdown: {markdown_text}")
    
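    # With two capture groups, re.split yields [before, text, url, after]
    link_parts = re.split(r'\[([^\]]+)\]\(([^)]+)\)', markdown_text)
    print(f"Parsed parts: {link_parts}")
    
    if len(link_parts) >= 3:
        text = link_parts[1]
        url = link_parts[2]
        print(f"Extracted text: '{text}'")
        print(f"Extracted URL: '{url}'")
        
        if '://' in url:
            # Despite its name, scheme_and_domain holds only the scheme (e.g. "https")
            scheme_and_domain, path_part = url.split('://', 1)
            if '/' in path_part:
                # Split the host from everything after it (path, query, fragment)
                domain, path = path_part.split('/', 1)
                # Percent-encode the remainder, leaving the listed delimiters as-is
                encoded_path = urllib.parse.quote(path, safe='/?&=:#%')
                clean_url = f"{scheme_and_domain}://{domain}/{encoded_path}"
                print(f"Cleaned URL: '{clean_url}'")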
    
    print("✅ URL parsing test completed")

# Run the test
test_markdown_link_parsing()

Best Practices

  • This is a test function meant for validation purposes, not production use
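  • The regex pattern r'\[([^\]]+)\]\(([^)]+)\)' assumes well-formed markdown links: it does not handle nested brackets or escaped characters, and because the URL group stops at the first ')', a URL that itself contains ')' is truncated
  • The URL encoding leaves only the characters in safe='/?&=:#%' (plus letters, digits, and '_.-~') untouched; '+' and ',' are not in that set, so the '+' separators and commas in the test URL's query string come out as %2B and %2C. Adjust the safe set if those characters must survive
  • The function assumes URLs contain a '://' scheme separator and at least one '/' after the host; when either is missing, the encoding step is skipped and no cleaned URL is printed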
  • For production code, consider using a dedicated markdown parsing library instead of regex
  • The function prints directly to stdout; consider using logging or returning results for better testability
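Two of the caveats above can be checked directly. The sketch below uses short, hypothetical inputs (not the FileCloud URL from the test) to show how the safe set treats '+' and ',', and how the regex behaves when a URL contains ')'.

```python
import re
import urllib.parse

SAFE = '/?&=:#%'  # the safe set used by the function under test

# Hypothetical path, query, and fragment for illustration.
tail = "index.html?filter=Cost+model&K024,K034#frag/3%20WO"

# '+' and ',' are not in SAFE, so quote() percent-encodes them.
encoded = urllib.parse.quote(tail, safe=SAFE)
print(encoded)
# index.html?filter=Cost%2Bmodel&K024%2CK034#frag/3%20WO

# Adding '+' and ',' to the safe set leaves the string as it was.
unchanged = urllib.parse.quote(tail, safe=SAFE + '+,')
print(unchanged == tail)
# True

# The URL group [^)]+ stops at the first ')', so this URL is cut short.
LINK_PATTERN = r'\[([^\]]+)\]\(([^)]+)\)'
truncated = re.split(LINK_PATTERN, "[a](https://example.com/a_(b).txt)")
print(truncated)
# ['', 'a', 'https://example.com/a_(b', '.txt)']
```

Whether '+' should be kept depends on how the receiving server decodes the query string, so the extended safe set is an option to evaluate rather than a recommended default.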

Similar Components

AI-powered semantic similarity - components with related functionality:

  • function test_markdown_processing 73.0% similar

    A test function that validates markdown processing capabilities by testing content parsing, element extraction, and HTML conversion functionality.

    From: /tf/active/vicechatdev/vice_ai/test_markdown.py
  • function test_complex_url_hyperlink 64.7% similar

    A test function that validates the creation of Word documents with complex FileCloud URLs containing special characters, query parameters, and URL fragments as clickable hyperlinks.

    From: /tf/active/vicechatdev/test_complex_hyperlink.py
  • function convert_markdown_to_html_v1 60.0% similar

    Converts basic Markdown syntax to HTML markup compatible with ReportLab PDF generation, including support for clickable links, bold, italic, and inline code formatting.

    From: /tf/active/vicechatdev/vice_ai/new_app.py
  • function format_inline_markdown 58.6% similar

    Converts inline Markdown syntax (bold, italic, code, links) to HTML tags while escaping HTML entities for safe rendering.

    From: /tf/active/vicechatdev/vice_ai/complex_app.py
  • function html_to_markdown_v1 57.0% similar

    Converts HTML markup to Markdown syntax, handling headers, code blocks, text formatting, links, lists, and paragraphs with proper spacing.

    From: /tf/active/vicechatdev/vice_ai/new_app.py