clean_html_tags_v1 - Code Extractor

function clean_html_tags_v1

Maturity: 34

Removes all HTML tags from a given text string using regular expression pattern matching, returning clean text without markup.

File:
/tf/active/vicechatdev/vice_ai/new_app.py

Lines:
3676 - 3680

Complexity:
simple

Purpose

This utility function sanitizes text by stripping out HTML tags, making it useful for cleaning user input, preparing text for display in non-HTML contexts, extracting plain text from HTML content, or sanitizing data before storage. It handles edge cases like None or empty strings by returning an empty string.

Source Code

def clean_html_tags(text):
    """Remove HTML tags from text"""
    if not text:
        return ""
    return re.sub(r'<[^>]+>', '', text)

Parameters

Name	Type	Default	Kind
`text`	-	-	positional_or_keyword

Parameter Details

text: The input string that may contain HTML tags. Can be None, an empty string, or any string with or without HTML markup. If None or empty, the function returns an empty string without attempting regex processing.

Return Value

Returns a string with all HTML tags removed. If the input is None or empty, returns an empty string (''). The function preserves all text content between tags but removes the tags themselves (anything matching the pattern '<...>'). Return type is always str.
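The return behaviour described above can be checked with a short sketch. It copies the function from the Source Code section and runs it on a few inputs chosen for illustration; the sample strings and variable names are assumptions, not taken from new_app.py.

```python
import re


def clean_html_tags(text):
    """Remove HTML tags from text (copied from the Source Code section)."""
    if not text:
        return ""
    return re.sub(r'<[^>]+>', '', text)


# Entities are left as-is; only <...> spans are removed.
entities = clean_html_tags("<p>Fish &amp; chips</p>")

# A stray "<" followed later by ">" in plain text is treated as a tag.
comparison = clean_html_tags("1 < 2 and 3 > 2")

# Any falsy value (not only None or "") short-circuits to an empty string.
falsy = clean_html_tags(0)

# A truthy non-string reaches re.sub, which raises TypeError.
try:
    clean_html_tags(123)
    raised = False
except TypeError:
    raised = True
```

Under these inputs, `entities` keeps the `&amp;` entity, `comparison` loses everything between the two angle brackets, and `raised` ends up True, so callers that may pass non-string values probably want their own type check.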

Dependencies

re

Required Imports

import re

Usage Example

import re

def clean_html_tags(text):
    """Remove HTML tags from text"""
    if not text:
        return ""
    return re.sub(r'<[^>]+>', '', text)

# Example usage
html_text = "<p>Hello <strong>world</strong>!</p>"
clean_text = clean_html_tags(html_text)
print(clean_text)  # Output: "Hello world!"

# Handle None input
result = clean_html_tags(None)
print(result)  # Output: ""

# Handle empty string
result = clean_html_tags("")
print(result)  # Output: ""

# Complex HTML
complex_html = "<div class='container'><h1>Title</h1><p>Paragraph with <a href='#'>link</a></p></div>"
clean = clean_html_tags(complex_html)
print(clean)  # Output: "TitleParagraph with link"

Best Practices

This function uses a simple regex pattern that may not handle all edge cases of malformed HTML or HTML entities
For more robust HTML parsing and cleaning, consider using libraries like BeautifulSoup or html.parser
The function does not decode HTML entities (e.g., '&amp;' remains as '&amp;'); use html.unescape() if entity decoding is needed
The regex pattern '<[^>]+>' removes tags but does not add spaces, so adjacent tags may result in concatenated words
Always validate and sanitize user input even after tag removal to prevent other security issues
Consider using this in combination with other sanitization methods for comprehensive text cleaning
The function is safe for None inputs but does not validate if the input is actually a string type
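As one way to address several of the points above (entity decoding, concatenated words, and type checking), here is a minimal sketch built on the standard library's html.parser. The names `strip_html_text`, `_TextCollector`, and `BLOCK_TAGS` are hypothetical and not part of new_app.py, and the set of block-level tags is an illustrative assumption rather than a complete list.

```python
from html.parser import HTMLParser

# Assumed set of tags that should separate words; extend as needed.
BLOCK_TAGS = {"p", "div", "br", "li", "tr", "h1", "h2", "h3", "h4", "h5", "h6"}


class _TextCollector(HTMLParser):
    """Collects text nodes and inserts a space around block-level tags."""

    def __init__(self):
        # convert_charrefs=True decodes entities such as &amp; in text nodes.
        super().__init__(convert_charrefs=True)
        self.parts = []

    def handle_starttag(self, tag, attrs):
        if tag in BLOCK_TAGS:
            self.parts.append(" ")

    def handle_endtag(self, tag):
        if tag in BLOCK_TAGS:
            self.parts.append(" ")

    def handle_data(self, data):
        self.parts.append(data)


def strip_html_text(text):
    """Return plain text from HTML; non-string or empty input gives ''."""
    if not isinstance(text, str) or not text:
        return ""
    collector = _TextCollector()
    collector.feed(text)
    collector.close()  # flush any buffered trailing text
    # Collapse runs of whitespace into single spaces.
    return " ".join("".join(collector.parts).split())
```

With this sketch, `strip_html_text("<h1>Title</h1><p>Paragraph</p>")` yields "Title Paragraph" rather than "TitleParagraph". It still keeps the contents of script and style elements, so it is not a security sanitizer.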

Similar Components

AI-powered semantic similarity - components with related functionality:

function clean_html_tags 79.9% similar

Removes HTML tags and entities from text strings, returning clean plain text suitable for PDF display or other formatted output.
From: /tf/active/vicechatdev/vice_ai/complex_app.py

function clean_text 74.7% similar

Cleans and normalizes text content by removing HTML tags, normalizing whitespace, and stripping markdown formatting elements.
From: /tf/active/vicechatdev/improved_convert_disclosures_to_table.py

function clean_text_for_xml_v1 63.9% similar

Sanitizes text strings to ensure XML 1.0 compatibility by removing or replacing invalid control characters and ensuring all characters meet XML specification requirements for Word document generation.
From: /tf/active/vicechatdev/enhanced_word_converter_fixed.py

function clean_text_for_xml 60.3% similar

Sanitizes text by removing or replacing XML-incompatible characters to ensure compatibility with Word document XML structure.
From: /tf/active/vicechatdev/improved_convert_disclosures_to_table.py

function html_to_text 58.7% similar

Converts HTML content to plain text by removing HTML tags, decoding common HTML entities, and normalizing whitespace.
From: /tf/active/vicechatdev/CDocs/utils/notifications.py
