🔍 Code Extractor

function sanitize_filename

Maturity: 51

Sanitizes a filename string by replacing characters that Windows forbids in filenames with underscores, trimming surrounding whitespace, and falling back to "document" when nothing is left.

File: /tf/active/vicechatdev/CDocs/utils/__init__.py
Lines: 87-104
Complexity: simple

Purpose

This function makes filenames safer to use across operating systems. It replaces the characters that Windows forbids in filenames (\/*?:"<>|) with underscores. This set also covers the forward slash, which Linux and macOS reject. It then trims surrounding whitespace and substitutes a default name when nothing is left. These steps make it suitable for user-generated filenames, downloaded files, and dynamically created file names. Because "/" is replaced, it is meant for single names, not full paths. Common use cases include processing uploaded files, generating report filenames, and creating safe document names in content management systems.
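The upload use case can be sketched as follows. This is a minimal sketch that assumes uploads are written into a single target directory. `save_upload` is a hypothetical helper, not part of CDocs. The resolve-and-compare check is an extra safeguard added here because names such as ".." pass through sanitize_filename unchanged.

```python
import re
import tempfile
from pathlib import Path


def sanitize_filename(filename: str) -> str:
    # Same logic as the function documented on this page.
    sanitized = re.sub(r'[\\/*?:"<>|]', "_", filename)
    sanitized = sanitized.strip()
    if not sanitized:
        sanitized = "document"
    return sanitized


def save_upload(target_dir: Path, client_name: str, data: bytes) -> Path:
    """Hypothetical helper: write uploaded bytes under a sanitized name."""
    safe_name = sanitize_filename(client_name)
    destination = (target_dir / safe_name).resolve()
    # Extra safeguard (not in the original module): "." and ".." survive
    # sanitization, so require the final path to be a direct child of target_dir.
    if destination.parent != target_dir.resolve():
        raise ValueError(f"refusing to write outside {target_dir}: {safe_name!r}")
    destination.write_bytes(data)
    return destination


if __name__ == "__main__":
    with tempfile.TemporaryDirectory() as tmp:
        saved = save_upload(Path(tmp), "../../etc/passwd", b"hello")
        print(saved.name)  # .._.._etc_passwd
```

Because the slashes are replaced, "../../etc/passwd" becomes one harmless name inside the target directory. A bare ".." is rejected by the containment check.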

Source Code

def sanitize_filename(filename: str) -> str:
    """
    Sanitize filename to remove invalid characters.
    
    Args:
        filename: Original filename
        
    Returns:
        Sanitized filename
    """
    # Replace characters that are invalid in filenames
    sanitized = re.sub(r'[\\/*?:"<>|]', "_", filename)
    # Remove leading/trailing whitespace
    sanitized = sanitized.strip()
    # Ensure we have a valid filename
    if not sanitized:
        sanitized = "document"
    return sanitized

Parameters

Name      Type  Default  Kind
filename  str   -        positional_or_keyword

Parameter Details

filename: The original filename string to sanitize. It may contain any characters, including ones that are invalid in filenames: forward and back slashes, colons, asterisks, question marks, double quotes, angle brackets, and pipes. It may also have leading or trailing whitespace, or be an empty string.

Return Value

Type: str

Returns a sanitized string intended for use as a filename. Each invalid character (\/*?:"<>|) is replaced with an underscore, and leading and trailing whitespace is removed. If the result is empty because the input was empty or whitespace-only, the default string 'document' is returned instead. The return value is therefore guaranteed to be non-empty. It is not guaranteed to be a valid filename: names such as '.', '..' or 'CON', ASCII control characters, and over-long names all pass through unchanged.
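The sketch below reproduces the function body and runs a few inputs that show these edge cases. The inputs are illustrative choices, not taken from the original project's tests.

```python
import re


def sanitize_filename(filename: str) -> str:
    # Same logic as the function documented on this page.
    sanitized = re.sub(r'[\\/*?:"<>|]', "_", filename)
    sanitized = sanitized.strip()
    if not sanitized:
        sanitized = "document"
    return sanitized


# Input made only of invalid characters becomes underscores, not "document".
print(sanitize_filename("???"))           # ___
# Surrounding whitespace is removed after the replacement step.
print(sanitize_filename("  a:b  "))       # a_b
# Dot-only names and Windows reserved names pass through unchanged.
print(sanitize_filename(".."))            # ..
print(sanitize_filename("CON"))           # CON
# Control characters are not in the replacement set.
print(repr(sanitize_filename("a\x01b")))  # 'a\x01b'
```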

Dependencies

  • re

Required Imports

import re

Usage Example

import re

# Standalone copy of sanitize_filename (docstring and comments omitted)
def sanitize_filename(filename: str) -> str:
    sanitized = re.sub(r'[\\/*?:"<>|]', "_", filename)
    sanitized = sanitized.strip()
    if not sanitized:
        sanitized = "document"
    return sanitized

# Example usage
original = "my/file:name*.txt"
safe_name = sanitize_filename(original)
print(safe_name)  # Output: my_file_name_.txt

# Handle empty or whitespace-only strings
empty_result = sanitize_filename("   ")
print(empty_result)  # Output: document

# Handle special characters
special = "report<2024>|final?.pdf"
safe_special = sanitize_filename(special)
print(safe_special)  # Output: report_2024__final_.pdf

Best Practices

  • Always sanitize user-provided filenames before saving them to the filesystem. Replacing slashes helps against path traversal, but '.' and '..' are not rejected, so check for them separately
  • This function only replaces characters and does not enforce filename length limits (commonly 255 bytes per name on Linux filesystems such as ext4)
  • Consider rejecting or renaming Windows reserved device names (CON, PRN, AUX, NUL, COM1-COM9, LPT1-LPT9), which are reserved even with an extension, such as CON.txt
  • The function preserves file extensions but does not validate them; add separate extension validation if needed
  • For internationalization, note that Unicode characters are preserved, and some filesystems or tools may not handle them
  • Consider combining with path validation to ensure the sanitized filename is used only as a basename, not a full path
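Several of these practices can be layered on top of the original function. The following is a minimal sketch under stated assumptions. `safe_basename`, `MAX_NAME_BYTES` and the helper constants are hypothetical names that do not exist in CDocs.utils. The 255-byte limit assumes an ext4-style per-name limit. Removing trailing dots and spaces follows Microsoft's guidance that names should not end with either character.

```python
import re

# Hypothetical hardening layer; none of these names exist in CDocs.utils.
# Same characters as sanitize_filename, plus ASCII control characters 0-31.
_INVALID_CHARS = re.compile(r'[\\/*?:"<>|\x00-\x1f]')
_WINDOWS_RESERVED = {"CON", "PRN", "AUX", "NUL"} | {
    f"{prefix}{n}" for prefix in ("COM", "LPT") for n in range(1, 10)
}
MAX_NAME_BYTES = 255  # assumed per-name limit, as on ext4


def safe_basename(filename: str, fallback: str = "document") -> str:
    # Trim whitespace first so tabs/newlines at the ends are dropped, not replaced.
    name = _INVALID_CHARS.sub("_", filename.strip())
    # Reserved device names are rejected even with an extension ("CON.txt").
    stem, dot, ext = name.partition(".")
    if stem.upper() in _WINDOWS_RESERVED:
        name = f"{stem}_{dot}{ext}"
    # Truncate to the byte limit without leaving a partial UTF-8 sequence.
    encoded = name.encode("utf-8")
    if len(encoded) > MAX_NAME_BYTES:
        name = encoded[:MAX_NAME_BYTES].decode("utf-8", errors="ignore")
    # Drop trailing dots and spaces; this also turns "." and ".." into "".
    name = name.rstrip(". ")
    return name or fallback


if __name__ == "__main__":
    for raw in ["..", "CON.txt", "report.", "a\x01b", "my/file:name*.txt"]:
        print(repr(raw), "->", repr(safe_basename(raw)))
```

Keeping this as a separate wrapper leaves the original function's behavior unchanged for existing callers. The stricter rules apply only where a name must be safe on every platform.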

Similar Components

AI-powered semantic similarity - components with related functionality:

  • function sanitize_folders 64.5% similar

    Recursively traverses a directory tree and sanitizes folder names by removing non-ASCII characters, renaming folders to ASCII-only versions.

    From: /tf/active/vicechatdev/creation_updater.py
  • function get_file_extension 54.1% similar

    Extracts and returns the file extension from a given filename string, normalized to lowercase without the leading dot.

    From: /tf/active/vicechatdev/CDocs/utils/__init__.py
  • function capitalize_unicode_name 52.2% similar

    Transforms Unicode character name strings by removing the word 'capital' and capitalizing the following word, converting strings like 'capital delta' to 'Delta'.

    From: /tf/active/vicechatdev/patches/util.py
  • function is_valid_document_file 51.3% similar

    Validates whether a given filename has an extension corresponding to a supported document type by checking against a predefined list of valid document extensions.

    From: /tf/active/vicechatdev/CDocs/utils/__init__.py
  • function clean_html_tags 50.5% similar

    Removes HTML tags and entities from text strings, returning clean plain text suitable for PDF display or other formatted output.

    From: /tf/active/vicechatdev/vice_ai/complex_app.py