function sanitize_filename
Sanitizes a filename string by replacing invalid filesystem characters with underscores and ensuring a valid output.
/tf/active/vicechatdev/CDocs/utils/__init__.py
87 - 104
simple
Purpose
This function makes filenames safe for use across different operating systems by replacing the characters \ / * ? : " < > | with underscores. These are the printable characters Windows reserves in filenames, and they include the path separators used on Linux and macOS. The function also strips leading and trailing whitespace and falls back to a default name when nothing is left, making it suitable for user-generated filenames, downloaded files, or dynamically created file paths. Common use cases include processing uploaded files, generating report filenames, or creating safe document names in content management systems.
Source Code
def sanitize_filename(filename: str) -> str:
    """
    Sanitize filename to remove invalid characters.

    Args:
        filename: Original filename

    Returns:
        Sanitized filename
    """
    # Replace characters that are invalid in filenames
    sanitized = re.sub(r'[\\/*?:"<>|]', "_", filename)

    # Remove leading/trailing whitespace
    sanitized = sanitized.strip()

    # Ensure we have a valid filename
    if not sanitized:
        sanitized = "document"

    return sanitized
Parameters
| Name | Type | Default | Kind |
|---|---|---|---|
| filename | str | - | positional_or_keyword |
Parameter Details
filename: The original filename string that needs to be sanitized. Can contain any characters including invalid filesystem characters like slashes, colons, asterisks, quotes, angle brackets, pipes, and backslashes. May also contain leading/trailing whitespace or be an empty string.
Return Value
Type: str
Returns the sanitized string. Each of the characters \ / * ? : " < > | is replaced with an underscore, then leading and trailing whitespace is removed. If the result is empty, the default string 'document' is returned. The return value is always a non-empty string, but it is not guaranteed to be a valid filename on every system: the names '.' and '..', control characters, Windows reserved names, and over-long names all pass through unchanged.
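The edge cases in the return value are easier to see as assertions. The following is a minimal sketch that repeats the function body from the Source Code section so it runs standalone; the inputs are illustrative and not taken from the CDocs test suite.

```python
import re


def sanitize_filename(filename: str) -> str:
    # Same logic as the documented function, repeated so the snippet is self-contained
    sanitized = re.sub(r'[\\/*?:"<>|]', "_", filename)
    sanitized = sanitized.strip()
    if not sanitized:
        sanitized = "document"
    return sanitized


# Each invalid character is replaced one-for-one
assert sanitize_filename("a\\b/c") == "a_b_c"

# Replacement runs before strip(), so a lone separator becomes "_", not "document"
assert sanitize_filename(" / ") == "_"

# Empty and whitespace-only input fall back to the default name
assert sanitize_filename("") == "document"
assert sanitize_filename("\t\n") == "document"

# Dots are not in the replaced set, so ".." is returned unchanged
assert sanitize_filename("..") == ".."
```

The last assertion is the one to keep in mind when the result is joined onto a directory path.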
Dependencies
re
Required Imports
import re
Usage Example
import re

def sanitize_filename(filename: str) -> str:
    sanitized = re.sub(r'[\\/*?:"<>|]', "_", filename)
    sanitized = sanitized.strip()
    if not sanitized:
        sanitized = "document"
    return sanitized

# Example usage
original = "my/file:name*.txt"
safe_name = sanitize_filename(original)
print(safe_name)  # Output: my_file_name_.txt

# Handle empty or whitespace-only strings
empty_result = sanitize_filename(" ")
print(empty_result)  # Output: document

# Handle special characters
special = "report<2024>|final?.pdf"
safe_special = sanitize_filename(special)
print(safe_special)  # Output: report_2024__final_.pdf
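For the uploaded-file use case, the sanitized name usually ends up joined onto a storage directory. The sketch below shows one way that might look with pathlib. The helper `build_upload_path` and the directory `/tmp/uploads` are hypothetical names introduced for illustration and are not part of CDocs.

```python
import re
from pathlib import Path


def sanitize_filename(filename: str) -> str:
    # Same logic as the documented function, repeated so the snippet is self-contained
    sanitized = re.sub(r'[\\/*?:"<>|]', "_", filename)
    sanitized = sanitized.strip()
    if not sanitized:
        sanitized = "document"
    return sanitized


def build_upload_path(upload_dir: str, filename: str) -> Path:
    """Hypothetical helper: join a sanitized name onto a base directory."""
    base = Path(upload_dir).resolve()
    candidate = (base / sanitize_filename(filename)).resolve()
    # "." and ".." survive sanitization, so reject anything that does not
    # resolve to a direct child of the base directory
    if candidate.parent != base:
        raise ValueError(f"unsafe filename: {filename!r}")
    return candidate


print(build_upload_path("/tmp/uploads", "my/file:name*.txt").name)
```

The parent check is a design choice in this sketch: it covers the dot names that the character replacement alone leaves in place.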
Best Practices
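- Always sanitize user-provided filenames before saving to the filesystem. Replacing / and \ keeps the result a single path component, which blocks most path traversal attempts, but the names . and .. are returned unchanged and need a separate check
- Be aware that this function only handles character replacement and does not validate filename length limits (typically 255 bytes or characters per name, depending on the filesystem)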
- Consider additional validation for reserved filenames on Windows (CON, PRN, AUX, NUL, COM1-9, LPT1-9)
- The function preserves file extensions, but does not validate them - consider separate extension validation if needed
- For internationalization, be aware that this function preserves Unicode characters which may not be supported on all filesystems
- Consider combining with path validation to ensure the sanitized filename is used only as a basename, not a full path
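Several of the points above can be combined in a thin wrapper. The following is a minimal sketch, not the CDocs implementation: `sanitize_filename_strict`, `WINDOWS_RESERVED`, and the default `max_length` of 255 are assumptions introduced here for illustration.

```python
import re

# Device names Windows reserves regardless of extension (assumed list, per the bullet above)
WINDOWS_RESERVED = (
    {"CON", "PRN", "AUX", "NUL"}
    | {f"COM{i}" for i in range(1, 10)}
    | {f"LPT{i}" for i in range(1, 10)}
)


def sanitize_filename(filename: str) -> str:
    # Same logic as the documented function, repeated so the snippet is self-contained
    sanitized = re.sub(r'[\\/*?:"<>|]', "_", filename)
    sanitized = sanitized.strip()
    if not sanitized:
        sanitized = "document"
    return sanitized


def sanitize_filename_strict(filename: str, max_length: int = 255) -> str:
    """Hypothetical wrapper adding reserved-name, dot-name, and length handling."""
    name = sanitize_filename(filename)

    # Drop trailing dots and spaces; this also turns "." and ".." into an empty string
    name = name.rstrip(". ")
    if not name:
        name = "document"

    # Prefix reserved device names such as "con.txt" so they no longer match
    stem = name.split(".", 1)[0]
    if stem.upper() in WINDOWS_RESERVED:
        name = "_" + name

    # Truncate to max_length characters, keeping the extension when there is one.
    # This counts characters, not encoded bytes, which is a simplification.
    if len(name) > max_length:
        root, dot, ext = name.rpartition(".")
        if dot and root and len(ext) + 1 < max_length:
            name = root[: max_length - len(ext) - 1] + dot + ext
        else:
            name = name[:max_length]

    return name


print(sanitize_filename_strict("con.txt"))
print(sanitize_filename_strict("../../etc/passwd"))
```

Filesystems that measure the limit in bytes would need the truncation step to work on the encoded name instead.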
Similar Components
AI-powered semantic similarity - components with related functionality:
- function sanitize_folders 64.5% similar
- function get_file_extension 54.1% similar
- function capitalize_unicode_name 52.2% similar
- function is_valid_document_file 51.3% similar
- function clean_html_tags 50.5% similar