function extract_table_as_markdown
Extracts a specified row range from a pandas DataFrame and converts it into a properly formatted markdown table with automatic header detection and data cleaning.
/tf/active/vicechatdev/vice_ai/smartstat_service.py
645 - 721
moderate
Purpose
This function is designed to convert tabular data from a DataFrame into markdown format, making it suitable for documentation, reports, or display in markdown-compatible environments. It intelligently detects header rows using heuristics, removes empty columns, cleans data by escaping special characters, and ensures proper markdown table formatting with aligned columns and separators.
Source Code
def extract_table_as_markdown(df: pd.DataFrame, start_row: int, end_row: int) -> str:
"""
Extract a table region and format as proper markdown table
"""
# Extract the table region
table_df = df.iloc[start_row:end_row + 1].copy()
# Remove completely empty columns
table_df = table_df.dropna(axis=1, how='all')
if len(table_df) == 0 or len(table_df.columns) == 0:
return ""
# Clean up the data
table_df = table_df.fillna('')
# Convert all columns to string and clean
for col in table_df.columns:
table_df[col] = table_df[col].astype(str).str.strip()
# Detect header row(s) - look for rows that look like headers
header_candidates = []
for idx in range(min(3, len(table_df))): # Check first 3 rows max
row_values = table_df.iloc[idx].values
non_empty = [v for v in row_values if v != '']
# Heuristics for header detection
if len(non_empty) >= 2: # Has multiple columns with content
# Check if values look like headers (short, descriptive text)
header_like = sum(1 for v in non_empty
if len(v) < 50 and not v.replace('.', '').replace(',', '').isdigit())
if header_like >= len(non_empty) * 0.7: # 70% of values look like headers
header_candidates.append(idx)
# Use first detected header, or row 0 as fallback
header_row = header_candidates[0] if header_candidates else 0
# Split into header and data
if header_row < len(table_df):
headers = table_df.iloc[header_row].values.tolist()
data_rows = table_df.iloc[header_row + 1:] if header_row + 1 < len(table_df) else pd.DataFrame()
else:
# No clear header, use column indices
headers = [f"Col{i+1}" for i in range(len(table_df.columns))]
data_rows = table_df
# Clean headers
headers = [str(h).strip() if str(h).strip() else f"Col{i+1}" for i, h in enumerate(headers)]
# Build markdown table
markdown_lines = []
# Header row
markdown_lines.append("| " + " | ".join(headers) + " |")
# Separator row
markdown_lines.append("| " + " | ".join(["---"] * len(headers)) + " |")
# Data rows
for _, row in data_rows.iterrows():
values = row.values.tolist()
# Pad or truncate to match header count
while len(values) < len(headers):
values.append("")
values = values[:len(headers)]
# Clean values for markdown
cleaned_values = []
for v in values:
v_str = str(v).strip()
# Escape markdown special characters
v_str = v_str.replace("|", "\\|").replace("\n", " ").replace("\r", "")
cleaned_values.append(v_str)
markdown_lines.append("| " + " | ".join(cleaned_values) + " |")
return "\n".join(markdown_lines)
Parameters
| Name | Type | Default | Kind |
|---|---|---|---|
df |
pd.DataFrame | - | positional_or_keyword |
start_row |
int | - | positional_or_keyword |
end_row |
int | - | positional_or_keyword |
Parameter Details
df: A pandas DataFrame containing the source data from which the table will be extracted. Can contain any data types as they will be converted to strings.
start_row: Integer index (0-based) indicating the first row to include in the extracted table. Must be a valid row index within the DataFrame.
end_row: Integer index (0-based) indicating the last row to include in the extracted table (inclusive). Must be greater than or equal to start_row and be a valid row index within the DataFrame.
Return Value
Type: str
Returns a string containing the markdown-formatted table with pipe-delimited columns, header row, separator row with dashes, and data rows. Returns an empty string if the extracted region has no data or columns after cleaning. Special characters like pipes, newlines, and carriage returns are escaped or removed to maintain markdown integrity.
Dependencies
pandas
Required Imports
import pandas as pd
Usage Example
import pandas as pd
# Create sample DataFrame
data = {
'Name': ['Product', 'Item A', 'Item B', 'Item C'],
'Price': ['Cost', '10.50', '25.00', '15.75'],
'Quantity': ['Qty', '100', '50', '75']
}
df = pd.DataFrame(data)
# Extract rows 0-3 as markdown table
markdown_table = extract_table_as_markdown(df, start_row=0, end_row=3)
print(markdown_table)
# Output:
# | Product | Cost | Qty |
# | --- | --- | --- |
# | Item A | 10.50 | 100 |
# | Item B | 25.00 | 50 |
# | Item C | 15.75 | 75 |
Best Practices
- Ensure start_row and end_row are valid indices within the DataFrame bounds to avoid IndexError
- The function automatically detects headers by analyzing the first 3 rows; if your data has headers beyond row 3, they may not be detected
- Empty columns are automatically removed, which may result in fewer columns than expected
- The function escapes pipe characters (|) in cell values to prevent breaking the markdown table structure
- Numeric data is converted to strings, so formatting control should be done before passing the DataFrame
- For large tables, consider the readability of the resulting markdown as very wide tables may not display well
- The header detection heuristic assumes headers are short text (< 50 characters) and not purely numeric; adjust data accordingly if this doesn't match your use case
Tags
Similar Components
AI-powered semantic similarity - components with related functionality:
-
function extract_sheet_context 63.3% similar
-
function detect_table_boundaries 62.8% similar
-
function simple_markdown_to_html 58.3% similar
-
function main_v16 57.9% similar
-
function markdown_to_html 57.6% similar