function show_critical_errors
Displays critical data quality errors in treatment records, focusing on date anomalies including 1900 dates, extreme future dates, and extreme past dates relative to flock lifecycles.
/tf/active/vicechatdev/data_quality_dashboard.py
193 - 223
moderate
Purpose
This function analyzes pre-filtered treatment records and reports three categories of critical date errors: (1) treatments whose AdministeredDate falls in the year 1900 (typically the placeholder date 1900-01-01), (2) treatments administered more than 365 days after the flock end date, and (3) treatments administered more than 1000 days before the flock start date. The 1900 check is applied only to before_start. For category 1 it prints a count per affected flock; for categories 2 and 3 it prints the five most extreme records, so data stewards or analysts can address the worst cases first.
Source Code
def show_critical_errors(before_start, after_end, severe_cases):
    """Show critical data errors that need immediate attention."""
    print("\nCRITICAL DATA ERRORS (URGENT)")
    print("-" * 40)

    # 1900 date errors
    errors_1900 = before_start[before_start['AdministeredDate'].dt.year == 1900]
    print(f"1. Treatments with 1900-01-01 dates: {len(errors_1900)}")
    if len(errors_1900) > 0:
        print(" Affected flocks:")
        for flock in errors_1900['FlockCD'].unique():
            count = len(errors_1900[errors_1900['FlockCD'] == flock])
            print(f" {flock}: {count} treatments")

    # Extreme future dates
    extreme_future = after_end[after_end['DaysAfterEnd'] > 365]
    print(f"\n2. Treatments >1 year after flock end: {len(extreme_future)}")
    if len(extreme_future) > 0:
        print(" Most extreme cases:")
        top_extreme = extreme_future.nlargest(5, 'DaysAfterEnd')
        for _, row in top_extreme.iterrows():
            print(f" {row['FlockCD']}: {row['DaysAfterEnd']:.0f} days after end")

    # Extreme past dates
    extreme_past = before_start[before_start['DaysBeforeStart'] > 1000]
    print(f"\n3. Treatments >1000 days before flock start: {len(extreme_past)}")
    if len(extreme_past) > 0:
        print(" Most extreme cases:")
        top_extreme = extreme_past.nlargest(5, 'DaysBeforeStart')
        for _, row in top_extreme.iterrows():
            print(f" {row['FlockCD']}: {row['DaysBeforeStart']:.0f} days before start")
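The per-flock loop in the source filters errors_1900 once per flock. As an alternative, and not the author's implementation, the same counts can be computed in a single pass with groupby. The helper name count_by_flock and the sample data are hypothetical.

```python
import pandas as pd


def count_by_flock(df: pd.DataFrame) -> dict:
    """Return {FlockCD: row count}, in order of first appearance."""
    # sort=False keeps groups in first-appearance order, which matches
    # the order produced by Series.unique() in the original loop.
    return df.groupby("FlockCD", sort=False).size().to_dict()


# Hypothetical sample standing in for the errors_1900 subset.
errors_1900 = pd.DataFrame({"FlockCD": ["F002", "F001", "F002"]})

flock_counts = count_by_flock(errors_1900)
for flock, count in flock_counts.items():
    print(f" {flock}: {count} treatments")
```

For small inputs the difference is negligible; the single pass only matters when many flocks are affected.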
Parameters
| Name | Type | Default | Kind |
|---|---|---|---|
| before_start | - | - | positional_or_keyword |
| after_end | - | - | positional_or_keyword |
| severe_cases | - | - | positional_or_keyword |
Parameter Details
before_start: A pandas DataFrame containing treatment records that occurred before the flock start date. Must include columns: 'AdministeredDate' (datetime), 'FlockCD' (flock identifier), and 'DaysBeforeStart' (numeric indicating days before flock start). Used to identify treatments with 1900 dates and extreme past dates.
after_end: A pandas DataFrame containing treatment records that occurred after the flock end date. Must include columns: 'FlockCD' (flock identifier) and 'DaysAfterEnd' (numeric indicating days after flock end). Used to identify treatments with extreme future dates.
severe_cases: Intended to be a pandas DataFrame containing severe treatment cases. The function body never reads this parameter, so its contents have no effect on the output. It has no default value, so callers must still supply an argument; it may be reserved for future functionality or kept for backward compatibility.
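The page does not show how before_start and after_end are built. A minimal sketch of one way to derive them, assuming a treatments table and a flocks table with hypothetical StartDate and EndDate columns (these names do not come from the source):

```python
import pandas as pd

# Hypothetical inputs: StartDate and EndDate are assumed column names.
treatments = pd.DataFrame({
    "FlockCD": ["F001", "F001", "F002"],
    "AdministeredDate": pd.to_datetime(
        ["1900-01-01", "2021-03-10", "2023-09-01"]
    ),
})
flocks = pd.DataFrame({
    "FlockCD": ["F001", "F002"],
    "StartDate": pd.to_datetime(["2021-03-01", "2021-06-01"]),
    "EndDate": pd.to_datetime(["2021-05-01", "2021-08-01"]),
})

# Attach each flock's lifecycle dates to its treatments.
merged = treatments.merge(flocks, on="FlockCD", how="inner")

# Positive values mean the treatment lies outside the flock lifecycle.
merged["DaysBeforeStart"] = (
    merged["StartDate"] - merged["AdministeredDate"]
).dt.days
merged["DaysAfterEnd"] = (
    merged["AdministeredDate"] - merged["EndDate"]
).dt.days

# Pre-filter into the two frames the function expects.
before_start = merged[merged["DaysBeforeStart"] > 0]
after_end = merged[merged["DaysAfterEnd"] > 0]
```

In this sample the 1900-dated treatment lands in before_start, the 2023 treatment lands in after_end, and the treatment inside the lifecycle lands in neither.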
Return Value
This function returns None. It produces side effects by printing formatted error reports directly to stdout, including counts of critical errors, affected flock identifiers, and details of the most extreme cases.
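Because the report is only printed, callers who need it as a string (for logging or testing) have to capture stdout. A minimal sketch using the standard library; capture_output and demo_report are hypothetical names, and demo_report stands in for show_critical_errors so the snippet runs on its own:

```python
import io
from contextlib import redirect_stdout


def capture_output(func, *args, **kwargs) -> str:
    """Call func and return everything it printed to stdout."""
    buffer = io.StringIO()
    with redirect_stdout(buffer):
        func(*args, **kwargs)
    return buffer.getvalue()


def demo_report():
    # Stand-in for show_critical_errors; prints one header line.
    print("CRITICAL DATA ERRORS (URGENT)")


report_text = capture_output(demo_report)
```

With the real function the call would look like capture_output(show_critical_errors, before_start, after_end, severe_cases), and the returned text could then be written to a log file.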
Dependencies
pandas
Required Imports
import pandas as pd
Usage Example
import pandas as pd

# Assumes show_critical_errors is already defined or imported in scope.

# Create sample data
before_start_df = pd.DataFrame({
    'FlockCD': ['F001', 'F001', 'F002'],
    'AdministeredDate': pd.to_datetime(['1900-01-01', '2020-01-01', '2019-06-01']),
    'DaysBeforeStart': [8000, 1500, 1200]
})
after_end_df = pd.DataFrame({
    'FlockCD': ['F003', 'F004'],
    'DaysAfterEnd': [400, 500]
})
severe_cases_df = pd.DataFrame()  # unused by the function, but required

# Call the function
show_critical_errors(before_start_df, after_end_df, severe_cases_df)

# Output will be printed to console showing:
# - 1 treatment dated in 1900, in flock F001
# - 2 treatments more than 1 year after flock end (F004: 500 days, F003: 400 days)
# - 3 treatments more than 1000 days before flock start (F001: 8000, F001: 1500, F002: 1200)
Best Practices
- Ensure 'AdministeredDate' column is converted to pandas datetime type before calling this function
- Pre-filter DataFrames to contain only relevant records (before_start should only have records before flock start, after_end should only have records after flock end)
- Calculate 'DaysBeforeStart' and 'DaysAfterEnd' columns before passing DataFrames to this function
- Consider redirecting output to a log file for production environments instead of relying on console output
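- The severe_cases parameter is currently unused, but it is a required positional argument with no default; pass a placeholder such as an empty DataFrame or None rather than omitting it
- Empty results are handled (the per-category details are skipped when a count is zero), but the input DataFrames must still contain the required columns; a column-less pd.DataFrame() passed as before_start or after_end raises a KeyError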
- Review the threshold values (1900 for year, 365 days for future, 1000 days for past) to ensure they match your business rules
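Several of the points above concern input shape. A minimal sketch of a pre-flight check that could run before calling the function; validate_inputs and REQUIRED_COLUMNS are hypothetical names, and the column lists are taken from the Parameter Details section:

```python
import pandas as pd
from pandas.api.types import is_datetime64_any_dtype

# Columns the function reads from each frame (per Parameter Details).
REQUIRED_COLUMNS = {
    "before_start": ["AdministeredDate", "FlockCD", "DaysBeforeStart"],
    "after_end": ["FlockCD", "DaysAfterEnd"],
}


def validate_inputs(before_start: pd.DataFrame, after_end: pd.DataFrame) -> None:
    """Raise early if a frame lacks a required column or dtype."""
    frames = {"before_start": before_start, "after_end": after_end}
    for name, df in frames.items():
        missing = [c for c in REQUIRED_COLUMNS[name] if c not in df.columns]
        if missing:
            raise ValueError(f"{name} is missing columns: {missing}")
    # The function uses the .dt accessor, which needs a datetime column.
    if not is_datetime64_any_dtype(before_start["AdministeredDate"]):
        raise TypeError("before_start['AdministeredDate'] must be datetime")


# Hypothetical valid inputs.
before_ok = pd.DataFrame({
    "FlockCD": ["F001"],
    "AdministeredDate": pd.to_datetime(["1900-01-01"]),
    "DaysBeforeStart": [8000],
})
after_ok = pd.DataFrame({"FlockCD": ["F003"], "DaysAfterEnd": [400]})

validate_inputs(before_ok, after_ok)  # passes silently
```

Failing fast with a named column is easier to diagnose than the KeyError or AttributeError that would otherwise surface from inside the report.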
Similar Components
AI-powered semantic similarity - components with related functionality:
- function analyze_temporal_trends (75.7% similar)
- function quick_clean (74.8% similar)
- function generate_action_report (74.6% similar)
- function show_problematic_flocks (74.3% similar)
- function create_data_quality_dashboard (73.1% similar)