
function show_critical_errors

Maturity: 45

Prints a report of critical data quality errors in treatment records, focusing on three date anomalies: dates in the year 1900, treatments more than a year after a flock's end date, and treatments more than 1000 days before a flock's start date.

File:
/tf/active/vicechatdev/data_quality_dashboard.py
Lines:
193 - 223
Complexity:
moderate

Purpose

This function performs data quality analysis on treatment records by identifying and reporting three categories of critical date errors: (1) treatments whose AdministeredDate falls anywhere in the year 1900 (labelled "1900-01-01 dates" in the output), searched only within before_start; (2) treatments administered more than 365 days after the flock end date; and (3) treatments administered more than 1000 days before the flock start date. For the 1900 errors it prints a per-flock count; for the other two categories it prints up to five of the most extreme cases, so that data stewards or analysts can see which records need immediate attention.
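The three checks can be restated as boolean masks. The sketch below is a hypothetical, non-printing variant (summarize_critical_errors is not part of data_quality_dashboard.py) that returns the counts as a dict, which is easier to assert on in tests. It reuses the thresholds from the source. Note that the categories can overlap: a 1900-dated record will usually also be more than 1000 days before its flock start.

```python
import pandas as pd


def summarize_critical_errors(before_start: pd.DataFrame, after_end: pd.DataFrame) -> dict:
    """Hypothetical helper: return the three error counts instead of printing them."""
    # Check 1: AdministeredDate anywhere in calendar year 1900 (not only 1900-01-01)
    is_1900 = before_start["AdministeredDate"].dt.year == 1900
    # Check 2: more than 365 days after the flock end date
    too_late = after_end["DaysAfterEnd"] > 365
    # Check 3: more than 1000 days before the flock start date
    too_early = before_start["DaysBeforeStart"] > 1000
    return {
        "year_1900": int(is_1900.sum()),
        "over_1_year_after_end": int(too_late.sum()),
        "over_1000_days_before_start": int(too_early.sum()),
    }


before = pd.DataFrame({
    "FlockCD": ["F001", "F001", "F002"],
    "AdministeredDate": pd.to_datetime(["1900-01-01", "2020-01-01", "2019-06-01"]),
    "DaysBeforeStart": [8000, 1500, 900],
})
after = pd.DataFrame({"FlockCD": ["F003", "F004"], "DaysAfterEnd": [400, 200]})

result = summarize_critical_errors(before, after)
print(result)
```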

Source Code

def show_critical_errors(before_start, after_end, severe_cases):
    """Show critical data errors that need immediate attention."""
    print("\nCRITICAL DATA ERRORS (URGENT)")
    print("-" * 40)
    
    # 1900 date errors
    errors_1900 = before_start[before_start['AdministeredDate'].dt.year == 1900]
    print(f"1. Treatments with 1900-01-01 dates: {len(errors_1900)}")
    if len(errors_1900) > 0:
        print("   Affected flocks:")
        for flock in errors_1900['FlockCD'].unique():
            count = len(errors_1900[errors_1900['FlockCD'] == flock])
            print(f"     {flock}: {count} treatments")
    
    # Extreme future dates
    extreme_future = after_end[after_end['DaysAfterEnd'] > 365]
    print(f"\n2. Treatments >1 year after flock end: {len(extreme_future)}")
    if len(extreme_future) > 0:
        print("   Most extreme cases:")
        top_extreme = extreme_future.nlargest(5, 'DaysAfterEnd')
        for _, row in top_extreme.iterrows():
            print(f"     {row['FlockCD']}: {row['DaysAfterEnd']:.0f} days after end")
    
    # Extreme past dates
    extreme_past = before_start[before_start['DaysBeforeStart'] > 1000]
    print(f"\n3. Treatments >1000 days before flock start: {len(extreme_past)}")
    if len(extreme_past) > 0:
        print("   Most extreme cases:")
        top_extreme = extreme_past.nlargest(5, 'DaysBeforeStart')
        for _, row in top_extreme.iterrows():
            print(f"     {row['FlockCD']}: {row['DaysBeforeStart']:.0f} days before start")
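The per-flock loop above re-filters errors_1900 once for every flock. A minimal alternative sketch, not taken from the source, counts every flock in a single pass with Series.value_counts. One behavioral difference: the original prints flocks in order of first appearance (the order returned by unique()), whereas value_counts orders them by descending count. The small frame here is only a stand-in for the filtered errors_1900.

```python
import pandas as pd

# Stand-in for the errors_1900 frame built inside show_critical_errors
errors_1900 = pd.DataFrame({
    "FlockCD": ["F001", "F002", "F001", "F001"],
    "AdministeredDate": pd.to_datetime(["1900-01-01"] * 4),
})

# One pass over the column instead of one boolean filter per flock;
# value_counts sorts by count, highest first, by default.
per_flock = errors_1900["FlockCD"].value_counts()
for flock, count in per_flock.items():
    print(f"     {flock}: {count} treatments")
```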

Parameters

Name          Type  Default  Kind
before_start  -     -        positional_or_keyword
after_end     -     -        positional_or_keyword
severe_cases  -     -        positional_or_keyword

Parameter Details

before_start: A pandas DataFrame containing treatment records that occurred before the flock start date. Must include columns: 'AdministeredDate' (datetime), 'FlockCD' (flock identifier), and 'DaysBeforeStart' (numeric indicating days before flock start). Used to identify treatments with 1900 dates and extreme past dates.

after_end: A pandas DataFrame containing treatment records that occurred after the flock end date. Must include columns: 'FlockCD' (flock identifier) and 'DaysAfterEnd' (numeric indicating days after flock end). Used to identify treatments with extreme future dates.

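severe_cases: Presumably a pandas DataFrame of severe treatment cases, but the parameter is never referenced in the function body, so any value (including None) is accepted. Because it has no default, callers must still pass it. Its intended purpose is not documented; it may be reserved for future functionality or kept for compatibility with existing call sites.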
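The function expects before_start and after_end to arrive with DaysBeforeStart and DaysAfterEnd already computed, but the source does not show how they are built. The following is one possible sketch, assuming a treatments table keyed by FlockCD and a flocks table with hypothetical StartDate and EndDate columns. Treating any positive day difference as outside the lifecycle is also an assumption.

```python
import pandas as pd

# Hypothetical source tables; StartDate and EndDate are assumed column names.
treatments = pd.DataFrame({
    "FlockCD": ["F001", "F001", "F002"],
    "AdministeredDate": pd.to_datetime(["1900-01-01", "2021-03-10", "2023-06-01"]),
})
flocks = pd.DataFrame({
    "FlockCD": ["F001", "F002"],
    "StartDate": pd.to_datetime(["2021-03-01", "2021-01-01"]),
    "EndDate": pd.to_datetime(["2021-06-01", "2021-05-01"]),
})

# Attach each treatment to its flock's lifecycle dates.
merged = treatments.merge(flocks, on="FlockCD", how="inner")

# Positive values mean the treatment falls outside the flock lifecycle.
merged["DaysBeforeStart"] = (merged["StartDate"] - merged["AdministeredDate"]).dt.days
merged["DaysAfterEnd"] = (merged["AdministeredDate"] - merged["EndDate"]).dt.days

before_start = merged[merged["DaysBeforeStart"] > 0]
after_end = merged[merged["DaysAfterEnd"] > 0]
print(before_start[["FlockCD", "DaysBeforeStart"]])
print(after_end[["FlockCD", "DaysAfterEnd"]])
```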

Return Value

This function returns None. It produces side effects by printing formatted error reports directly to stdout, including counts of critical errors, affected flock identifiers, and details of the most extreme cases.
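Because the report goes straight to stdout, sending it to a log file (as the Best Practices suggest) requires capturing the output first. A small sketch using the standard library's contextlib.redirect_stdout follows. capture_report and demo_report are hypothetical names; demo_report stands in for show_critical_errors so the block runs on its own.

```python
import contextlib
import io


def capture_report(report_fn, *args, **kwargs) -> str:
    """Run a print-based report function and return everything it printed."""
    buffer = io.StringIO()
    # Temporarily route print() output into the in-memory buffer.
    with contextlib.redirect_stdout(buffer):
        report_fn(*args, **kwargs)
    return buffer.getvalue()


# Hypothetical stand-in for show_critical_errors.
def demo_report(n):
    print("CRITICAL DATA ERRORS (URGENT)")
    print(f"1. Treatments with 1900-01-01 dates: {n}")


text = capture_report(demo_report, 2)
```

With the real function the call would be capture_report(show_critical_errors, before_start_df, after_end_df, None), and the returned string could then be passed to the logging module or written to a file.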

Dependencies

  • pandas

Required Imports

import pandas as pd

Usage Example

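import pandas as pd

# Assumes data_quality_dashboard.py is importable from the current path
from data_quality_dashboard import show_critical_errors

# Create sample data
before_start_df = pd.DataFrame({
    'FlockCD': ['F001', 'F001', 'F002'],
    'AdministeredDate': pd.to_datetime(['1900-01-01', '2020-01-01', '2019-06-01']),
    'DaysBeforeStart': [8000, 1500, 1200]
})

after_end_df = pd.DataFrame({
    'FlockCD': ['F003', 'F004'],
    'DaysAfterEnd': [400, 500]
})

# Unused by the function, but still a required argument
severe_cases_df = pd.DataFrame()

# Call the function
show_critical_errors(before_start_df, after_end_df, severe_cases_df)

# Expected console output:
#
# CRITICAL DATA ERRORS (URGENT)
# ----------------------------------------
# 1. Treatments with 1900-01-01 dates: 1
#    Affected flocks:
#      F001: 1 treatments
#
# 2. Treatments >1 year after flock end: 2
#    Most extreme cases:
#      F004: 500 days after end
#      F003: 400 days after end
#
# 3. Treatments >1000 days before flock start: 3
#    Most extreme cases:
#      F001: 8000 days before start
#      F001: 1500 days before start
#      F002: 1200 days before start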

Best Practices

  • Ensure 'AdministeredDate' column is converted to pandas datetime type before calling this function
  • Pre-filter DataFrames to contain only relevant records (before_start should only have records before flock start, after_end should only have records after flock end)
  • Calculate 'DaysBeforeStart' and 'DaysAfterEnd' columns before passing DataFrames to this function
  • Consider redirecting output to a log file for production environments instead of relying on console output
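  • The severe_cases parameter is currently unused, but it has no default, so callers must still pass a value (for example None or an empty DataFrame); confirm whether it is needed before removing it from the signature
  • The function tolerates empty results but not missing columns: an input without the required columns (for example a bare pd.DataFrame()) raises a KeyError, and a non-datetime 'AdministeredDate' column makes the .dt accessor fail
  • Review the hardcoded thresholds (year 1900, more than 365 days after end, more than 1000 days before start) to ensure they match your business rules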
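Several of the practices above amount to checking the inputs before the call. A hypothetical pre-flight check might look like the following sketch (validate_inputs is not part of the source). It uses pandas.api.types.is_datetime64_any_dtype to catch an AdministeredDate column that was never converted to datetime.

```python
import pandas as pd
from pandas.api.types import is_datetime64_any_dtype

# Columns that show_critical_errors reads from each frame
REQUIRED_BEFORE = {"FlockCD", "AdministeredDate", "DaysBeforeStart"}
REQUIRED_AFTER = {"FlockCD", "DaysAfterEnd"}


def validate_inputs(before_start: pd.DataFrame, after_end: pd.DataFrame) -> list:
    """Hypothetical helper: list problems that would make the report fail."""
    problems = []
    missing_before = REQUIRED_BEFORE - set(before_start.columns)
    if missing_before:
        problems.append(f"before_start missing columns: {sorted(missing_before)}")
    missing_after = REQUIRED_AFTER - set(after_end.columns)
    if missing_after:
        problems.append(f"after_end missing columns: {sorted(missing_after)}")
    # The .dt accessor only works on datetime-like columns, so check the dtype up front.
    if "AdministeredDate" in before_start.columns and not is_datetime64_any_dtype(
        before_start["AdministeredDate"]
    ):
        problems.append("before_start['AdministeredDate'] is not a datetime column")
    return problems


print(validate_inputs(pd.DataFrame(), pd.DataFrame()))
```

Calling this before show_critical_errors and stopping on a non-empty list turns a mid-report KeyError or AttributeError into one upfront message.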

Similar Components

AI-powered semantic similarity - components with related functionality:

  • function analyze_temporal_trends 75.7% similar

    Analyzes and prints temporal trends in timing issues for treatments that occur before flock start dates or after flock end dates, breaking down occurrences by year and month.

    From: /tf/active/vicechatdev/data_quality_dashboard.py
  • function quick_clean 74.8% similar

    Cleans flock data by identifying and removing flocks that have treatment records with timing inconsistencies (treatments administered outside the flock's start/end date range).

    From: /tf/active/vicechatdev/quick_cleaner.py
  • function generate_action_report 74.6% similar

    Generates a comprehensive corrective action report for data quality issues in treatment records, categorizing actions by urgency and providing impact assessment.

    From: /tf/active/vicechatdev/data_quality_dashboard.py
  • function show_problematic_flocks 74.3% similar

    Analyzes and displays problematic flocks from a dataset by identifying those with systematic timing issues in their treatment records, categorizing them by severity and volume.

    From: /tf/active/vicechatdev/data_quality_dashboard.py
  • function create_data_quality_dashboard 73.1% similar

    Creates an interactive command-line dashboard for analyzing data quality issues in treatment timing data, specifically focusing on treatments administered outside of flock lifecycle dates.

    From: /tf/active/vicechatdev/data_quality_dashboard.py