function show_problematic_flocks

Maturity: 42

Analyzes and displays problematic flocks from a dataset by identifying those with systematic timing issues in their treatment records, categorizing them by severity and volume.

File: /tf/active/vicechatdev/data_quality_dashboard.py
Lines: 267-295
Complexity: simple

Purpose

This function provides a diagnostic report for data quality analysis in livestock/poultry management systems. It identifies flocks with timing data entry errors by examining treatment records, highlighting both systematic issues (100% error rate) and high-volume flocks with significant but partial timing problems. This helps data managers prioritize data cleaning efforts and identify systematic data entry problems.

Source Code

def show_problematic_flocks(flocks_issues):
    """Show the most problematic flocks."""
    print("\nMOST PROBLEMATIC FLOCKS")
    print("-" * 40)
    
    # Flocks with 100% timing issues
    perfect_issues = flocks_issues[flocks_issues['TimingIssueRate'] == 1.0]
    print(f"Flocks with 100% timing issues: {len(perfect_issues)}")
    print("(These likely have systematic data entry errors)")
    
    if len(perfect_issues) > 0:
        print("\nTop 10 flocks with 100% issues (by treatment count):")
        top_perfect = perfect_issues.nlargest(10, 'TotalTreatments')
        for _, flock in top_perfect.iterrows():
            print(f"  {flock['FlockCD']}: {flock['TotalTreatments']} treatments, {flock['Type']} type")
    
    # Flocks with partial issues but high volume
    partial_issues = flocks_issues[
        (flocks_issues['TimingIssueRate'] < 1.0) & 
        (flocks_issues['TimingIssueRate'] > 0.1) &
        (flocks_issues['TotalTreatments'] >= 10)
    ]
    
    if len(partial_issues) > 0:
        print("\nHigh-volume flocks with significant timing issues (10+ treatments, >10% issues):")
        top_partial = partial_issues.nlargest(10, 'TotalTreatments')
        for _, flock in top_partial.iterrows():
            rate = flock['TimingIssueRate'] * 100
            print(f"  {flock['FlockCD']}: {rate:.1f}% issues ({flock['TimingIssueCount']}/{flock['TotalTreatments']} treatments)")

Parameters

Name           Type  Default  Kind
flocks_issues  -     -        positional_or_keyword

Parameter Details

flocks_issues: A pandas DataFrame of flock-level aggregated data, one row per flock, with the following expected columns: 'TimingIssueRate' (float from 0.0 to 1.0, the fraction of the flock's treatments with timing issues), 'FlockCD' (string identifier for the flock), 'TotalTreatments' (integer, total number of treatments for the flock), 'Type' (string, type or category of flock), and 'TimingIssueCount' (integer, number of treatments with timing issues). The function does not compute these metrics; the DataFrame must already contain them.
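The source does not show how flocks_issues is built. As a minimal sketch, assuming treatment-level records with a hypothetical boolean column 'HasTimingIssue' (the 'treatments' DataFrame and that column name are illustrative, not taken from the original module), the expected columns could be derived with a groupby:

```python
import pandas as pd

# Hypothetical treatment-level records; 'HasTimingIssue' is an assumed flag,
# not a column confirmed by the original module.
treatments = pd.DataFrame({
    "FlockCD": ["F001", "F001", "F002", "F002", "F002", "F002"],
    "Type": ["Broiler", "Broiler", "Layer", "Layer", "Layer", "Layer"],
    "HasTimingIssue": [True, True, False, True, False, False],
})

# One row per flock: flock type, number of treatments, number flagged.
flocks_issues = (
    treatments.groupby("FlockCD")
    .agg(
        Type=("Type", "first"),
        TotalTreatments=("HasTimingIssue", "count"),
        TimingIssueCount=("HasTimingIssue", "sum"),
    )
    .reset_index()
)

# Fraction (0.0 to 1.0) of each flock's treatments that were flagged.
flocks_issues["TimingIssueRate"] = (
    flocks_issues["TimingIssueCount"] / flocks_issues["TotalTreatments"]
)
```

Computing the rate as count divided by total yields exactly 1.0 when every treatment is flagged, which matters because the function tests for 100% flocks with an equality comparison.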

Return Value

This function returns None. It prints a report to the console containing: (1) the number of flocks with 100% timing issues and, if any exist, the top 10 of them by treatment count with their flock type; (2) if any exist, the top 10 by treatment count of the high-volume flocks (10+ treatments) whose timing issue rate is above 10% but below 100%, with their issue rates and counts.
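Because the function only prints, callers who need the selections as data have to repeat its filters. The sketch below mirrors the two filters from the source code but returns the subsets and exposes the thresholds as arguments. The helper name select_problematic_flocks and its parameters are hypothetical and are not part of the original module.

```python
import pandas as pd


def select_problematic_flocks(flocks_issues, min_treatments=10, min_rate=0.1, top_n=10):
    """Hypothetical helper: return the two subsets the report prints."""
    rate = flocks_issues["TimingIssueRate"]
    # Flocks where every treatment has a timing issue.
    systematic = flocks_issues[rate == 1.0]
    # High-volume flocks with a partial but significant issue rate.
    partial = flocks_issues[
        (rate < 1.0)
        & (rate > min_rate)
        & (flocks_issues["TotalTreatments"] >= min_treatments)
    ]
    return (
        systematic.nlargest(top_n, "TotalTreatments"),
        partial.nlargest(top_n, "TotalTreatments"),
    )


# Sample data matching the usage example on this page.
sample = pd.DataFrame({
    "FlockCD": ["F001", "F002", "F003", "F004", "F005"],
    "TotalTreatments": [50, 25, 15, 100, 8],
    "TimingIssueCount": [50, 25, 3, 20, 1],
    "TimingIssueRate": [1.0, 1.0, 0.2, 0.2, 0.125],
    "Type": ["Broiler", "Layer", "Broiler", "Layer", "Broiler"],
})

systematic, partial = select_problematic_flocks(sample)
```

With the sample data, F005 falls outside both subsets: its rate exceeds 10%, but it has fewer than 10 treatments.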

Dependencies

  • pandas

Required Imports

import pandas as pd

Usage Example

import pandas as pd

# Create sample flocks_issues DataFrame
flocks_issues = pd.DataFrame({
    'FlockCD': ['F001', 'F002', 'F003', 'F004', 'F005'],
    'TotalTreatments': [50, 25, 15, 100, 8],
    'TimingIssueCount': [50, 25, 3, 20, 1],
    'TimingIssueRate': [1.0, 1.0, 0.2, 0.2, 0.125],
    'Type': ['Broiler', 'Layer', 'Broiler', 'Layer', 'Broiler']
})

# Display problematic flocks report
show_problematic_flocks(flocks_issues)

Best Practices

  • Ensure the input DataFrame contains all required columns (TimingIssueRate, FlockCD, TotalTreatments, Type, TimingIssueCount) before calling this function
  • Pre-calculate TimingIssueRate as a float between 0.0 and 1.0 (not as a percentage)
  • This function is designed for console output; redirect stdout if you need to capture the output programmatically
  • The function uses hardcoded thresholds (10+ treatments, >10% issues) which may need adjustment based on your dataset characteristics
  • Consider the output as a diagnostic tool for identifying data quality issues rather than production reporting
  • The function assumes FlockCD is a meaningful identifier; ensure it's populated and unique in your dataset
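As a sketch of the stdout redirection mentioned above, the standard library's contextlib.redirect_stdout can collect the printed report into a string. The helper capture_report and the stand-in demo_report are hypothetical names used only so that the example runs on its own; in practice show_problematic_flocks and its DataFrame would be passed instead.

```python
import io
from contextlib import redirect_stdout


def capture_report(report_func, *args, **kwargs):
    """Hypothetical helper: run a printing function and return what it printed."""
    buffer = io.StringIO()
    with redirect_stdout(buffer):
        report_func(*args, **kwargs)
    return buffer.getvalue()


# Stand-in for show_problematic_flocks so this sketch is self-contained.
def demo_report(count):
    print("\nMOST PROBLEMATIC FLOCKS")
    print("-" * 40)
    print(f"Flocks with 100% timing issues: {count}")


text = capture_report(demo_report, 3)
```

The same call shape would apply to the real function, for example capture_report(show_problematic_flocks, flocks_issues), after which the text can be written to a log or file.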

Similar Components

AI-powered semantic similarity - components with related functionality:

  • function analyze_flock_type_patterns 76.1% similar

    Analyzes and prints timing pattern statistics for flock data by categorizing issues that occur before start time and after end time, grouped by flock type.

    From: /tf/active/vicechatdev/data_quality_dashboard.py
  • function quick_clean 75.0% similar

    Cleans flock data by identifying and removing flocks that have treatment records with timing inconsistencies (treatments administered outside the flock's start/end date range).

    From: /tf/active/vicechatdev/quick_cleaner.py
  • function show_critical_errors 74.3% similar

    Displays critical data quality errors in treatment records, focusing on date anomalies including 1900 dates, extreme future dates, and extreme past dates relative to flock lifecycles.

    From: /tf/active/vicechatdev/data_quality_dashboard.py
  • function analyze_temporal_trends 73.2% similar

    Analyzes and prints temporal trends in timing issues for treatments that occur before flock start dates or after flock end dates, breaking down occurrences by year and month.

    From: /tf/active/vicechatdev/data_quality_dashboard.py
  • function create_data_quality_dashboard 72.9% similar

    Creates an interactive command-line dashboard for analyzing data quality issues in treatment timing data, specifically focusing on treatments administered outside of flock lifecycle dates.

    From: /tf/active/vicechatdev/data_quality_dashboard.py