function show_problematic_flocks
Analyzes and displays problematic flocks from a dataset by identifying those with systematic timing issues in their treatment records, categorizing them by severity and volume.
File: /tf/active/vicechatdev/data_quality_dashboard.py
Lines: 267 - 295
simple
Purpose
This function provides a diagnostic report for data quality analysis in livestock/poultry management systems. It identifies flocks with timing data entry errors by examining treatment records, highlighting both systematic issues (100% error rate) and high-volume flocks with significant but partial timing problems. This helps data managers prioritize data cleaning efforts and identify systematic data entry problems.
Source Code
def show_problematic_flocks(flocks_issues):
    """Show the most problematic flocks."""
    print("\nMOST PROBLEMATIC FLOCKS")
    print("-" * 40)

    # Flocks with 100% timing issues
    perfect_issues = flocks_issues[flocks_issues['TimingIssueRate'] == 1.0]
    print(f"Flocks with 100% timing issues: {len(perfect_issues)}")
    print("(These likely have systematic data entry errors)")

    if len(perfect_issues) > 0:
        print("\nTop 10 flocks with 100% issues (by treatment count):")
        top_perfect = perfect_issues.nlargest(10, 'TotalTreatments')
        for _, flock in top_perfect.iterrows():
            print(f" {flock['FlockCD']}: {flock['TotalTreatments']} treatments, {flock['Type']} type")

    # Flocks with partial issues but high volume
    partial_issues = flocks_issues[
        (flocks_issues['TimingIssueRate'] < 1.0) &
        (flocks_issues['TimingIssueRate'] > 0.1) &
        (flocks_issues['TotalTreatments'] >= 10)
    ]

    if len(partial_issues) > 0:
        print(f"\nHigh-volume flocks with significant timing issues (10+ treatments, >10% issues):")
        top_partial = partial_issues.nlargest(10, 'TotalTreatments')
        for _, flock in top_partial.iterrows():
            rate = flock['TimingIssueRate'] * 100
            print(f" {flock['FlockCD']}: {rate:.1f}% issues ({flock['TimingIssueCount']}/{flock['TotalTreatments']} treatments)")
Parameters
| Name | Type | Default | Kind |
|---|---|---|---|
| flocks_issues | - | - | positional_or_keyword |
Parameter Details
flocks_issues: A pandas DataFrame of flock-level aggregated data, one row per flock. Expected columns: 'FlockCD' (string identifier for the flock), 'Type' (string, type/category of flock), 'TotalTreatments' (integer, total number of treatments for the flock), 'TimingIssueCount' (integer, number of treatments with timing issues), and 'TimingIssueRate' (float from 0.0 to 1.0, the fraction of treatments with timing issues, not a percentage). The function does not compute these metrics itself; the DataFrame must already contain them.
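The source does not show how the aggregated DataFrame is produced. As a minimal sketch, assuming treatment-level records carry a boolean flag per treatment, the expected columns could be derived with a groupby. The `treatments` frame and its `HasTimingIssue` column are hypothetical names used only for illustration.

```python
import pandas as pd

# Hypothetical treatment-level records. 'HasTimingIssue' is an assumed
# boolean flag and does not appear in the documented source.
treatments = pd.DataFrame({
    "FlockCD": ["F001", "F001", "F002", "F002", "F002"],
    "Type": ["Broiler", "Broiler", "Layer", "Layer", "Layer"],
    "HasTimingIssue": [True, True, False, True, False],
})

# One row per flock, with the columns show_problematic_flocks expects.
flocks_issues = (
    treatments
    .groupby("FlockCD", as_index=False)
    .agg(
        Type=("Type", "first"),                      # flock type (assumed constant per flock)
        TotalTreatments=("HasTimingIssue", "count"),  # number of treatment rows
        TimingIssueCount=("HasTimingIssue", "sum"),   # True values count as 1
    )
)

# Rate as a fraction in [0.0, 1.0], not a percentage.
flocks_issues["TimingIssueRate"] = (
    flocks_issues["TimingIssueCount"] / flocks_issues["TotalTreatments"]
)
```

Taking the first `Type` per flock assumes a flock never changes type; if that does not hold in your data, a different aggregation would be needed.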
Return Value
This function returns None. It produces console output displaying: (1) count and details of flocks with 100% timing issues, showing top 10 by treatment count, (2) high-volume flocks (10+ treatments) with >10% but <100% timing issues, showing top 10 by treatment count with their issue rates and counts.
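Because the report is only printed, callers who want the text as a string need to capture stdout. One possible approach, sketched here with the standard library, wraps any printing function in `contextlib.redirect_stdout`. The helper name `capture_output` and the stand-in `_demo_report` are hypothetical; in practice you would pass `show_problematic_flocks` and its DataFrame instead.

```python
import contextlib
import io


def capture_output(func, *args, **kwargs):
    """Run func and return everything it printed to stdout as a string."""
    buffer = io.StringIO()
    with contextlib.redirect_stdout(buffer):
        func(*args, **kwargs)
    return buffer.getvalue()


def _demo_report(count):
    # Stand-in for show_problematic_flocks so this sketch runs on its own.
    print("MOST PROBLEMATIC FLOCKS")
    print(f"Flocks with 100% timing issues: {count}")


report_text = capture_output(_demo_report, 2)
```

The captured string can then be written to a file or attached to a log entry without changing the reporting function.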
Dependencies
pandas
Required Imports
import pandas as pd
Usage Example
import pandas as pd

# Create sample flocks_issues DataFrame
flocks_issues = pd.DataFrame({
    'FlockCD': ['F001', 'F002', 'F003', 'F004', 'F005'],
    'TotalTreatments': [50, 25, 15, 100, 8],
    'TimingIssueCount': [50, 25, 3, 20, 1],
    'TimingIssueRate': [1.0, 1.0, 0.2, 0.2, 0.125],
    'Type': ['Broiler', 'Layer', 'Broiler', 'Layer', 'Broiler']
})

# Display problematic flocks report
show_problematic_flocks(flocks_issues)
Best Practices
- Ensure the input DataFrame contains all required columns (TimingIssueRate, FlockCD, TotalTreatments, Type, TimingIssueCount) before calling this function
- Pre-calculate TimingIssueRate as a float between 0.0 and 1.0 (not as a percentage)
- This function is designed for console output; redirect stdout if you need to capture the output programmatically
- The function uses hardcoded thresholds (10+ treatments, >10% issues) which may need adjustment based on your dataset characteristics
- Consider the output as a diagnostic tool for identifying data quality issues rather than production reporting
- The function assumes FlockCD is a meaningful identifier; ensure it's populated and unique in your dataset
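Two of the practices above, checking for required columns and adjusting the hardcoded thresholds, can be sketched as small helpers. This is an illustrative sketch, not part of the original module: `missing_columns`, `select_partial_issues`, and the `min_treatments` and `min_rate` parameters are hypothetical names, and the filter mirrors the partial-issues condition in the source with the thresholds exposed as arguments.

```python
import pandas as pd

REQUIRED_COLUMNS = {
    "FlockCD", "TotalTreatments", "TimingIssueCount", "TimingIssueRate", "Type",
}


def missing_columns(df):
    """Return the required column names that df lacks, sorted."""
    return sorted(REQUIRED_COLUMNS - set(df.columns))


def select_partial_issues(df, min_treatments=10, min_rate=0.1):
    """Same filter as the source, with thresholds as parameters (assumed names)."""
    mask = (
        (df["TimingIssueRate"] < 1.0)
        & (df["TimingIssueRate"] > min_rate)
        & (df["TotalTreatments"] >= min_treatments)
    )
    return df[mask]


flocks_issues = pd.DataFrame({
    "FlockCD": ["F001", "F002", "F003", "F004", "F005"],
    "TotalTreatments": [50, 25, 15, 100, 8],
    "TimingIssueCount": [50, 25, 3, 20, 1],
    "TimingIssueRate": [1.0, 1.0, 0.2, 0.2, 0.125],
    "Type": ["Broiler", "Layer", "Broiler", "Layer", "Broiler"],
})

problems = missing_columns(flocks_issues)  # empty list when all columns exist
default_selection = select_partial_issues(flocks_issues)
relaxed_selection = select_partial_issues(flocks_issues, min_treatments=5)
```

With the default thresholds, flock F005 is excluded because it has only 8 treatments; lowering `min_treatments` to 5 brings it into the selection.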
Similar Components
AI-powered semantic similarity - components with related functionality:
- function analyze_flock_type_patterns (76.1% similar)
- function quick_clean (75.0% similar)
- function show_critical_errors (74.3% similar)
- function analyze_temporal_trends (73.2% similar)
- function create_data_quality_dashboard (72.9% similar)