šŸ” Code Extractor

function main_v3

Maturity: 50

Command-line interface function that orchestrates pattern-based extraction of poultry flock data, including data loading, pattern classification, geocoding, and export functionality.

File: /tf/active/vicechatdev/pattern_based_extraction.py
Lines: 505 - 622
Complexity: complex

Purpose

This is the main entry point for a pattern-based poultry data extraction tool. It parses command-line arguments, loads flock data filtered by start date, identifies mixed farms, classifies each farm's In-Ovo usage pattern, extracts the flocks matching the requested pattern (sequential, concurrent, mixed, or all), enriches them, and exports the results to CSV. The geocoding and map options (--skip-geocoding, --cache-only, --create-map, --map-output, --use-clustering) are parsed and echoed in the startup banner, but the function body shown under Source Code makes no geocoding or map-generation call of its own; only --geocoded-data is passed on, as the geocoded_file argument of PatternBasedExtractor.

Source Code

def main():
    """Main function for pattern-based extraction."""
    parser = argparse.ArgumentParser(description='Pattern-Based Poultry Data Extraction')
    parser.add_argument('--pattern', type=str, required=True, 
                       choices=['sequential', 'concurrent', 'mixed', 'all'],
                       help='In-Ovo usage pattern to extract')
    parser.add_argument('--output', type=str, default=None,
                       help='Output CSV filename (default: auto-generated)')
    parser.add_argument('--sample-size', type=int, default=None,
                       help='Number of flocks to sample (default: extract all)')
    parser.add_argument('--geocoded-data', type=str, default=None,
                       help='Path to geocoded data file for coordinate enrichment')
    parser.add_argument('--data-dir', type=str, default='/tf/active/pehestat_data',
                       help='Directory containing Pehestat data files')
    parser.add_argument('--skip-geocoding', action='store_true',
                       help='Skip geocoding and map generation')
    parser.add_argument('--cache-only', action='store_true',
                       help='Use geocoding cache only (no API calls)')
    parser.add_argument('--create-map', action='store_true',
                       help='Create interactive map (requires geocoding)')
    parser.add_argument('--map-output', type=str, default=None,
                       help='Output map filename (default: auto-generated)')
    parser.add_argument('--use-clustering', action='store_true',
                       help='Enable marker clustering on the map')
    parser.add_argument('--start-date', type=str, default='2020-01-01',
                       help='Start date filter (YYYY-MM-DD, default: 2020-01-01)')
    
    args = parser.parse_args()
    
    print("=" * 80)
    print("PATTERN-BASED POULTRY DATA EXTRACTION")
    print("=" * 80)
    print(f"Target pattern: {args.pattern}")
    print(f"Start date filter: {args.start_date}")
    print(f"Sample size: {'All flocks' if args.sample_size is None else f'{args.sample_size:,} flocks'}")
    print(f"Data directory: {args.data_dir}")
    if args.geocoded_data:
        print(f"Geocoded data: {args.geocoded_data}")
    if not args.skip_geocoding:
        if args.cache_only:
            print("Geocoding: Cache-only mode (no API calls)")
        else:
            print("Geocoding: Full mode (includes API calls if needed)")
        if args.create_map:
            print("Map generation: Enabled")
    else:
        print("Geocoding: Disabled")
    print("=" * 80)
    
    try:
        # Initialize extractor
        extractor = PatternBasedExtractor(
            data_dir=args.data_dir,
            geocoded_file=args.geocoded_data
        )
        
        # Load and filter base data
        flocks_df = extractor.load_and_filter_base_data(start_date=args.start_date)
        
        # Identify mixed farms
        mixed_farms_df = extractor.identify_mixed_farms(flocks_df)
        
        if len(mixed_farms_df) == 0:
            print("No mixed farms found! Cannot proceed with pattern extraction.")
            return
        
        # Classify farm patterns
        patterns_df = extractor.classify_farm_patterns(flocks_df, mixed_farms_df)
        
        if len(patterns_df) == 0:
            print("No farm patterns could be classified! Cannot proceed.")
            return
        
        # Extract flocks by pattern
        if args.pattern == 'all':
            # Extract all patterns
            for pattern in ['sequential', 'concurrent', 'mixed']:
                pattern_flocks = extractor.extract_flocks_by_pattern(
                    pattern, flocks_df, patterns_df, args.sample_size
                )
                
                if len(pattern_flocks) > 0:
                    # Enrich data
                    enriched_flocks = extractor.enrich_flock_data(pattern_flocks)
                    
                    # Export results
                    output_file = args.output
                    if output_file and args.pattern == 'all':
                        # Modify filename for each pattern
                        base, ext = os.path.splitext(output_file)
                        output_file = f"{base}_{pattern}{ext}"
                    
                    extractor.export_results(enriched_flocks, pattern, output_file)
        else:
            # Extract specific pattern
            pattern_flocks = extractor.extract_flocks_by_pattern(
                args.pattern, flocks_df, patterns_df, args.sample_size
            )
            
            if len(pattern_flocks) == 0:
                print(f"No flocks found for pattern '{args.pattern}'!")
                return
            
            # Enrich data
            enriched_flocks = extractor.enrich_flock_data(pattern_flocks)
            
            # Export results
            extractor.export_results(enriched_flocks, args.pattern, args.output)
        
        print("\n✅ Pattern-based extraction completed successfully!")
        
    except Exception as e:
        print(f"\n❌ Error during pattern-based extraction: {e}")
        import traceback
        traceback.print_exc()
        return 1
    
    return 0
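
The --start-date option above is declared with type=str, so a malformed date is only detected later, wherever load_and_filter_base_data interprets it. A minimal sketch of validating it at parse time follows; the iso_date helper is hypothetical and not part of the original module.

```python
import argparse
from datetime import datetime


def iso_date(value):
    """Hypothetical argparse type: check that value parses with %Y-%m-%d."""
    try:
        datetime.strptime(value, "%Y-%m-%d")
    except ValueError:
        # argparse turns ArgumentTypeError into a usage error for the user.
        raise argparse.ArgumentTypeError(
            f"invalid date {value!r}, expected YYYY-MM-DD"
        )
    # Return the original string so downstream code sees the same type as before.
    return value


parser = argparse.ArgumentParser()
parser.add_argument("--start-date", type=iso_date, default="2020-01-01")

# Passing an explicit list keeps this sketch independent of sys.argv.
args = parser.parse_args(["--start-date", "2021-06-30"])
```

With this in place, an input such as --start-date 2020-13-01 would be rejected by argparse before any data is loaded.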

Return Value

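Returns an integer exit code: 0 on successful completion and 1 when an exception is caught. Three early-exit branches use a bare return and therefore yield None: no mixed farms found, no farm patterns classified, or (when a single pattern is requested) no flocks matching that pattern. Because sys.exit(None) exits with status 0, these early exits are indistinguishable from success at the shell level when the function is invoked as sys.exit(main()).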
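
Callers that need to tell an early exit apart from a successful run can map the None result to a distinct status. The wrapper below is a sketch only; the name run_and_normalize and the choice of exit code 2 are assumptions, not part of the original module.

```python
def run_and_normalize(entry_point, early_exit_code=2):
    """Call a main()-style function and map its result to an exit code.

    early_exit_code is an arbitrary, hypothetical value used when the
    entry point returns None (a bare `return` in the function above).
    """
    result = entry_point()
    if result is None:
        return early_exit_code
    return int(result)


# Stand-ins for the three outcomes main() can produce.
success_code = run_and_normalize(lambda: 0)
error_code = run_and_normalize(lambda: 1)
early_code = run_and_normalize(lambda: None)
```

In a script this would typically be used as sys.exit(run_and_normalize(main)).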

Dependencies

  • argparse
  • os
  • sys
  • pandas
  • numpy
  • datetime
  • typing
  • traceback
  • matched_sample_analysis
  • extractor

Required Imports

import os
import sys
import pandas as pd
import numpy as np
import argparse
from datetime import datetime
from typing import Dict, List, Optional, Tuple
from matched_sample_analysis import MatchedSampleAnalyzer
from extractor import PehestatDataExtractor
import traceback

Conditional/Optional Imports

These imports are only needed under specific conditions:

import traceback

Condition: only used in exception handling block when errors occur

Required (conditional)

Usage Example

# Run from the command line (script name taken from the File entry above):
# Extract sequential pattern flocks from 2020 onwards
python pattern_based_extraction.py --pattern sequential --start-date 2020-01-01 --output sequential_flocks.csv

# Extract all patterns with sampling and geocoding
python pattern_based_extraction.py --pattern all --sample-size 1000 --geocoded-data geocoded.csv --create-map

# Extract concurrent pattern without geocoding
python pattern_based_extraction.py --pattern concurrent --skip-geocoding --output concurrent_only.csv

# Extract mixed pattern with cache-only geocoding and clustering map
python pattern_based_extraction.py --pattern mixed --cache-only --create-map --use-clustering --map-output mixed_map.html

# Script entry point (at the bottom of the module; requires `import sys`).
# The return value of main() becomes the process exit status.
if __name__ == '__main__':
    sys.exit(main())
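
Because main() calls parser.parse_args() with no arguments, it always reads sys.argv. One way to call it from other Python code is to replace sys.argv temporarily. The sketch below assumes that approach; patched_argv and demo_main are hypothetical names, and demo_main is a stand-in that parses only --pattern so the example runs without the real module.

```python
import argparse
import sys
from contextlib import contextmanager


@contextmanager
def patched_argv(*cli_args, prog="pattern_based_extraction.py"):
    """Temporarily replace sys.argv so an argparse-based main() sees cli_args."""
    saved = sys.argv
    sys.argv = [prog, *cli_args]
    try:
        yield
    finally:
        sys.argv = saved  # always restore, even if the wrapped call raises


def demo_main():
    """Stand-in for main(): parses only --pattern and returns its value."""
    parser = argparse.ArgumentParser()
    parser.add_argument("--pattern", type=str, required=True,
                        choices=["sequential", "concurrent", "mixed", "all"])
    return parser.parse_args().pattern


argv_before = list(sys.argv)
with patched_argv("--pattern", "mixed"):
    parsed_pattern = demo_main()
argv_after = list(sys.argv)
```

With the real module imported, the body of the with block would be exit_code = main() instead of the stand-in call.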

Best Practices

  • Always specify the --pattern argument as it is required for execution
  • Use --start-date to filter data to relevant time periods and improve performance
  • When using --pattern all, be aware that output filenames will be automatically modified with pattern suffixes
  • Use --sample-size for testing or when working with large datasets to limit processing time
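  • Pass --skip-geocoding when coordinates are not needed; note that in the function body shown, this flag only changes the startup banner, so any speed-up depends on geocoding performed elsewhere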
  • Use --cache-only to avoid API rate limits when geocoding data that may already be cached
  • Check the return value when calling programmatically: 0 for success, 1 for a caught exception, None for an early exit
  • Ensure the PatternBasedExtractor class is properly defined and imported before calling main()
  • The function prints detailed progress information to stdout, so redirect or capture if needed
  • Handle early exits gracefully: the function returns None if no mixed farms are found, no patterns can be classified, or no flocks match the single requested pattern
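
The filename suffixing mentioned above for --pattern all follows from the os.path.splitext call in the source. The helper below isolates that logic as a sketch; the function name per_pattern_filename is hypothetical.

```python
import os


def per_pattern_filename(output_file, pattern):
    """Insert _<pattern> before the extension, as the --pattern all loop does."""
    base, ext = os.path.splitext(output_file)
    return f"{base}_{pattern}{ext}"


# One output name per pattern, mirroring the loop in main().
names = [per_pattern_filename("flocks.csv", p)
         for p in ["sequential", "concurrent", "mixed"]]
# names == ["flocks_sequential.csv", "flocks_concurrent.csv", "flocks_mixed.csv"]
```

When --output is omitted, output_file stays None and the name passed to export_results is left for that method to generate.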

Similar Components

AI-powered semantic similarity - components with related functionality:

  • class PatternBasedExtractor 66.4% similar

    Extract flocks based on farm-level In-Ovo usage patterns.

    From: /tf/active/vicechatdev/pattern_based_extraction.py
  • function main_v24 65.5% similar

    Orchestrates a complete correlation analysis pipeline for Eimeria infection and broiler performance data, from data loading through visualization and results export.

    From: /tf/active/vicechatdev/vice_ai/smartstat_scripts/5a059cb7-3903-4020-8519-14198d1f39c9/analysis_1.py
  • function main_v54 59.6% similar

    Performs comprehensive exploratory data analysis on a broiler chicken performance dataset, analyzing the correlation between Eimeria infection and performance measures (weight gain, feed conversion ratio, mortality rate) across different treatments and challenge regimens.

    From: /tf/active/vicechatdev/vice_ai/smartstat_scripts/343f5578-64e0-4101-84bd-5824b3c15deb/project_1/analysis.py
  • function analyze_flock_type_patterns 58.8% similar

    Analyzes and prints timing pattern statistics for flock data by categorizing issues that occur before start time and after end time, grouped by flock type.

    From: /tf/active/vicechatdev/data_quality_dashboard.py
  • function show_problematic_flocks 56.9% similar

    Analyzes and displays problematic flocks from a dataset by identifying those with systematic timing issues in their treatment records, categorizing them by severity and volume.

    From: /tf/active/vicechatdev/data_quality_dashboard.py