function main_v3
Command-line interface function that orchestrates pattern-based extraction of poultry flock data, including data loading, pattern classification, geocoding, and export functionality.
/tf/active/vicechatdev/pattern_based_extraction.py
505 - 622
complex
Purpose
This is the main entry point for a pattern-based poultry data extraction tool. It parses command-line arguments, loads flock data filtered by start date, identifies mixed farms, classifies each farm's In-Ovo usage pattern (sequential, concurrent, or mixed), extracts the flocks for the requested pattern (or for all three in turn), enriches them, and exports the results to CSV. It also accepts geocoding and map-generation options; in the code shown, these are only echoed in the startup banner, and only --geocoded-data is passed on to the extractor.
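The start-date filter reaches the extractor as a plain string. As a minimal sketch, assuming one wanted to reject malformed dates at parse time rather than inside the extractor, a custom argparse type could look like the following. The helper name `iso_date` is hypothetical and not part of the documented source.

```python
import argparse
from datetime import datetime


def iso_date(value):
    """argparse type (hypothetical helper): reject values that do not parse as year-month-day."""
    try:
        datetime.strptime(value, '%Y-%m-%d')
    except ValueError:
        raise argparse.ArgumentTypeError(f"expected YYYY-MM-DD, got {value!r}")
    return value  # keep the string form, as the documented main() passes a string along


parser = argparse.ArgumentParser()
parser.add_argument('--start-date', type=iso_date, default='2020-01-01',
                    help='Start date filter (YYYY-MM-DD, default: 2020-01-01)')

# Parse an explicit list so the sketch does not depend on the real command line.
args = parser.parse_args(['--start-date', '2021-06-01'])
print(args.start_date)
```

With this type in place, an invalid value such as 2021-13-01 makes argparse print a usage error and exit before any data is loaded.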
Source Code
def main():
    """Main function for pattern-based extraction."""
    parser = argparse.ArgumentParser(description='Pattern-Based Poultry Data Extraction')
    parser.add_argument('--pattern', type=str, required=True,
                        choices=['sequential', 'concurrent', 'mixed', 'all'],
                        help='In-Ovo usage pattern to extract')
    parser.add_argument('--output', type=str, default=None,
                        help='Output CSV filename (default: auto-generated)')
    parser.add_argument('--sample-size', type=int, default=None,
                        help='Number of flocks to sample (default: extract all)')
    parser.add_argument('--geocoded-data', type=str, default=None,
                        help='Path to geocoded data file for coordinate enrichment')
    parser.add_argument('--data-dir', type=str, default='/tf/active/pehestat_data',
                        help='Directory containing Pehestat data files')
    parser.add_argument('--skip-geocoding', action='store_true',
                        help='Skip geocoding and map generation')
    parser.add_argument('--cache-only', action='store_true',
                        help='Use geocoding cache only (no API calls)')
    parser.add_argument('--create-map', action='store_true',
                        help='Create interactive map (requires geocoding)')
    parser.add_argument('--map-output', type=str, default=None,
                        help='Output map filename (default: auto-generated)')
    parser.add_argument('--use-clustering', action='store_true',
                        help='Enable marker clustering on the map')
    parser.add_argument('--start-date', type=str, default='2020-01-01',
                        help='Start date filter (YYYY-MM-DD, default: 2020-01-01)')
    args = parser.parse_args()

    print("=" * 80)
    print("PATTERN-BASED POULTRY DATA EXTRACTION")
    print("=" * 80)
    print(f"Target pattern: {args.pattern}")
    print(f"Start date filter: {args.start_date}")
    print(f"Sample size: {'All flocks' if args.sample_size is None else f'{args.sample_size:,} flocks'}")
    print(f"Data directory: {args.data_dir}")
    if args.geocoded_data:
        print(f"Geocoded data: {args.geocoded_data}")
    if not args.skip_geocoding:
        if args.cache_only:
            print("Geocoding: Cache-only mode (no API calls)")
        else:
            print("Geocoding: Full mode (includes API calls if needed)")
        if args.create_map:
            print("Map generation: Enabled")
    else:
        print("Geocoding: Disabled")
    print("=" * 80)

    try:
        # Initialize extractor
        extractor = PatternBasedExtractor(
            data_dir=args.data_dir,
            geocoded_file=args.geocoded_data
        )
        # Load and filter base data
        flocks_df = extractor.load_and_filter_base_data(start_date=args.start_date)
        # Identify mixed farms
        mixed_farms_df = extractor.identify_mixed_farms(flocks_df)
        if len(mixed_farms_df) == 0:
            print("No mixed farms found! Cannot proceed with pattern extraction.")
            return
        # Classify farm patterns
        patterns_df = extractor.classify_farm_patterns(flocks_df, mixed_farms_df)
        if len(patterns_df) == 0:
            print("No farm patterns could be classified! Cannot proceed.")
            return
        # Extract flocks by pattern
        if args.pattern == 'all':
            # Extract all patterns
            for pattern in ['sequential', 'concurrent', 'mixed']:
                pattern_flocks = extractor.extract_flocks_by_pattern(
                    pattern, flocks_df, patterns_df, args.sample_size
                )
                if len(pattern_flocks) > 0:
                    # Enrich data
                    enriched_flocks = extractor.enrich_flock_data(pattern_flocks)
                    # Export results
                    output_file = args.output
                    if output_file and args.pattern == 'all':
                        # Modify filename for each pattern
                        base, ext = os.path.splitext(output_file)
                        output_file = f"{base}_{pattern}{ext}"
                    extractor.export_results(enriched_flocks, pattern, output_file)
        else:
            # Extract specific pattern
            pattern_flocks = extractor.extract_flocks_by_pattern(
                args.pattern, flocks_df, patterns_df, args.sample_size
            )
            if len(pattern_flocks) == 0:
                print(f"No flocks found for pattern '{args.pattern}'!")
                return
            # Enrich data
            enriched_flocks = extractor.enrich_flock_data(pattern_flocks)
            # Export results
            extractor.export_results(enriched_flocks, args.pattern, args.output)
        print("\n✅ Pattern-based extraction completed successfully!")
    except Exception as e:
        print(f"\n❌ Error during pattern-based extraction: {e}")
        import traceback
        traceback.print_exc()
        return 1
    return 0
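The filename handling in the 'all' branch can be looked at in isolation. The following is a minimal sketch of that step only; `per_pattern_filename` is a hypothetical helper introduced for illustration, and the assumption that a missing --output is left for export_results to name is based solely on the "auto-generated" wording in the help text.

```python
import os


def per_pattern_filename(output_file, pattern):
    """Insert the pattern name before the extension (hypothetical helper).

    Mirrors the os.path.splitext logic in the 'all' branch of main().
    """
    if not output_file:
        # No --output given: pass the empty value through unchanged
        # (assumed to trigger auto-generated naming downstream).
        return output_file
    base, ext = os.path.splitext(output_file)
    return f"{base}_{pattern}{ext}"


names = [per_pattern_filename("flocks.csv", p)
         for p in ("sequential", "concurrent", "mixed")]
print(names)
```

Because os.path.splitext splits only at the last dot, a name such as results.v2.csv keeps its inner dot and becomes results.v2_mixed.csv.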
Return Value
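Returns an integer exit code: 0 on successful completion, 1 if an exception is raised during extraction. The early exits (no mixed farms found, no farm patterns classified, or no flocks found for a specific requested pattern) use a bare return and therefore yield None. When the result is passed to sys.exit(), None is reported as exit status 0, so an early exit is indistinguishable from success at the shell level.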
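A caller that needs to tell an early exit apart from success can normalize the return value before exiting. This is a sketch under assumptions: the status value 3 and the helper `to_exit_code` are hypothetical choices made here for illustration and are not defined by the source.

```python
import sys

EARLY_EXIT = 3  # hypothetical status for "nothing to extract"; not defined by the source


def to_exit_code(result):
    """Map main()'s return value to a process exit status (hypothetical helper)."""
    return EARLY_EXIT if result is None else int(result)


# sys.exit(None) raises SystemExit with code None, which the interpreter
# reports to the shell as status 0.
try:
    sys.exit(None)
except SystemExit as exc:
    none_code = exc.code

codes = [to_exit_code(r) for r in (0, 1, None)]
print(none_code, codes)
```

The entry point would then read sys.exit(to_exit_code(main())) instead of sys.exit(main()).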
Dependencies
argparse, os, sys, pandas, numpy, datetime, typing, traceback, matched_sample_analysis, extractor
Required Imports
import os
import sys
import pandas as pd
import numpy as np
import argparse
from datetime import datetime
from typing import Dict, List, Optional, Tuple
from matched_sample_analysis import MatchedSampleAnalyzer
from extractor import PehestatDataExtractor
import traceback
Conditional/Optional Imports
These imports are only needed under specific conditions:
import traceback
Condition: only used in exception handling block when errors occur
Required (conditional)
Usage Example
# Run from the command line:
# Extract sequential pattern flocks from 2020 onwards
python pattern_based_extraction.py --pattern sequential --start-date 2020-01-01 --output sequential_flocks.csv
# Extract all patterns with sampling and geocoded data
python pattern_based_extraction.py --pattern all --sample-size 1000 --geocoded-data geocoded.csv --create-map
# Extract concurrent pattern without geocoding
python pattern_based_extraction.py --pattern concurrent --skip-geocoding --output concurrent_only.csv
# Extract mixed pattern with cache-only geocoding and a clustered map
python pattern_based_extraction.py --pattern mixed --cache-only --create-map --use-clustering --map-output mixed_map.html
# Script entry point (main() reads its arguments from sys.argv):
if __name__ == '__main__':
    sys.exit(main())
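Because main() calls parser.parse_args() without arguments, it always reads sys.argv. One way to call it from other Python code is to replace sys.argv temporarily. The sketch below uses a reduced stand-in for main() so that it runs on its own; `call_main` is a hypothetical wrapper, not part of the documented module.

```python
import argparse
import sys
from unittest import mock


def main():
    """Reduced stand-in for the documented main(): parses sys.argv, returns an exit code."""
    parser = argparse.ArgumentParser(description='Pattern-Based Poultry Data Extraction')
    parser.add_argument('--pattern', type=str, required=True,
                        choices=['sequential', 'concurrent', 'mixed', 'all'])
    parser.add_argument('--start-date', type=str, default='2020-01-01')
    args = parser.parse_args()  # reads sys.argv[1:], like the real function
    print(f"Target pattern: {args.pattern}")
    print(f"Start date filter: {args.start_date}")
    return 0


def call_main(argv):
    """Run main() with the given arguments by patching sys.argv (hypothetical wrapper)."""
    with mock.patch.object(sys, 'argv', ['pattern_based_extraction.py', *argv]):
        return main()


status = call_main(['--pattern', 'sequential', '--start-date', '2021-06-01'])
print(status)
```

Note that an invalid or missing --pattern makes argparse raise SystemExit with status 2 rather than return a value, so a programmatic caller may want to catch SystemExit as well.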
Best Practices
- Always specify the --pattern argument as it is required for execution
- Use --start-date to filter data to relevant time periods and improve performance
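- When using --pattern all together with --output, each pattern is written to its own file with the pattern name inserted before the extension (for example, flocks.csv becomes flocks_sequential.csv, flocks_concurrent.csv, and flocks_mixed.csv)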
- Use --sample-size for testing or when working with large datasets to limit processing time
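- Use --skip-geocoding when coordinates are not needed; note that in the code shown this flag only changes the startup banner, because main() does not pass it to the extractor
- Use --cache-only to avoid API rate limits when the geocoding results may already be cached; as with --skip-geocoding, the code shown only echoes this setting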
- Check return code (0 for success, 1 for error) when calling programmatically
- Ensure the PatternBasedExtractor class is properly defined and imported before calling main()
- The function prints detailed progress information to stdout, so redirect or capture if needed
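- Handle early exits explicitly: the function returns None rather than an exit code when no mixed farms are found, no farm patterns can be classified, or no flocks match a specific requested pattern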
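The advice on capturing progress output can be sketched with contextlib.redirect_stdout from the standard library. The main() below is a reduced stand-in that prints only the banner, used here so the example is self-contained.

```python
import contextlib
import io


def main():
    """Reduced stand-in for the documented main(): prints the banner, returns an exit code."""
    print("=" * 80)
    print("PATTERN-BASED POULTRY DATA EXTRACTION")
    print("=" * 80)
    return 0


# Collect everything main() prints to stdout instead of showing it on the console.
buffer = io.StringIO()
with contextlib.redirect_stdout(buffer):
    status = main()

log_lines = buffer.getvalue().splitlines()
print(f"exit status {status}, captured {len(log_lines)} lines")
```

In the documented function, traceback.print_exc() writes to stderr, so a traceback from the error path would not be captured by this approach; contextlib.redirect_stderr would be needed for that.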
Similar Components
AI-powered semantic similarity - components with related functionality:
- class PatternBasedExtractor 66.4% similar
- function main_v24 65.5% similar
- function main_v54 59.6% similar
- function analyze_flock_type_patterns 58.8% similar
- function show_problematic_flocks 56.9% similar