create_sample_data_v2 - Code Extractor

function create_sample_data_v2

Maturity: 42

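Generates a reproducible synthetic dataset of 200 poultry research records, with four treatment groups, three challenge regimens, and a set of parasitological and performance metrics, for demonstration purposes. In the source file the function is defined as create_sample_data().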

File:
/tf/active/vicechatdev/vice_ai/smartstat_scripts/5a059cb7-3903-4020-8519-14198d1f39c9/analysis_1.py

Lines:
36 - 66

Complexity:
simple

Purpose

Creates a reproducible sample dataset simulating a poultry coccidiosis (Eimeria) challenge study. Birds in the High_challenge regimen receive built-in shifts in oocyst count, body weight gain, and feed conversion ratio; all other columns are drawn independently of group. Useful for testing data analysis pipelines, visualization tools, or statistical methods in veterinary and agricultural research without requiring real experimental data.

Source Code

def create_sample_data():
    """Create sample dataset for demonstration"""
    np.random.seed(42)
    n = 200
    
    treatments = ['Control', 'Treatment_A', 'Treatment_B', 'Treatment_C']
    challenge_regimens = ['Non-challenged', 'Low_challenge', 'High_challenge']
    
    data = {
        'bird_id': range(1, n+1),
        'treatment': np.random.choice(treatments, n),
        'challenge_regimen': np.random.choice(challenge_regimens, n),
        'eimeria_oocyst_count': np.random.exponential(5000, n),
        'eimeria_lesion_score': np.random.randint(0, 5, n),
        'body_weight_gain': np.random.normal(2000, 300, n),
        'feed_conversion_ratio': np.random.normal(1.8, 0.3, n),
        'feed_intake': np.random.normal(3500, 400, n),
        'mortality_rate': np.random.uniform(0, 15, n),
        'weight_day21': np.random.normal(800, 150, n),
        'weight_day35': np.random.normal(2000, 300, n),
        'intestinal_health_score': np.random.randint(1, 11, n)
    }
    
    df = pd.DataFrame(data)
    
    # Add realistic correlations
    df.loc[df['challenge_regimen'] == 'High_challenge', 'eimeria_oocyst_count'] *= 2
    df.loc[df['challenge_regimen'] == 'High_challenge', 'body_weight_gain'] *= 0.8
    df.loc[df['challenge_regimen'] == 'High_challenge', 'feed_conversion_ratio'] *= 1.2
    
    return df

Return Value

Returns a pandas DataFrame with 200 rows and 12 columns. Treatment and challenge regimen are assigned independently and at random, so group sizes are not balanced. Columns:
'bird_id' (int, 1-200)
'treatment' (str: Control, Treatment_A, Treatment_B, or Treatment_C)
'challenge_regimen' (str: Non-challenged, Low_challenge, or High_challenge)
'eimeria_oocyst_count' (float, exponential with mean 5000; doubled for High_challenge)
'eimeria_lesion_score' (int, 0-4)
'body_weight_gain' (float, normal with mean 2000 g and SD 300 g; multiplied by 0.8, i.e. reduced by 20%, for High_challenge)
'feed_conversion_ratio' (float, normal with mean 1.8 and SD 0.3; multiplied by 1.2, i.e. increased by 20%, for High_challenge)
'feed_intake' (float, normal with mean 3500 g and SD 400 g)
'mortality_rate' (float, uniform between 0 and 15, in %)
'weight_day21' (float, normal with mean 800 g and SD 150 g)
'weight_day35' (float, normal with mean 2000 g and SD 300 g)
'intestinal_health_score' (int, 1-10)
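The column description above can be turned into an automated check for test suites. The sketch below is a hypothetical helper (check_sample_data and its constants are not part of the source project); the expected columns and bounds are taken from the Return Value description, and the block reproduces create_sample_data() from Source Code so that it runs on its own.

```python
import numpy as np
import pandas as pd


def create_sample_data():
    """Same logic as the Source Code listing, reproduced so this block is self-contained."""
    np.random.seed(42)
    n = 200
    treatments = ['Control', 'Treatment_A', 'Treatment_B', 'Treatment_C']
    challenge_regimens = ['Non-challenged', 'Low_challenge', 'High_challenge']
    data = {
        'bird_id': range(1, n + 1),
        'treatment': np.random.choice(treatments, n),
        'challenge_regimen': np.random.choice(challenge_regimens, n),
        'eimeria_oocyst_count': np.random.exponential(5000, n),
        'eimeria_lesion_score': np.random.randint(0, 5, n),
        'body_weight_gain': np.random.normal(2000, 300, n),
        'feed_conversion_ratio': np.random.normal(1.8, 0.3, n),
        'feed_intake': np.random.normal(3500, 400, n),
        'mortality_rate': np.random.uniform(0, 15, n),
        'weight_day21': np.random.normal(800, 150, n),
        'weight_day35': np.random.normal(2000, 300, n),
        'intestinal_health_score': np.random.randint(1, 11, n),
    }
    df = pd.DataFrame(data)
    high = df['challenge_regimen'] == 'High_challenge'
    df.loc[high, 'eimeria_oocyst_count'] *= 2
    df.loc[high, 'body_weight_gain'] *= 0.8
    df.loc[high, 'feed_conversion_ratio'] *= 1.2
    return df


# Hypothetical expectations, copied from the Return Value description.
EXPECTED_COLUMNS = [
    'bird_id', 'treatment', 'challenge_regimen', 'eimeria_oocyst_count',
    'eimeria_lesion_score', 'body_weight_gain', 'feed_conversion_ratio',
    'feed_intake', 'mortality_rate', 'weight_day21', 'weight_day35',
    'intestinal_health_score',
]
TREATMENTS = {'Control', 'Treatment_A', 'Treatment_B', 'Treatment_C'}
REGIMENS = {'Non-challenged', 'Low_challenge', 'High_challenge'}


def check_sample_data(df, n_rows=200):
    """Hypothetical helper: return a list of problems (empty list = frame matches the spec)."""
    problems = []
    if list(df.columns) != EXPECTED_COLUMNS:
        problems.append('unexpected columns')
    if len(df) != n_rows:
        problems.append(f'expected {n_rows} rows, got {len(df)}')
    if df['bird_id'].tolist() != list(range(1, n_rows + 1)):
        problems.append('bird_id is not 1..n')
    if not set(df['treatment']) <= TREATMENTS:
        problems.append('unknown treatment label')
    if not set(df['challenge_regimen']) <= REGIMENS:
        problems.append('unknown challenge regimen')
    if (df['eimeria_oocyst_count'] < 0).any():
        problems.append('negative oocyst count')
    if not df['eimeria_lesion_score'].between(0, 4).all():
        problems.append('lesion score outside 0-4')
    if not df['mortality_rate'].between(0, 15).all():
        problems.append('mortality rate outside 0-15')
    if not df['intestinal_health_score'].between(1, 10).all():
        problems.append('intestinal health score outside 1-10')
    return problems


print(check_sample_data(create_sample_data()))  # prints []
```

Returning a list of messages rather than raising on the first failure lets a test report every mismatch at once.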

Dependencies

numpy
pandas

Required Imports

import numpy as np
import pandas as pd

Usage Example

import numpy as np
import pandas as pd

def create_sample_data():
    """Create sample dataset for demonstration"""
    np.random.seed(42)
    n = 200
    
    treatments = ['Control', 'Treatment_A', 'Treatment_B', 'Treatment_C']
    challenge_regimens = ['Non-challenged', 'Low_challenge', 'High_challenge']
    
    data = {
        'bird_id': range(1, n+1),
        'treatment': np.random.choice(treatments, n),
        'challenge_regimen': np.random.choice(challenge_regimens, n),
        'eimeria_oocyst_count': np.random.exponential(5000, n),
        'eimeria_lesion_score': np.random.randint(0, 5, n),
        'body_weight_gain': np.random.normal(2000, 300, n),
        'feed_conversion_ratio': np.random.normal(1.8, 0.3, n),
        'feed_intake': np.random.normal(3500, 400, n),
        'mortality_rate': np.random.uniform(0, 15, n),
        'weight_day21': np.random.normal(800, 150, n),
        'weight_day35': np.random.normal(2000, 300, n),
        'intestinal_health_score': np.random.randint(1, 11, n)
    }
    
    df = pd.DataFrame(data)
    
    df.loc[df['challenge_regimen'] == 'High_challenge', 'eimeria_oocyst_count'] *= 2
    df.loc[df['challenge_regimen'] == 'High_challenge', 'body_weight_gain'] *= 0.8
    df.loc[df['challenge_regimen'] == 'High_challenge', 'feed_conversion_ratio'] *= 1.2
    
    return df

# Generate sample data
df = create_sample_data()
print(df.head())
print(f"\nDataset shape: {df.shape}")
print(f"\nTreatment groups: {df['treatment'].unique()}")
print(f"Challenge regimens: {df['challenge_regimen'].unique()}")

Best Practices

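The function calls np.random.seed(42), so every call returns the same data. This also resets NumPy's global random state, which affects any other code that draws from np.random afterwards
The only built-in relationships are the High_challenge adjustments to oocyst count, body weight gain, and feed conversion ratio; this is enough to exercise group-comparison pipelines, but Low_challenge birds are drawn from the same distributions as Non-challenged birds, and treatment has no effect on any metric
The seed is hardcoded; change or parameterize it if you need different random datasets for multiple test scenarios
The dataset size (n=200) is also hardcoded; consider parameterizing it if you need different sample sizes
High_challenge birds show the expected direction of biological response: higher oocyst counts, lower weight gain, and poorer (higher) feed conversion ratio
The distribution parameters are intended to approximate typical broiler values, but every column is drawn independently: for example, weight_day35 minus weight_day21 does not equal body_weight_gain, and lesion scores are unrelated to oocyst counts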
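The seed and sample-size suggestions above could be implemented along the lines of the sketch below, assuming NumPy 1.17 or later (for np.random.default_rng). make_sample_data is a hypothetical name, not part of the source project. It draws from a local Generator, so it leaves the global np.random state untouched; because Generator uses a different algorithm (PCG64) from the legacy np.random functions, seed=42 will not reproduce the exact values of create_sample_data().

```python
import numpy as np
import pandas as pd

TREATMENTS = ['Control', 'Treatment_A', 'Treatment_B', 'Treatment_C']
CHALLENGE_REGIMENS = ['Non-challenged', 'Low_challenge', 'High_challenge']


def make_sample_data(n=200, seed=42):
    """Hypothetical parameterized variant of create_sample_data().

    Same columns, distributions, and High_challenge adjustments, but with
    configurable size and seed and a local random Generator.
    """
    rng = np.random.default_rng(seed)  # local RNG: no global side effects
    df = pd.DataFrame({
        'bird_id': np.arange(1, n + 1),
        'treatment': rng.choice(TREATMENTS, n),
        'challenge_regimen': rng.choice(CHALLENGE_REGIMENS, n),
        'eimeria_oocyst_count': rng.exponential(5000, n),
        'eimeria_lesion_score': rng.integers(0, 5, n),      # 0-4 (upper bound exclusive)
        'body_weight_gain': rng.normal(2000, 300, n),
        'feed_conversion_ratio': rng.normal(1.8, 0.3, n),
        'feed_intake': rng.normal(3500, 400, n),
        'mortality_rate': rng.uniform(0, 15, n),
        'weight_day21': rng.normal(800, 150, n),
        'weight_day35': rng.normal(2000, 300, n),
        'intestinal_health_score': rng.integers(1, 11, n),  # 1-10
    })
    # Same High_challenge adjustments as the original function.
    high = df['challenge_regimen'] == 'High_challenge'
    df.loc[high, 'eimeria_oocyst_count'] *= 2
    df.loc[high, 'body_weight_gain'] *= 0.8
    df.loc[high, 'feed_conversion_ratio'] *= 1.2
    return df


if __name__ == '__main__':
    small = make_sample_data(n=50, seed=7)
    print(small.shape)  # (50, 12)
    # Mean weight gain per regimen; High_challenge should come out roughly 20% lower.
    print(make_sample_data(n=1000).groupby('challenge_regimen')['body_weight_gain'].mean())
```

Passing a different seed per test scenario gives independent but repeatable datasets, and a larger n makes the High_challenge effect easier to detect in a group comparison.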

Similar Components

AI-powered semantic similarity - components with related functionality:

function create_sample_data_v1 70.7% similar

Generates a synthetic dataset with 200 samples containing group-based measurements, quality scores, environmental data, and temporal information, then saves it to a CSV file.
From: /tf/active/vicechatdev/full_smartstat/demo.py
function main_v56 65.2% similar

Performs comprehensive exploratory data analysis on a broiler chicken performance dataset, analyzing the correlation between Eimeria infection and performance measures (weight gain, feed conversion ratio, mortality rate) across different treatments and challenge regimens.
From: /tf/active/vicechatdev/vice_ai/smartstat_scripts/343f5578-64e0-4101-84bd-5824b3c15deb/project_1/analysis.py
function create_data_quality_dashboard_v1 59.0% similar

Creates an interactive data quality dashboard for analyzing treatment timing issues in poultry flock management data by loading and processing CSV files containing timing anomalies.
From: /tf/active/vicechatdev/data_quality_dashboard.py
function create_data_quality_dashboard 58.3% similar

Creates an interactive command-line dashboard for analyzing data quality issues in treatment timing data, specifically focusing on treatments administered outside of flock lifecycle dates.
From: /tf/active/vicechatdev/data_quality_dashboard.py
function create_test_dataset 57.0% similar

Creates a test CSV dataset with sample product sales data across different regions and months, saving it to a temporary file.
From: /tf/active/vicechatdev/vice_ai/test_integration.py
