Code Extractor

function grouped_correlation_analysis

Maturity: 45

Performs Pearson correlation analysis between Eimeria-related variables and performance variables, grouped by specified categorical variables (e.g., treatment, challenge groups).

File: /tf/active/vicechatdev/vice_ai/smartstat_scripts/5a059cb7-3903-4020-8519-14198d1f39c9/analysis_1.py
Lines: 187 - 227
Complexity: moderate

Purpose

This function conducts grouped correlation analysis to identify relationships between Eimeria infection metrics and performance outcomes within different experimental groups. For each combination of grouping variable, group value, Eimeria variable, and performance variable, it drops rows with a missing value in either column and computes the Pearson correlation coefficient and its p-value. Combinations with three or fewer complete observations (n≤3) are skipped and do not appear in the output. Results are compiled into a DataFrame for further analysis and reporting, with significance flagged at p<0.05.
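The per-pair step described above can be sketched in isolation. This is a minimal illustration rather than the function's own code: `pair_correlation` and its `min_n` parameter are hypothetical names introduced here, and the sample values are made up.

```python
import pandas as pd
from scipy.stats import pearsonr


def pair_correlation(group_data, x_col, y_col, min_n=3):
    """Return (r, p, n) for one variable pair, or None if too few complete rows.

    Hypothetical helper: mirrors the dropna-then-test step, where a pair is
    analysed only when more than `min_n` complete rows remain.
    """
    valid = group_data[[x_col, y_col]].dropna()
    if len(valid) <= min_n:
        return None
    r, p = pearsonr(valid[x_col], valid[y_col])
    return r, p, len(valid)


# Made-up values for illustration only; the last row is dropped (missing oocyst_count)
group = pd.DataFrame({
    'oocyst_count': [100, 150, 120, 135, None],
    'weight_gain': [250, 240, 245, 242, 260],
})

result = pair_correlation(group, 'oocyst_count', 'weight_gain')
too_small = pair_correlation(group.head(3), 'oocyst_count', 'weight_gain')
print(result, too_small)
```

With four complete rows the pair is tested; with only three it is skipped, which is why small groups produce no rows in the output.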

Source Code

def grouped_correlation_analysis(df, eimeria_vars, performance_vars, grouping_vars):
    """Perform correlation analysis by treatment and challenge groups"""
    
    print("\n" + "="*80)
    print("GROUPED CORRELATION ANALYSIS")
    print("="*80)
    
    all_grouped_results = []
    
    for group_var in grouping_vars:
        print(f"\n\nAnalysis by {group_var}:")
        print("-" * 60)
        
        for group_value in df[group_var].unique():
            group_data = df[df[group_var] == group_value]
            
            print(f"\n{group_var} = {group_value} (n={len(group_data)})")
            
            for eimeria_var in eimeria_vars:
                for perf_var in performance_vars:
                    valid_data = group_data[[eimeria_var, perf_var]].dropna()
                    
                    if len(valid_data) > 3:
                        pearson_r, pearson_p = pearsonr(valid_data[eimeria_var], 
                                                       valid_data[perf_var])
                        
                        all_grouped_results.append({
                            'Grouping_Variable': group_var,
                            'Group_Value': group_value,
                            'Eimeria_Variable': eimeria_var,
                            'Performance_Variable': perf_var,
                            'Correlation': pearson_r,
                            'P_value': pearson_p,
                            'N': len(valid_data),
                            'Significant': 'Yes' if pearson_p < 0.05 else 'No'
                        })
                        
                        print(f"  {eimeria_var} vs {perf_var}: r={pearson_r:.3f}, p={pearson_p:.4f}")
    
    grouped_results_df = pd.DataFrame(all_grouped_results)
    return grouped_results_df

Parameters

Name              Type  Default  Kind
df                -     -        positional_or_keyword
eimeria_vars      -     -        positional_or_keyword
performance_vars  -     -        positional_or_keyword
grouping_vars     -     -        positional_or_keyword

Parameter Details

df: pandas DataFrame containing the dataset with all variables to be analyzed. Must include columns specified in eimeria_vars, performance_vars, and grouping_vars.

eimeria_vars: List of column names (strings) representing Eimeria-related variables (e.g., oocyst counts, infection intensity measures) to correlate with performance variables.

performance_vars: List of column names (strings) representing performance outcome variables (e.g., weight gain, feed conversion ratio) to correlate with Eimeria variables.

grouping_vars: List of column names (strings) representing categorical variables used to split the data into groups (e.g., 'treatment', 'challenge_group'). Analysis is performed separately for each unique value within these variables.
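Because the function indexes `df` directly with these names, a misspelled column raises a `KeyError` partway through the run. A pre-flight check along the following lines may help; `missing_columns` is a hypothetical helper, not part of the documented script.

```python
import pandas as pd


def missing_columns(df, *column_lists):
    """Return the requested column names that are absent from df (hypothetical helper)."""
    requested = [col for cols in column_lists for col in cols]
    return [col for col in requested if col not in df.columns]


# Tiny made-up frame that lacks the 'weight_gain' column
df = pd.DataFrame({'treatment': ['A', 'B'], 'oocyst_count': [100, 50]})

missing = missing_columns(df, ['oocyst_count'], ['weight_gain'], ['treatment'])
print(missing)
```

An empty list means all three name lists can be passed to the function safely.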

Return Value

Returns a pandas DataFrame in which each row represents one correlation test result. Columns:

  • 'Grouping_Variable': name of the grouping variable
  • 'Group_Value': the specific group value within that variable
  • 'Eimeria_Variable': the Eimeria metric
  • 'Performance_Variable': the performance metric
  • 'Correlation': Pearson r coefficient
  • 'P_value': p-value of the Pearson test
  • 'N': sample size after removing rows with NaN values
  • 'Significant': 'Yes' if p<0.05, 'No' otherwise

If no combination has more than three complete observations, the result is an empty DataFrame with no columns.
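When no variable pair has more than three complete rows, the function builds its result from an empty list, so the returned DataFrame has no columns and indexing 'Significant' would raise a `KeyError`. The sketch below shows one way to filter for flagged rows while tolerating that case. `significant_rows` is a hypothetical helper, and the result rows are hand-written with made-up numbers.

```python
import pandas as pd


def significant_rows(results_df):
    """Return only rows flagged 'Yes', tolerating an empty result (hypothetical helper)."""
    if results_df.empty:
        # pd.DataFrame([]) has no columns, so there is nothing to filter on
        return results_df
    return results_df[results_df['Significant'] == 'Yes']


# Hand-written rows in the documented column layout; the numbers are made up
results_df = pd.DataFrame([
    {'Grouping_Variable': 'treatment', 'Group_Value': 'A',
     'Eimeria_Variable': 'oocyst_count', 'Performance_Variable': 'weight_gain',
     'Correlation': -0.91, 'P_value': 0.03, 'N': 5, 'Significant': 'Yes'},
    {'Grouping_Variable': 'treatment', 'Group_Value': 'B',
     'Eimeria_Variable': 'oocyst_count', 'Performance_Variable': 'weight_gain',
     'Correlation': -0.40, 'P_value': 0.50, 'N': 5, 'Significant': 'No'},
])

hits = significant_rows(results_df)
no_hits = significant_rows(pd.DataFrame([]))
print(hits[['Group_Value', 'Correlation', 'P_value']])
```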

Dependencies

  • pandas
  • scipy

Required Imports

import pandas as pd
from scipy.stats import pearsonr

Usage Example

import pandas as pd
from scipy.stats import pearsonr

# grouped_correlation_analysis must already be defined or imported.

# Illustrative sample data. Each group needs more than 3 complete rows;
# smaller groups are skipped and the result would be an empty DataFrame.
df = pd.DataFrame({
    'treatment': ['A', 'A', 'B', 'B', 'A', 'B', 'A', 'B'],
    'challenge': ['high', 'high', 'low', 'low', 'high', 'low', 'high', 'low'],
    'oocyst_count': [100, 150, 50, 60, 120, 55, 135, 65],
    'weight_gain': [250, 240, 280, 275, 245, 285, 242, 270],
    'feed_efficiency': [1.5, 1.6, 1.3, 1.4, 1.55, 1.35, 1.58, 1.42]
})

eimeria_vars = ['oocyst_count']
performance_vars = ['weight_gain', 'feed_efficiency']
grouping_vars = ['treatment', 'challenge']

results_df = grouped_correlation_analysis(df, eimeria_vars, performance_vars, grouping_vars)
print(results_df)
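The function prints a banner and one line per correlation, which can be noisy in batch runs. One option is to capture standard output around the call, sketched below with the standard library. `run_quietly` is a hypothetical wrapper, and `noisy_stand_in` is only a placeholder for `grouped_correlation_analysis` so that the snippet runs on its own.

```python
import contextlib
import io


def run_quietly(func, *args, **kwargs):
    """Call func while capturing anything it prints; return (result, captured_text)."""
    buffer = io.StringIO()
    with contextlib.redirect_stdout(buffer):
        result = func(*args, **kwargs)
    return result, buffer.getvalue()


def noisy_stand_in(values):
    """Placeholder for grouped_correlation_analysis: prints, then returns a value."""
    print("GROUPED CORRELATION ANALYSIS")
    return len(values)


result, log_text = run_quietly(noisy_stand_in, [1, 2, 3])
```

The captured text can be written to a log file or discarded, while the returned value is passed on unchanged.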

Best Practices

  • Ensure the DataFrame contains sufficient non-null data points (>3) for each group to obtain meaningful correlation results
  • Check that column names in eimeria_vars, performance_vars, and grouping_vars exactly match DataFrame column names
  • Be aware that the function uses Pearson correlation, which assumes linear relationships and is sensitive to outliers
  • Consider data distribution and normality before interpreting correlation coefficients
  • The function prints extensive output to console; redirect or suppress output if running in batch mode
  • Review the 'N' column in results to ensure adequate sample sizes for statistical validity
  • Consider applying multiple testing corrections (e.g., Bonferroni) when interpreting p-values from multiple comparisons
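As an illustration of the multiple-testing point, a Bonferroni adjustment multiplies each p-value by the number of tests and caps the result at 1. The sketch below assumes a results frame with the documented 'P_value' column. `add_bonferroni` and the 'P_adjusted' and 'Significant_adjusted' column names are hypothetical, and the p-values are made up.

```python
import pandas as pd


def add_bonferroni(results_df, alpha=0.05):
    """Add Bonferroni-adjusted p-values and a corrected significance flag."""
    out = results_df.copy()
    n_tests = len(out)
    # Bonferroni: multiply each p-value by the number of tests, capped at 1
    out['P_adjusted'] = (out['P_value'] * n_tests).clip(upper=1.0)
    out['Significant_adjusted'] = (
        out['P_adjusted'].lt(alpha).map({True: 'Yes', False: 'No'})
    )
    return out


# Made-up p-values for illustration only
results_df = pd.DataFrame({'P_value': [0.004, 0.03, 0.20, 0.60]})
adjusted = add_bonferroni(results_df)
print(adjusted)
```

Here the second test (p=0.03) passes the unadjusted 0.05 threshold but not the adjusted one, which is the kind of result worth re-examining when many group and variable combinations are tested.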

Similar Components

AI-powered semantic similarity - components with related functionality:

  • function calculate_correlations 83.0% similar

    Calculates both Pearson and Spearman correlation coefficients between Eimeria variables and performance variables, filtering out missing values and identifying statistically significant relationships.

    From: /tf/active/vicechatdev/vice_ai/smartstat_scripts/5a059cb7-3903-4020-8519-14198d1f39c9/analysis_1.py
  • function generate_conclusions 77.1% similar

    Generates and prints comprehensive statistical conclusions from correlation analysis between Eimeria infection variables and broiler performance measures, including overall and group-specific findings.

    From: /tf/active/vicechatdev/vice_ai/smartstat_scripts/5a059cb7-3903-4020-8519-14198d1f39c9/analysis_1.py
  • function main_v54 74.7% similar

    Performs comprehensive exploratory data analysis on a broiler chicken performance dataset, analyzing the correlation between Eimeria infection and performance measures (weight gain, feed conversion ratio, mortality rate) across different treatments and challenge regimens.

    From: /tf/active/vicechatdev/vice_ai/smartstat_scripts/343f5578-64e0-4101-84bd-5824b3c15deb/project_1/analysis.py
  • function main_v24 73.1% similar

    Orchestrates a complete correlation analysis pipeline for Eimeria infection and broiler performance data, from data loading through visualization and results export.

    From: /tf/active/vicechatdev/vice_ai/smartstat_scripts/5a059cb7-3903-4020-8519-14198d1f39c9/analysis_1.py
  • function create_scatter_plots 72.9% similar

    Creates scatter plots with linear regression lines showing relationships between Eimeria variables and performance variables, grouped by categorical variables, and saves them as PNG files.

    From: /tf/active/vicechatdev/vice_ai/smartstat_scripts/5a059cb7-3903-4020-8519-14198d1f39c9/analysis_1.py