grouped_correlation_analysis

function grouped_correlation_analysis

Maturity: 45

Performs Pearson correlation analysis between Eimeria-related variables and performance variables, grouped by specified categorical variables (e.g., treatment, challenge groups).

File:
/tf/active/vicechatdev/vice_ai/smartstat_scripts/5a059cb7-3903-4020-8519-14198d1f39c9/analysis_1.py

Lines:
187 - 227

Complexity:
moderate

Purpose

This function conducts grouped correlation analysis to identify relationships between Eimeria infection metrics and performance outcomes within different experimental groups. For each combination of grouping variable, group value, Eimeria variable, and performance variable, it drops rows with a missing value in either variable and calculates the Pearson correlation coefficient and p-value. Combinations with three or fewer complete observations (n≤3) are skipped silently. Results are printed to the console and compiled into a DataFrame for further analysis and reporting, with significance flagged at p<0.05; no multiple-testing correction is applied.

Source Code

def grouped_correlation_analysis(df, eimeria_vars, performance_vars, grouping_vars):
    """Perform correlation analysis by treatment and challenge groups"""
    
    print("\n" + "="*80)
    print("GROUPED CORRELATION ANALYSIS")
    print("="*80)
    
    all_grouped_results = []
    
    for group_var in grouping_vars:
        print(f"\n\nAnalysis by {group_var}:")
        print("-" * 60)
        
        for group_value in df[group_var].unique():
            group_data = df[df[group_var] == group_value]
            
            print(f"\n{group_var} = {group_value} (n={len(group_data)})")
            
            for eimeria_var in eimeria_vars:
                for perf_var in performance_vars:
                    valid_data = group_data[[eimeria_var, perf_var]].dropna()
                    
                    if len(valid_data) > 3:
                        pearson_r, pearson_p = pearsonr(valid_data[eimeria_var], 
                                                       valid_data[perf_var])
                        
                        all_grouped_results.append({
                            'Grouping_Variable': group_var,
                            'Group_Value': group_value,
                            'Eimeria_Variable': eimeria_var,
                            'Performance_Variable': perf_var,
                            'Correlation': pearson_r,
                            'P_value': pearson_p,
                            'N': len(valid_data),
                            'Significant': 'Yes' if pearson_p < 0.05 else 'No'
                        })
                        
                        print(f"  {eimeria_var} vs {perf_var}: r={pearson_r:.3f}, p={pearson_p:.4f}")
    
    grouped_results_df = pd.DataFrame(all_grouped_results)
    return grouped_results_df

Parameters

Name	Type	Default	Kind
`df`	-	-	positional_or_keyword
`eimeria_vars`	-	-	positional_or_keyword
`performance_vars`	-	-	positional_or_keyword
`grouping_vars`	-	-	positional_or_keyword

Parameter Details

df: pandas DataFrame containing the dataset with all variables to be analyzed. Must include columns specified in eimeria_vars, performance_vars, and grouping_vars.

eimeria_vars: List of column names (strings) representing Eimeria-related variables (e.g., oocyst counts, infection intensity measures) to correlate with performance variables.

performance_vars: List of column names (strings) representing performance outcome variables (e.g., weight gain, feed conversion ratio) to correlate with Eimeria variables.

grouping_vars: List of column names (strings) representing categorical variables used to split the data into groups (e.g., 'treatment', 'challenge_group'). Analysis is performed separately for each unique value within these variables.
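
Because the function indexes `df` directly with these names, a misspelled column raises a `KeyError` partway through the run. One way to catch this up front is sketched below. The helper name `find_missing_columns` is hypothetical and not part of the original script, and the tiny DataFrame exists only to make the snippet runnable.

```python
import pandas as pd

def find_missing_columns(df, *column_lists):
    """Return requested column names that are absent from df (hypothetical helper)."""
    requested = [name for names in column_lists for name in names]
    return [name for name in requested if name not in df.columns]

# Toy data for illustration; 'feed_efficiency' is deliberately left out.
df = pd.DataFrame({
    'treatment': ['A', 'B'],
    'oocyst_count': [100, 50],
    'weight_gain': [250, 280],
})

missing = find_missing_columns(
    df,
    ['oocyst_count'],                    # eimeria_vars
    ['weight_gain', 'feed_efficiency'],  # performance_vars
    ['treatment'],                       # grouping_vars
)
if missing:
    print(f"Missing columns: {missing}")
```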

Return Value

Returns a pandas DataFrame with columns: 'Grouping_Variable' (the grouping variable name), 'Group_Value' (specific group value), 'Eimeria_Variable' (Eimeria metric), 'Performance_Variable' (performance metric), 'Correlation' (Pearson r coefficient), 'P_value' (two-sided p-value for the correlation), 'N' (number of complete pairs after removing NaN values), and 'Significant' ('Yes' if p<0.05, 'No' otherwise). Each row represents one correlation test result. If no combination has more than three complete observations, the returned DataFrame is empty and has none of these columns.
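
The returned DataFrame can be filtered and sorted like any other. The sketch below is one possible way to pull out the strongest significant correlations while guarding against the empty case. The rows are hand-written placeholders that mimic the documented columns, not real results, and `strongest_significant` is a hypothetical helper name.

```python
import pandas as pd

def strongest_significant(results_df):
    """Return significant rows ordered by absolute correlation (hypothetical helper)."""
    if results_df.empty:
        # Nothing passed the n > 3 filter, so the result columns do not exist.
        return results_df
    significant = results_df[results_df['Significant'] == 'Yes']
    # Sort on |r| so strong negative relationships rank alongside positive ones.
    return significant.sort_values('Correlation', key=abs, ascending=False)

# Placeholder rows for illustration only; values are invented.
results_df = pd.DataFrame([
    {'Grouping_Variable': 'treatment', 'Group_Value': 'A', 'Correlation': -0.91,
     'P_value': 0.012, 'N': 6, 'Significant': 'Yes'},
    {'Grouping_Variable': 'treatment', 'Group_Value': 'B', 'Correlation': 0.35,
     'P_value': 0.490, 'N': 6, 'Significant': 'No'},
    {'Grouping_Variable': 'challenge', 'Group_Value': 'high', 'Correlation': -0.97,
     'P_value': 0.030, 'N': 4, 'Significant': 'Yes'},
])

top = strongest_significant(results_df)
print(top[['Grouping_Variable', 'Group_Value', 'Correlation', 'P_value', 'N']])
```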

Dependencies

pandas
scipy

Required Imports

import pandas as pd
from scipy.stats import pearsonr

Usage Example

import pandas as pd
from scipy.stats import pearsonr  # used inside grouped_correlation_analysis

# Sample data: every group needs more than 3 complete rows,
# otherwise its correlations are skipped and the result is empty
df = pd.DataFrame({
    'treatment': ['A', 'A', 'A', 'A', 'B', 'B', 'B', 'B'],
    'challenge': ['high', 'high', 'low', 'low', 'high', 'high', 'low', 'low'],
    'oocyst_count': [100, 150, 120, 130, 50, 60, 55, 65],
    'weight_gain': [250, 240, 245, 243, 280, 275, 285, 272],
    'feed_efficiency': [1.5, 1.6, 1.55, 1.58, 1.3, 1.4, 1.35, 1.42]
})

eimeria_vars = ['oocyst_count']
performance_vars = ['weight_gain', 'feed_efficiency']
grouping_vars = ['treatment', 'challenge']

results_df = grouped_correlation_analysis(df, eimeria_vars, performance_vars, grouping_vars)
print(results_df)
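
The function prints a line for every correlation it computes. In a script or batch job, that output can be captured with the standard library's `contextlib.redirect_stdout`. The following is a minimal sketch: `run_quietly` and `noisy_analysis` are hypothetical names, and `noisy_analysis` only stands in for `grouped_correlation_analysis` so the snippet runs on its own.

```python
import io
from contextlib import redirect_stdout

def run_quietly(func, *args, **kwargs):
    """Call func while capturing its printed output; return (result, captured_text)."""
    buffer = io.StringIO()
    with redirect_stdout(buffer):
        result = func(*args, **kwargs)
    return result, buffer.getvalue()

def noisy_analysis(values):
    # Stand-in for grouped_correlation_analysis: prints a banner, returns a result.
    print("GROUPED CORRELATION ANALYSIS")
    return sum(values)

result, log_text = run_quietly(noisy_analysis, [1, 2, 3])
```

With the real function, the call would presumably look like `results_df, log_text = run_quietly(grouped_correlation_analysis, df, eimeria_vars, performance_vars, grouping_vars)`, leaving `log_text` available to write to a log file if needed.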

Best Practices

Ensure each group has more than 3 rows with non-null values for both variables in a pair; combinations with fewer are skipped without a warning and do not appear in the results
Check that column names in eimeria_vars, performance_vars, and grouping_vars exactly match DataFrame column names
Be aware that the function uses Pearson correlation, which assumes linear relationships and is sensitive to outliers
Consider data distribution and normality before interpreting correlation coefficients
The function prints extensive output to console; redirect or suppress output if running in batch mode
Review the 'N' column in results to ensure adequate sample sizes for statistical validity
Consider applying multiple testing corrections (e.g., Bonferroni) when interpreting p-values from multiple comparisons
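
As a sketch of the multiple-testing point, a Bonferroni correction multiplies each p-value by the number of tests and caps the result at 1. The rows below are invented placeholders rather than real output, and the column names `P_value_Bonferroni` and `Significant_Bonferroni` are hypothetical additions. Treating every row as one family of tests is also an assumption; a narrower family (for example, per grouping variable) may suit some analyses better.

```python
import pandas as pd

# Placeholder p-values for illustration only.
results_df = pd.DataFrame({
    'Group_Value': ['A', 'A', 'B', 'B'],
    'Performance_Variable': ['weight_gain', 'feed_efficiency',
                             'weight_gain', 'feed_efficiency'],
    'P_value': [0.004, 0.020, 0.300, 0.049],
})

# Bonferroni: adjusted p = min(p * number_of_tests, 1).
n_tests = len(results_df)
results_df['P_value_Bonferroni'] = (results_df['P_value'] * n_tests).clip(upper=1.0)

# Re-flag significance on the adjusted values, using the same 'Yes'/'No' convention.
results_df['Significant_Bonferroni'] = (
    results_df['P_value_Bonferroni'].lt(0.05).map({True: 'Yes', False: 'No'})
)

print(results_df)
```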

Similar Components

AI-powered semantic similarity - components with related functionality:

function calculate_correlations 83.0% similar

Calculates both Pearson and Spearman correlation coefficients between Eimeria variables and performance variables, filtering out missing values and identifying statistically significant relationships.
From: /tf/active/vicechatdev/vice_ai/smartstat_scripts/5a059cb7-3903-4020-8519-14198d1f39c9/analysis_1.py

function generate_conclusions 77.1% similar

Generates and prints comprehensive statistical conclusions from correlation analysis between Eimeria infection variables and broiler performance measures, including overall and group-specific findings.
From: /tf/active/vicechatdev/vice_ai/smartstat_scripts/5a059cb7-3903-4020-8519-14198d1f39c9/analysis_1.py

function main_v54 74.7% similar

Performs comprehensive exploratory data analysis on a broiler chicken performance dataset, analyzing the correlation between Eimeria infection and performance measures (weight gain, feed conversion ratio, mortality rate) across different treatments and challenge regimens.
From: /tf/active/vicechatdev/vice_ai/smartstat_scripts/343f5578-64e0-4101-84bd-5824b3c15deb/project_1/analysis.py

function main_v24 73.1% similar

Orchestrates a complete correlation analysis pipeline for Eimeria infection and broiler performance data, from data loading through visualization and results export.
From: /tf/active/vicechatdev/vice_ai/smartstat_scripts/5a059cb7-3903-4020-8519-14198d1f39c9/analysis_1.py

function create_scatter_plots 72.9% similar

Creates scatter plots with linear regression lines showing relationships between Eimeria variables and performance variables, grouped by categorical variables, and saves them as PNG files.
From: /tf/active/vicechatdev/vice_ai/smartstat_scripts/5a059cb7-3903-4020-8519-14198d1f39c9/analysis_1.py
