function grouped_correlation_analysis
Performs Pearson correlation analysis between Eimeria-related variables and performance variables, grouped by specified categorical variables (e.g., treatment, challenge groups).
/tf/active/vicechatdev/vice_ai/smartstat_scripts/5a059cb7-3903-4020-8519-14198d1f39c9/analysis_1.py
187 - 227
moderate
Purpose
This function conducts grouped correlation analysis to identify relationships between Eimeria infection metrics and performance outcomes within different experimental groups. It computes the Pearson correlation coefficient and two-sided p-value for each combination of Eimeria variable, performance variable, and group, skipping any combination with too few complete observations (n≤3) after rows with missing values are dropped. Results are compiled into a DataFrame for further analysis and reporting, with significance flagged at p<0.05.
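The complete-case filter and minimum-sample check described above can be sketched for a single variable pair. This is only an illustration under assumptions: the measurements are made up, and the coefficient is computed with pandas' `Series.corr` (Pearson by default) rather than scipy's `pearsonr`, so no p-value is produced here.

```python
import pandas as pd

# Hypothetical measurements for one group; None marks a missing value.
group_data = pd.DataFrame({
    "oocyst_count": [100, 150, None, 120, 130, 90],
    "weight_gain": [250, 240, 260, None, 244, 255],
})

# Keep only rows where both variables are present (complete cases).
valid_data = group_data[["oocyst_count", "weight_gain"]].dropna()
n = len(valid_data)

# Same threshold as the function: more than 3 complete observations.
if n > 3:
    r = valid_data["oocyst_count"].corr(valid_data["weight_gain"])
    print(f"n={n}, r={r:.3f}")
else:
    r = None
    print(f"n={n}: skipped, too few complete observations")
```

Two of the six rows are dropped because one value is missing, which leaves exactly four complete pairs, the smallest sample the function will still analyse.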
Source Code
def grouped_correlation_analysis(df, eimeria_vars, performance_vars, grouping_vars):
    """Perform correlation analysis by treatment and challenge groups"""
    print("\n" + "="*80)
    print("GROUPED CORRELATION ANALYSIS")
    print("="*80)

    all_grouped_results = []

    for group_var in grouping_vars:
        print(f"\n\nAnalysis by {group_var}:")
        print("-" * 60)

        for group_value in df[group_var].unique():
            group_data = df[df[group_var] == group_value]
            print(f"\n{group_var} = {group_value} (n={len(group_data)})")

            for eimeria_var in eimeria_vars:
                for perf_var in performance_vars:
                    valid_data = group_data[[eimeria_var, perf_var]].dropna()

                    if len(valid_data) > 3:
                        pearson_r, pearson_p = pearsonr(valid_data[eimeria_var],
                                                        valid_data[perf_var])
                        all_grouped_results.append({
                            'Grouping_Variable': group_var,
                            'Group_Value': group_value,
                            'Eimeria_Variable': eimeria_var,
                            'Performance_Variable': perf_var,
                            'Correlation': pearson_r,
                            'P_value': pearson_p,
                            'N': len(valid_data),
                            'Significant': 'Yes' if pearson_p < 0.05 else 'No'
                        })
                        print(f" {eimeria_var} vs {perf_var}: r={pearson_r:.3f}, p={pearson_p:.4f}")

    grouped_results_df = pd.DataFrame(all_grouped_results)
    return grouped_results_df
Parameters
| Name | Type | Default | Kind |
|---|---|---|---|
| df | - | - | positional_or_keyword |
| eimeria_vars | - | - | positional_or_keyword |
| performance_vars | - | - | positional_or_keyword |
| grouping_vars | - | - | positional_or_keyword |
Parameter Details
df: pandas DataFrame containing the dataset with all variables to be analyzed. Must include columns specified in eimeria_vars, performance_vars, and grouping_vars.
eimeria_vars: List of column names (strings) representing Eimeria-related variables (e.g., oocyst counts, infection intensity measures) to correlate with performance variables.
performance_vars: List of column names (strings) representing performance outcome variables (e.g., weight gain, feed conversion ratio) to correlate with Eimeria variables.
grouping_vars: List of column names (strings) representing categorical variables used to split the data into groups (e.g., 'treatment', 'challenge_group'). Analysis is performed separately for each unique value within these variables.
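Because every name in the three lists must exist as a column in df, a quick pre-check can catch typos before the analysis starts. The following is a minimal sketch: `missing_columns` is a hypothetical helper that is not part of the original script, and the sample frame is made up.

```python
import pandas as pd

def missing_columns(df, *column_lists):
    """Return requested column names that are absent from df (hypothetical helper)."""
    requested = [col for cols in column_lists for col in cols]
    return [col for col in requested if col not in df.columns]

# Made-up frame that lacks the 'feed_efficiency' column.
df = pd.DataFrame({
    "treatment": ["A", "B"],
    "oocyst_count": [100, 50],
    "weight_gain": [250, 280],
})

missing = missing_columns(
    df,
    ["oocyst_count"],                    # eimeria_vars
    ["weight_gain", "feed_efficiency"],  # performance_vars
    ["treatment"],                       # grouping_vars
)
if missing:
    print(f"Columns not found in DataFrame: {missing}")
```

Without such a check, a misspelled name would surface as a `KeyError` from pandas partway through the printed output.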
Return Value
Returns a pandas DataFrame in which each row is one correlation test result. Columns: 'Grouping_Variable' (the grouping variable name), 'Group_Value' (specific group value), 'Eimeria_Variable' (Eimeria metric), 'Performance_Variable' (performance metric), 'Correlation' (Pearson r coefficient), 'P_value' (two-sided p-value of the test), 'N' (sample size after removing rows with NaN values), and 'Significant' ('Yes' if p<0.05, 'No' otherwise). If no variable pair in any group has more than 3 complete observations, the returned DataFrame is empty and has no columns.
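One way to work with the returned table is to guard against the empty case and then rank the significant rows by absolute correlation. This is a sketch only: the rows below are invented to mimic the documented column layout and are not real results.

```python
import pandas as pd

# Invented rows using the column names documented above (not real results).
results_df = pd.DataFrame([
    {"Grouping_Variable": "treatment", "Group_Value": "A",
     "Eimeria_Variable": "oocyst_count", "Performance_Variable": "weight_gain",
     "Correlation": -0.91, "P_value": 0.012, "N": 6, "Significant": "Yes"},
    {"Grouping_Variable": "treatment", "Group_Value": "A",
     "Eimeria_Variable": "oocyst_count", "Performance_Variable": "feed_efficiency",
     "Correlation": 0.95, "P_value": 0.004, "N": 6, "Significant": "Yes"},
    {"Grouping_Variable": "treatment", "Group_Value": "B",
     "Eimeria_Variable": "oocyst_count", "Performance_Variable": "weight_gain",
     "Correlation": -0.30, "P_value": 0.560, "N": 6, "Significant": "No"},
])

if results_df.empty:
    # An empty result has no columns, so column access would raise KeyError.
    strongest = results_df
else:
    significant = results_df[results_df["Significant"] == "Yes"]
    # Rank by |r| so strong negative and strong positive correlations are treated alike.
    strongest = significant.sort_values(
        "Correlation", key=lambda s: s.abs(), ascending=False
    )

print(strongest[["Group_Value", "Performance_Variable", "Correlation", "P_value"]])
```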
Dependencies
- pandas
- scipy
Required Imports
import pandas as pd
from scipy.stats import pearsonr
Usage Example
import pandas as pd
from scipy.stats import pearsonr

# Assumes grouped_correlation_analysis is defined as in the Source Code section.
# Sample data: every group needs more than 3 complete rows,
# otherwise it is skipped and the result is an empty DataFrame.
df = pd.DataFrame({
    'treatment': ['A', 'A', 'A', 'A', 'B', 'B', 'B', 'B'],
    'challenge': ['high', 'high', 'low', 'low', 'high', 'high', 'low', 'low'],
    'oocyst_count': [100, 150, 50, 60, 120, 140, 55, 65],
    'weight_gain': [250, 240, 280, 275, 245, 238, 285, 272],
    'feed_efficiency': [1.5, 1.6, 1.3, 1.4, 1.55, 1.62, 1.35, 1.42]
})

eimeria_vars = ['oocyst_count']
performance_vars = ['weight_gain', 'feed_efficiency']
grouping_vars = ['treatment', 'challenge']

results_df = grouped_correlation_analysis(df, eimeria_vars, performance_vars, grouping_vars)
print(results_df)
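The function prints a header, one line per group, and one line per analysed variable pair, which can be noisy in batch runs. One possible way to capture that output is the standard library's `contextlib.redirect_stdout`. In this sketch, `noisy_analysis` is a hypothetical stand-in for `grouped_correlation_analysis` so that the snippet runs on its own.

```python
import contextlib
import io

def noisy_analysis():
    """Hypothetical stand-in for grouped_correlation_analysis: prints, then returns."""
    print("GROUPED CORRELATION ANALYSIS")
    return {"rows": 0}

# Everything printed inside the with-block goes to the buffer, not the console.
buffer = io.StringIO()
with contextlib.redirect_stdout(buffer):
    result = noisy_analysis()

captured = buffer.getvalue()  # write to a log file if needed, or discard
```

The return value is unaffected by the redirection, so the results DataFrame would still be available for further processing.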
Best Practices
- Ensure the DataFrame contains sufficient non-null data points (>3) for each group to obtain meaningful correlation results
- Check that column names in eimeria_vars, performance_vars, and grouping_vars exactly match DataFrame column names
- Be aware that the function uses Pearson correlation, which assumes linear relationships and is sensitive to outliers
- Consider data distribution and normality before interpreting correlation coefficients
- The function prints extensive output to console; redirect or suppress output if running in batch mode
- Review the 'N' column in results to ensure adequate sample sizes for statistical validity
- Consider applying multiple testing corrections (e.g., Bonferroni) when interpreting p-values from multiple comparisons
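The Bonferroni correction mentioned in the last point can be applied directly to the 'P_value' column. This is a minimal sketch with invented p-values. The column names 'P_bonferroni' and 'Significant_bonferroni' are assumptions introduced here and are not produced by the function.

```python
import pandas as pd

# Invented p-values in the documented 'P_value' column (not real results).
results_df = pd.DataFrame({
    "Performance_Variable": ["weight_gain", "feed_efficiency", "weight_gain", "feed_efficiency"],
    "Group_Value": ["A", "A", "B", "B"],
    "P_value": [0.004, 0.020, 0.030, 0.400],
})

# Bonferroni: multiply each p-value by the number of tests, capped at 1.
n_tests = len(results_df)
results_df["P_bonferroni"] = (results_df["P_value"] * n_tests).clip(upper=1.0)
results_df["Significant_bonferroni"] = results_df["P_bonferroni"] < 0.05

print(results_df)
```

With four tests, only the smallest p-value (0.004, adjusted to 0.016) stays below 0.05. The unadjusted 'Significant' flag would have marked three of the four rows.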
Similar Components
AI-powered semantic similarity - components with related functionality:
- function calculate_correlations 83.0% similar
- function generate_conclusions 77.1% similar
- function main_v54 74.7% similar
- function main_v24 73.1% similar
- function create_scatter_plots 72.9% similar