function create_scatter_plots
Creates scatter plots with linear regression lines showing relationships between Eimeria variables and performance variables, grouped by categorical variables, and saves them as PNG files.
/tf/active/vicechatdev/vice_ai/smartstat_scripts/5a059cb7-3903-4020-8519-14198d1f39c9/analysis_1.py
Lines 254 - 302
moderate
Purpose
This function generates 300 dpi scatter plots to visualize correlations between Eimeria infection metrics and performance indicators, with one subplot per group. For each group with at least 4 complete observations, it fits a linear regression line and annotates the subplot with the Pearson correlation coefficient and its p-value. Each Eimeria/performance variable pair is saved as a separate PNG file. The function is designed for exploratory data analysis in biological or veterinary research contexts where relationships between parasitic infection levels and performance metrics need to be visualized across experimental groups.
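The per-group statistics described above can be sketched in isolation. This is a minimal illustration on synthetic data, not the function itself: the helper name `fit_line_and_correlation` and the sample values are assumptions made for this example.

```python
import numpy as np
from scipy.stats import pearsonr


def fit_line_and_correlation(x, y):
    """Return (slope, intercept, r, p_value) for paired samples x and y.

    Hypothetical helper: mirrors the np.polyfit / pearsonr calls made for
    each group, without any plotting.
    """
    x = np.asarray(x, dtype=float)
    y = np.asarray(y, dtype=float)
    # Drop pairs where either value is missing, as dropna() does in the function.
    mask = ~(np.isnan(x) | np.isnan(y))
    x, y = x[mask], y[mask]
    slope, intercept = np.polyfit(x, y, 1)  # degree-1 least-squares fit
    r, p_value = pearsonr(x, y)
    return slope, intercept, r, p_value


# Synthetic example: y falls roughly linearly as x rises.
rng = np.random.default_rng(0)
x = rng.uniform(0, 10, 50)
y = 500 - 8 * x + rng.normal(0, 5, 50)
slope, intercept, r, p_value = fit_line_and_correlation(x, y)
print(f"slope={slope:.2f}, intercept={intercept:.1f}, r={r:.3f}, p={p_value:.4g}")
```

A negative slope with r close to -1 is what the annotation box would report for a group like this one.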
Source Code
def create_scatter_plots(df, eimeria_vars, performance_vars, grouping_vars):
    """Create scatter plots with regression lines"""
    for eimeria_var in eimeria_vars[:2]:  # Limit to first 2 to avoid too many plots
        for perf_var in performance_vars[:3]:  # Limit to first 3
            if grouping_vars:
                group_var = grouping_vars[0]
                fig, axes = plt.subplots(1, len(df[group_var].unique()),
                                         figsize=(15, 5))
                if len(df[group_var].unique()) == 1:
                    axes = [axes]
                for idx, group_value in enumerate(df[group_var].unique()):
                    group_data = df[df[group_var] == group_value]
                    axes[idx].scatter(group_data[eimeria_var],
                                      group_data[perf_var], alpha=0.6)
                    # Add regression line
                    valid_data = group_data[[eimeria_var, perf_var]].dropna()
                    if len(valid_data) > 3:
                        z = np.polyfit(valid_data[eimeria_var],
                                       valid_data[perf_var], 1)
                        p = np.poly1d(z)
                        axes[idx].plot(valid_data[eimeria_var],
                                       p(valid_data[eimeria_var]),
                                       "r--", alpha=0.8, linewidth=2)
                        r, p_val = pearsonr(valid_data[eimeria_var],
                                            valid_data[perf_var])
                        axes[idx].text(0.05, 0.95, f'r={r:.3f}\np={p_val:.4f}',
                                       transform=axes[idx].transAxes,
                                       verticalalignment='top',
                                       bbox=dict(boxstyle='round', facecolor='wheat', alpha=0.5))
                    axes[idx].set_title(f'{group_value}')
                    axes[idx].set_xlabel(eimeria_var)
                    axes[idx].set_ylabel(perf_var)
                plt.suptitle(f'{eimeria_var} vs {perf_var} by {group_var}',
                             fontsize=14, fontweight='bold')
                plt.tight_layout()
                filename = f'scatter_{eimeria_var}_vs_{perf_var}.png'
                plt.savefig(filename, dpi=300, bbox_inches='tight')
                print(f"Saved: {filename}")
                plt.close()
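The source handles the single-group case by wrapping the lone Axes object in a list. One possible alternative, shown here only as a sketch and not as the author's method, is to pass `squeeze=False` to `plt.subplots`, which always returns a 2-D array of Axes. The figure width scaled by group count (`5 * n_groups`) is likewise an assumption for illustration; the original uses a fixed 15 × 5 inch figure.

```python
import matplotlib

matplotlib.use("Agg")  # non-interactive backend so the sketch runs headless
import matplotlib.pyplot as plt

n_groups = 1  # works the same for any positive number of groups

# squeeze=False guarantees axes has shape (1, n_groups), even for one group.
fig, axes = plt.subplots(1, n_groups, figsize=(5 * n_groups, 5), squeeze=False)
row = axes[0]  # 1-D array of Axes, safe to index with row[idx]

for idx in range(n_groups):
    row[idx].scatter([1, 2, 3], [3, 2, 1], alpha=0.6)
    row[idx].set_title(f"group {idx}")

print(axes.shape)
plt.close(fig)
```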
Parameters
| Name | Type | Default | Kind |
|---|---|---|---|
| df | - | - | positional_or_keyword |
| eimeria_vars | - | - | positional_or_keyword |
| performance_vars | - | - | positional_or_keyword |
| grouping_vars | - | - | positional_or_keyword |
Parameter Details
df: A pandas DataFrame containing the data to be plotted. Must include columns matching the names in eimeria_vars, performance_vars, and grouping_vars parameters. Should contain numeric data for the variable columns.
eimeria_vars: A list or array-like of column names (strings) representing Eimeria infection variables. Only the first 2 variables will be plotted to limit output volume. These should be numeric columns in the DataFrame.
performance_vars: A list or array-like of column names (strings) representing performance metrics. Only the first 3 variables will be plotted to limit output volume. These should be numeric columns in the DataFrame.
grouping_vars: A list of column names (strings) for categorical grouping variables. Only the first entry (grouping_vars[0]) is used; it produces one subplot per unique value in that column. At least one grouping variable is required in practice: the plotting code runs only when grouping_vars is non-empty, so passing an empty list raises no error but yields no plots and no files.
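Because the plotting code in the source is guarded by `if grouping_vars:`, ungrouped data needs a stand-in grouping column before it can be plotted. The sketch below shows one way to supply it. The helper `with_default_group`, the column name `all_data`, and the label `"All"` are hypothetical and not part of the original script.

```python
import pandas as pd


def with_default_group(df, grouping_vars, placeholder="all_data"):
    """Return (df, grouping_vars) that always name at least one grouping column.

    Hypothetical helper: if grouping_vars is empty, a constant column is
    added so that every row falls into a single group.
    """
    if grouping_vars:
        return df, list(grouping_vars)
    df = df.copy()  # leave the caller's DataFrame untouched
    df[placeholder] = "All"
    return df, [placeholder]


sample = pd.DataFrame({"eimeria_count": [10, 20, 30], "weight_gain": [510, 495, 480]})
grouped_df, grouping = with_default_group(sample, [])
print(grouping)  # ['all_data']
print(grouped_df["all_data"].nunique())  # 1
```

The returned pair can then be passed on as the `df` and `grouping_vars` arguments, which gives a single-panel figure per variable pair.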
Return Value
This function does not return any value (returns None implicitly). Its primary output is side effects: it creates and saves PNG image files to the current working directory with filenames in the format 'scatter_{eimeria_var}_vs_{perf_var}.png', and prints confirmation messages to stdout for each saved file.
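Since nothing is returned, a caller who wants to check the output has to reconstruct the filenames. The following sketch reproduces the `[:2]` and `[:3]` slices and the filename pattern from the source code. The helper name `expected_filenames` and the sample variable names are assumptions for illustration, and the files only exist if grouping_vars was non-empty when the function ran.

```python
from pathlib import Path


def expected_filenames(eimeria_vars, performance_vars):
    """List the PNG names the function would write for the given inputs.

    Hypothetical helper: only the first 2 Eimeria variables and the
    first 3 performance variables contribute, as in the source.
    """
    return [
        f"scatter_{eimeria_var}_vs_{perf_var}.png"
        for eimeria_var in eimeria_vars[:2]
        for perf_var in performance_vars[:3]
    ]


names = expected_filenames(
    ["eimeria_count", "eimeria_intensity", "ignored_third"],
    ["weight_gain", "feed_conversion"],
)
print(names)

# After calling create_scatter_plots, report which files are missing
# from the current working directory.
missing = [name for name in names if not Path(name).exists()]
print(f"{len(missing)} of {len(names)} expected files not found")
```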
Dependencies
- pandas
- numpy
- matplotlib
- scipy
Required Imports
import pandas as pd
import numpy as np
import matplotlib.pyplot as plt
from scipy.stats import pearsonr
Usage Example
import pandas as pd
import numpy as np
import matplotlib.pyplot as plt
from scipy.stats import pearsonr
# Create sample data
df = pd.DataFrame({
    'eimeria_count': np.random.randint(0, 1000, 100),
    'eimeria_intensity': np.random.uniform(0, 10, 100),
    'weight_gain': np.random.normal(500, 50, 100),
    'feed_conversion': np.random.normal(1.8, 0.2, 100),
    'mortality_rate': np.random.uniform(0, 5, 100),
    'treatment_group': np.random.choice(['Control', 'Treatment_A', 'Treatment_B'], 100)
})
# Define variable lists
eimeria_vars = ['eimeria_count', 'eimeria_intensity']
performance_vars = ['weight_gain', 'feed_conversion', 'mortality_rate']
grouping_vars = ['treatment_group']
# Create scatter plots
create_scatter_plots(df, eimeria_vars, performance_vars, grouping_vars)
# Output: Creates 6 PNG files (2 eimeria vars × 3 performance vars)
# Files: scatter_eimeria_count_vs_weight_gain.png, etc.
Best Practices
- Ensure the DataFrame contains sufficient non-null data points (at least 4) for each group to enable regression line fitting
- Verify that eimeria_vars and performance_vars contain numeric data types to avoid plotting errors
- Pass at least one grouping variable: the function checks `if grouping_vars:` before reading grouping_vars[0], so an empty list raises no error but also produces no plots
- Be aware that the function limits plots to first 2 Eimeria variables and first 3 performance variables to prevent excessive output
- Ensure write permissions exist in the working directory before calling the function
- Keep the number of unique group values small: each value gets its own subplot, and the figure size is fixed at 15 × 5 inches, so many groups produce cramped panels
- For headless environments, set matplotlib backend to 'Agg' before importing pyplot: matplotlib.use('Agg')
- The function closes plots after saving, so plt.show() will not display them; remove plt.close() if interactive display is needed
- Regression lines are only added when valid_data has more than 3 points; groups with insufficient data will show scatter points only
- File names are generated from variable names, so ensure variable names are filesystem-safe (no special characters)
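Two of the checks above, the minimum group size for regression and filesystem-safe variable names, can be run before calling the function. The sketch below is one way to do that. Both helper names are hypothetical, and the allowed-character rule is a deliberately conservative assumption rather than a requirement of any particular filesystem.

```python
import re

import pandas as pd


def groups_without_regression(df, x_col, y_col, group_col, min_points=4):
    """Return group values with fewer than min_points complete (x, y) pairs.

    Hypothetical helper: the function draws a regression line only when a
    group has more than 3 rows left after dropna(), i.e. at least 4.
    """
    complete = df[[x_col, y_col, group_col]].dropna(subset=[x_col, y_col])
    counts = complete.groupby(group_col).size()
    # Groups whose rows were all dropped would vanish from counts, so
    # reindex against every group value present in the original data.
    counts = counts.reindex(df[group_col].dropna().unique(), fill_value=0)
    return sorted(counts[counts < min_points].index)


def is_filesystem_safe(name):
    """Conservative check: letters, digits, underscore, hyphen and dot only."""
    return re.fullmatch(r"[A-Za-z0-9_.-]+", name) is not None


data = pd.DataFrame({
    "eimeria_count": [10, 20, 30, 40, 50, 60, None],
    "weight_gain": [510, 495, 480, 470, 455, 450, 440],
    "treatment_group": ["Control"] * 4 + ["Treatment_A"] * 3,
})
flagged = groups_without_regression(
    data, "eimeria_count", "weight_gain", "treatment_group"
)
print(flagged)  # ['Treatment_A']
print(is_filesystem_safe("eimeria_count"), is_filesystem_safe("oocysts/g"))  # True False
```

Groups returned by the first helper would still be plotted, but as scatter points only, without a regression line or correlation annotation.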
Similar Components
AI-powered semantic similarity - components with related functionality:
- function create_correlation_heatmap (74.7% similar)
- function grouped_correlation_analysis (72.9% similar)
- function calculate_correlations (71.9% similar)
- function main_v26 (71.0% similar)
- function main_v56 (71.0% similar)