
function create_scatter_plots

Maturity: 50

Creates scatter plots with linear regression lines showing relationships between Eimeria variables and performance variables, with one subplot per value of the first grouping variable, and saves each figure as a PNG file.

File: /tf/active/vicechatdev/vice_ai/smartstat_scripts/5a059cb7-3903-4020-8519-14198d1f39c9/analysis_1.py
Lines: 254 - 302
Complexity: moderate

Purpose

This function visualizes correlations between Eimeria infection metrics and performance indicators within each experimental group. For every plotted pair of variables it draws one scatter subplot per unique value of the first grouping variable. When a group has at least 4 complete observations, it also fits a least-squares line (np.polyfit, degree 1) and annotates the subplot with the Pearson correlation coefficient and its p-value. Each figure is saved as a 300 dpi PNG file. The function is intended for exploratory data analysis in biological or veterinary research, where relationships between parasitic infection levels and performance metrics are compared across experimental groups.
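
The per-group statistics described above can be reproduced without any plotting. The following is a minimal sketch, not part of analysis_1.py: the helper name fit_group_stats, its min_points parameter, and the demo data are assumptions introduced for illustration. It follows the same dropna, np.polyfit, and pearsonr steps as the source code.

```python
import numpy as np
import pandas as pd
from scipy.stats import pearsonr


def fit_group_stats(group_data, x_col, y_col, min_points=4):
    """Return n, slope, intercept, r, and p for one group, or None.

    Hypothetical helper: mirrors the dropna -> polyfit -> pearsonr steps
    of create_scatter_plots without drawing anything.
    """
    # Keep only rows where both variables are present.
    valid = group_data[[x_col, y_col]].dropna()
    # Same threshold as `len(valid_data) > 3` in the source code.
    if len(valid) < min_points:
        return None
    slope, intercept = np.polyfit(valid[x_col], valid[y_col], 1)
    r, p_val = pearsonr(valid[x_col], valid[y_col])
    return {"n": len(valid), "slope": slope, "intercept": intercept,
            "r": r, "p": p_val}


# Illustrative data only: a perfectly linear relationship plus one missing value.
demo = pd.DataFrame({
    "eimeria_count": [0, 10, 20, 30, 40, np.nan],
    "weight_gain": [500, 480, 460, 440, 420, 400],
})
stats = fit_group_stats(demo, "eimeria_count", "weight_gain")
print(stats)
```

A return value of None corresponds to the case where the function draws the scatter points but no regression line or annotation.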

Source Code

def create_scatter_plots(df, eimeria_vars, performance_vars, grouping_vars):
    """Create scatter plots with regression lines"""
    
    for eimeria_var in eimeria_vars[:2]:  # Limit to first 2 to avoid too many plots
        for perf_var in performance_vars[:3]:  # Limit to first 3
            
            if grouping_vars:
                group_var = grouping_vars[0]
                
                fig, axes = plt.subplots(1, len(df[group_var].unique()), 
                                        figsize=(15, 5))
                
                if len(df[group_var].unique()) == 1:
                    axes = [axes]
                
                for idx, group_value in enumerate(df[group_var].unique()):
                    group_data = df[df[group_var] == group_value]
                    
                    axes[idx].scatter(group_data[eimeria_var], 
                                     group_data[perf_var], alpha=0.6)
                    
                    # Add regression line
                    valid_data = group_data[[eimeria_var, perf_var]].dropna()
                    if len(valid_data) > 3:
                        z = np.polyfit(valid_data[eimeria_var], 
                                      valid_data[perf_var], 1)
                        p = np.poly1d(z)
                        axes[idx].plot(valid_data[eimeria_var], 
                                      p(valid_data[eimeria_var]), 
                                      "r--", alpha=0.8, linewidth=2)
                        
                        r, p_val = pearsonr(valid_data[eimeria_var], 
                                          valid_data[perf_var])
                        axes[idx].text(0.05, 0.95, f'r={r:.3f}\np={p_val:.4f}',
                                      transform=axes[idx].transAxes,
                                      verticalalignment='top',
                                      bbox=dict(boxstyle='round', facecolor='wheat', alpha=0.5))
                    
                    axes[idx].set_title(f'{group_value}')
                    axes[idx].set_xlabel(eimeria_var)
                    axes[idx].set_ylabel(perf_var)
                
                plt.suptitle(f'{eimeria_var} vs {perf_var} by {group_var}', 
                           fontsize=14, fontweight='bold')
                plt.tight_layout()
                filename = f'scatter_{eimeria_var}_vs_{perf_var}.png'
                plt.savefig(filename, dpi=300, bbox_inches='tight')
                print(f"Saved: {filename}")
                plt.close()

Parameters

Name              Type  Default  Kind
df                -     -        positional_or_keyword
eimeria_vars      -     -        positional_or_keyword
performance_vars  -     -        positional_or_keyword
grouping_vars     -     -        positional_or_keyword

Parameter Details

df: A pandas DataFrame containing the data to be plotted. Must include columns matching the names in eimeria_vars, performance_vars, and grouping_vars parameters. Should contain numeric data for the variable columns.

eimeria_vars: A list or array-like of column names (strings) representing Eimeria infection variables. Only the first 2 variables will be plotted to limit output volume. These should be numeric columns in the DataFrame.

performance_vars: A list or array-like of column names (strings) representing performance metrics. Only the first 3 variables will be plotted to limit output volume. These should be numeric columns in the DataFrame.

grouping_vars: A list of column names (strings) for categorical grouping variables. Only the first grouping variable (grouping_vars[0]) is used; each of its unique values gets its own subplot. If the list is empty, the `if grouping_vars:` guard skips all plotting, so the function returns without creating any files and without raising an error. Pass at least one grouping column to get output. A plain list is safest here, because the function tests the argument's truthiness.
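
Because the function itself performs no input checks, the requirements above can be verified before calling it. The sketch below is an assumption-laden illustration: validate_plot_inputs and the demo column names are hypothetical and do not exist in analysis_1.py. It checks only the columns the function actually reads (the first 2 Eimeria variables, the first 3 performance variables, and the first grouping variable).

```python
import pandas as pd
from pandas.api.types import is_numeric_dtype


def validate_plot_inputs(df, eimeria_vars, performance_vars, grouping_vars):
    """Return a list of problems; an empty list means the inputs look usable.

    Hypothetical pre-flight check, not part of the original script.
    """
    problems = []
    if not grouping_vars:
        problems.append("grouping_vars is empty: no plots would be produced")
    elif grouping_vars[0] not in df.columns:
        problems.append(f"grouping column not found: {grouping_vars[0]}")
    # Only the slices the function iterates over are checked.
    for col in list(eimeria_vars[:2]) + list(performance_vars[:3]):
        if col not in df.columns:
            problems.append(f"column not found: {col}")
        elif not is_numeric_dtype(df[col]):
            problems.append(f"column is not numeric: {col}")
    return problems


# Illustrative data only: 'gain' is non-numeric and 'fcr' is missing.
demo = pd.DataFrame({"opg": [1.0, 2.0], "gain": ["a", "b"], "grp": ["A", "B"]})
issues = validate_plot_inputs(demo, ["opg"], ["gain", "fcr"], [])
for issue in issues:
    print(issue)
```

Returning a list of messages instead of raising on the first problem is a design choice for this sketch; it lets a caller report every issue at once.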

Return Value

This function returns None. Its output consists of side effects: for each plotted variable pair it saves a PNG file to the current working directory, named 'scatter_{eimeria_var}_vs_{perf_var}.png', and prints a confirmation message to stdout. If grouping_vars is empty, no files are saved.
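
Since nothing is returned, a caller that needs the output paths has to reconstruct them. The following sketch assumes the filename pattern and the [:2] / [:3] limits shown in the source code; the helper name expected_plot_files and the example variable names are hypothetical.

```python
from itertools import product


def expected_plot_files(eimeria_vars, performance_vars, grouping_vars):
    """List the filenames create_scatter_plots would write, in loop order.

    Hypothetical helper: it only rebuilds names and does not touch the disk.
    """
    if not grouping_vars:
        # The function skips plotting entirely when no grouping variable is given.
        return []
    return [f"scatter_{e}_vs_{p}.png"
            for e, p in product(eimeria_vars[:2], performance_vars[:3])]


files = expected_plot_files(
    ["eimeria_count", "eimeria_intensity", "ignored_var"],
    ["weight_gain", "feed_conversion", "mortality_rate", "ignored_perf"],
    ["treatment_group"],
)
print(len(files))
print(files[0])
```

The filename does not include the grouping variable, so running the function again with a different grouping column overwrites the earlier files.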

Dependencies

  • pandas
  • numpy
  • matplotlib
  • scipy

Required Imports

import pandas as pd
import numpy as np
import matplotlib.pyplot as plt
from scipy.stats import pearsonr

Usage Example

import pandas as pd
import numpy as np
import matplotlib.pyplot as plt
from scipy.stats import pearsonr

# Create sample data
df = pd.DataFrame({
    'eimeria_count': np.random.randint(0, 1000, 100),
    'eimeria_intensity': np.random.uniform(0, 10, 100),
    'weight_gain': np.random.normal(500, 50, 100),
    'feed_conversion': np.random.normal(1.8, 0.2, 100),
    'mortality_rate': np.random.uniform(0, 5, 100),
    'treatment_group': np.random.choice(['Control', 'Treatment_A', 'Treatment_B'], 100)
})

# Define variable lists
eimeria_vars = ['eimeria_count', 'eimeria_intensity']
performance_vars = ['weight_gain', 'feed_conversion', 'mortality_rate']
grouping_vars = ['treatment_group']

# Create scatter plots
create_scatter_plots(df, eimeria_vars, performance_vars, grouping_vars)

# Output: Creates 6 PNG files (2 eimeria vars × 3 performance vars)
# Files: scatter_eimeria_count_vs_weight_gain.png, etc.

Best Practices

  • Ensure the DataFrame contains sufficient non-null data points (at least 4) for each group to enable regression line fitting
  • Verify that eimeria_vars and performance_vars contain numeric data types to avoid plotting errors
  • Check that grouping_vars is not empty: the function guards the access to grouping_vars[0] with `if grouping_vars:`, so an empty list raises no error but silently produces no plots
  • Be aware that the function limits plots to first 2 Eimeria variables and first 3 performance variables to prevent excessive output
  • Ensure write permissions exist in the working directory before calling the function
  • Consider the number of unique group values: it determines the subplot count, while the figure size is fixed at 15 x 5 inches, so many groups produce narrow subplots
  • For headless environments, set matplotlib backend to 'Agg' before importing pyplot: matplotlib.use('Agg')
  • The function closes plots after saving, so plt.show() will not display them; remove plt.close() if interactive display is needed
  • Regression lines are only added when valid_data has more than 3 points; groups with insufficient data will show scatter points only
  • File names are generated from variable names, so ensure variable names are filesystem-safe (no special characters)
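
One way to make variable names filesystem-safe is to normalize the column names before calling the function. This is a minimal sketch under stated assumptions: safe_name is a hypothetical helper, and the allowed character set (letters, digits, dot, underscore, hyphen) is a conservative choice rather than a requirement of the original script.

```python
import re


def safe_name(name):
    """Replace runs of characters outside [A-Za-z0-9._-] with one underscore.

    Hypothetical helper, not part of the original script.
    """
    cleaned = re.sub(r"[^A-Za-z0-9._-]+", "_", str(name))
    # Trim leading/trailing underscores; fall back to a placeholder if empty.
    return cleaned.strip("_") or "unnamed"


print(safe_name("OPG (day 7)/g"))  # prints: OPG_day_7_g
```

With pandas, the helper can be applied to all columns via df.rename(columns=safe_name) before building the variable lists. Note that the renamed columns then also appear as axis labels and in the figure title.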

Similar Components

AI-powered semantic similarity - components with related functionality:

  • function create_correlation_heatmap 74.7% similar

    Generates and saves a correlation heatmap visualizing the relationships between Eimeria infection indicators and performance measures from a pandas DataFrame.

    From: /tf/active/vicechatdev/vice_ai/smartstat_scripts/5a059cb7-3903-4020-8519-14198d1f39c9/analysis_1.py
  • function grouped_correlation_analysis 72.9% similar

    Performs Pearson correlation analysis between Eimeria-related variables and performance variables, grouped by specified categorical variables (e.g., treatment, challenge groups).

    From: /tf/active/vicechatdev/vice_ai/smartstat_scripts/5a059cb7-3903-4020-8519-14198d1f39c9/analysis_1.py
  • function calculate_correlations 71.9% similar

    Calculates both Pearson and Spearman correlation coefficients between Eimeria variables and performance variables, filtering out missing values and identifying statistically significant relationships.

    From: /tf/active/vicechatdev/vice_ai/smartstat_scripts/5a059cb7-3903-4020-8519-14198d1f39c9/analysis_1.py
  • function main_v26 71.0% similar

    Orchestrates a complete correlation analysis pipeline for Eimeria infection and broiler performance data, from data loading through visualization and results export.

    From: /tf/active/vicechatdev/vice_ai/smartstat_scripts/5a059cb7-3903-4020-8519-14198d1f39c9/analysis_1.py
  • function main_v56 71.0% similar

    Performs comprehensive exploratory data analysis on a broiler chicken performance dataset, analyzing the correlation between Eimeria infection and performance measures (weight gain, feed conversion ratio, mortality rate) across different treatments and challenge regimens.

    From: /tf/active/vicechatdev/vice_ai/smartstat_scripts/343f5578-64e0-4101-84bd-5824b3c15deb/project_1/analysis.py