🔍 Code Extractor

function correlation_significance

Maturity: 44

Calculates Pearson correlation coefficient and statistical significance (p-value) between two numeric arrays, handling NaN values automatically.

File: /tf/active/vicechatdev/vice_ai/smartstat_scripts/d1e252f5-950c-4ad7-b425-86b4b02c3c62/analysis_7.py
Lines: 348-359
Complexity: simple

Purpose

This function computes the Pearson correlation coefficient, which measures the strength and direction of the linear relationship between two variables. It also returns a p-value for testing whether that correlation is statistically significant. NaN values are removed pairwise: an observation is dropped if either its x or its y value is NaN. At least 3 valid pairs are required. If fewer remain, the function returns (None, None, 0).
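The pairwise NaN filtering can be illustrated on its own. This minimal sketch reproduces only the masking step from the source code. The array values are made up for illustration.

```python
import numpy as np

# Illustrative data: index 2 has NaN in x, index 4 has NaN in y
x = np.array([1.0, 2.0, np.nan, 4.0, 5.0, 6.0])
y = np.array([1.5, 2.5, 3.5, 4.5, np.nan, 6.5])

# A pair is kept only if BOTH values are non-NaN (pairwise deletion)
mask = ~(np.isnan(x) | np.isnan(y))
x_clean = x[mask]
y_clean = y[mask]

print(mask.tolist())  # [True, True, False, True, False, True]
print(len(x_clean))   # 4 valid pairs, enough for the >= 3 threshold
```

Both index 2 and index 4 are dropped from both arrays, so the cleaned arrays stay aligned pair by pair.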

Source Code

def correlation_significance(x, y):
    """Calculate correlation and p-value"""
    # Remove NaN values
    mask = ~(np.isnan(x) | np.isnan(y))
    x_clean = x[mask]
    y_clean = y[mask]
    
    if len(x_clean) < 3:
        return None, None, 0
    
    corr, p_value = stats.pearsonr(x_clean, y_clean)
    return corr, p_value, len(x_clean)

Parameters

Name   Type   Default   Kind
x      -      -         positional_or_keyword
y      -      -         positional_or_keyword

Parameter Details

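x: First numeric input, as a NumPy array or pandas Series, representing one variable in the correlation analysis. NaN values are removed automatically, pairwise with y. A plain Python list is not supported as-is. np.isnan accepts a list, but the boolean indexing x[mask] then raises a TypeError, so convert lists with np.asarray first.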

y: Second numeric input, representing the second variable in the correlation analysis. It accepts the same types as x and must have the same length as x. NaN values are removed automatically, pairwise with x. If both x and y are pandas Series, they should share the same index, because the | operation that combines their NaN masks aligns on index labels.
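In the source code, the NaN mask is applied with boolean indexing (x[mask]). This works for NumPy arrays and pandas Series but fails for plain Python lists. The following is a minimal sketch of a wrapper that converts its inputs first. The name correlation_significance_arraylike and the explicit length check are hypothetical additions, not part of the original script.

```python
import numpy as np
from scipy import stats

def correlation_significance_arraylike(x, y):
    """Hypothetical variant of correlation_significance that also accepts lists."""
    # Convert any array-like (list, tuple, Series) to a float ndarray;
    # with dtype=float, None entries become NaN
    x = np.asarray(x, dtype=float)
    y = np.asarray(y, dtype=float)
    if x.shape != y.shape:
        raise ValueError("x and y must have the same length")
    # Pairwise NaN removal, identical to the original function
    mask = ~(np.isnan(x) | np.isnan(y))
    x_clean = x[mask]
    y_clean = y[mask]
    if len(x_clean) < 3:
        return None, None, 0
    corr, p_value = stats.pearsonr(x_clean, y_clean)
    return corr, p_value, len(x_clean)

# Plain lists work here; the trailing None is treated as a missing value
corr, p_val, n = correlation_significance_arraylike([1, 2, 3, 4], [2, 4, 6, None])
print(n)  # 3
```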

Return Value

Returns a tuple of three values: (corr, p_value, n_samples). 'corr' is the Pearson correlation coefficient, a float between -1 and 1. 'p_value' is the two-tailed p-value for the null hypothesis of no correlation, a float between 0 and 1. 'n_samples' is the number of valid (non-NaN) paired observations used in the calculation. If fewer than 3 valid pairs remain, the function returns (None, None, 0). In that case n_samples is 0 even if one or two valid pairs were present, so it does not report the actual count.
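The two-tailed p-value can be cross-checked by hand. The standard significance test for a Pearson coefficient uses t = r * sqrt(n - 2) / sqrt(1 - r^2) with n - 2 degrees of freedom. This sketch applies that formula to the five valid pairs that remain in the usage example and compares the result with stats.pearsonr.

```python
import numpy as np
from scipy import stats

# The five valid pairs left after NaN removal in the usage example
x = np.array([1.0, 2.0, 3.0, 4.0, 5.0])
y = np.array([2.0, 4.0, 5.0, 4.0, 5.0])
n = len(x)

# Pearson r as the normalized sum of products of centered values
dx, dy = x - x.mean(), y - y.mean()
r = np.sum(dx * dy) / np.sqrt(np.sum(dx**2) * np.sum(dy**2))

# t-statistic with n - 2 degrees of freedom, then the two-tailed p-value
t = r * np.sqrt(n - 2) / np.sqrt(1 - r**2)
p = 2 * stats.t.sf(abs(t), df=n - 2)

r_scipy, p_scipy = stats.pearsonr(x, y)
print(f"manual: r={r:.3f}, p={p:.3f}")  # manual: r=0.775, p=0.124
print(f"scipy:  r={r_scipy:.3f}, p={p_scipy:.3f}")
```

The manual values agree with scipy.stats.pearsonr up to floating-point rounding.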

Dependencies

  • numpy
  • scipy

Required Imports

import numpy as np
from scipy import stats

Usage Example

import numpy as np
from scipy import stats

def correlation_significance(x, y):
    mask = ~(np.isnan(x) | np.isnan(y))
    x_clean = x[mask]
    y_clean = y[mask]
    if len(x_clean) < 3:
        return None, None, 0
    corr, p_value = stats.pearsonr(x_clean, y_clean)
    return corr, p_value, len(x_clean)

# Example usage
x = np.array([1, 2, 3, 4, 5, np.nan, 7])
y = np.array([2, 4, 5, 4, 5, 6, np.nan])
corr, p_val, n = correlation_significance(x, y)
print(f"Correlation: {corr:.3f}, P-value: {p_val:.3f}, N: {n}")
# Output: Correlation: 0.775, P-value: 0.124, N: 5
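When fewer than three complete pairs remain, the function returns (None, None, 0). Formatting None with :.3f, as the print call above does, would then raise a TypeError. A defensive reporting pattern might look like the following. The function body is copied from the usage example, and the sample arrays are illustrative.

```python
import numpy as np
from scipy import stats

def correlation_significance(x, y):
    mask = ~(np.isnan(x) | np.isnan(y))
    x_clean = x[mask]
    y_clean = y[mask]
    if len(x_clean) < 3:
        return None, None, 0
    corr, p_value = stats.pearsonr(x_clean, y_clean)
    return corr, p_value, len(x_clean)

# Only two complete pairs (indices 0 and 3) survive the NaN filter
x = np.array([1.0, np.nan, 3.0, 4.0])
y = np.array([2.0, 5.0, np.nan, 8.0])
corr, p_val, n = correlation_significance(x, y)

# Check for None before formatting, since f"{None:.3f}" raises TypeError
if corr is None:
    message = f"Not enough paired data (N={n})"
else:
    message = f"Correlation: {corr:.3f}, P-value: {p_val:.3f}, N: {n}"
print(message)  # Not enough paired data (N=0)
```

N is reported as 0 here even though two complete pairs exist, because the early-return branch always returns 0 as the count.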

Best Practices

  • Ensure both input arrays have the same length before calling the function; mismatched lengths typically make NumPy raise a ValueError when the two NaN masks are combined
  • The function requires at least 3 valid (non-NaN) paired observations to calculate correlation; otherwise it returns (None, None, 0)
  • NaN values are automatically removed pairwise - if either x[i] or y[i] is NaN, both values at index i are excluded
  • The p-value tests the null hypothesis that there is no linear correlation between the variables
  • Pearson correlation assumes linear relationships and is sensitive to outliers
  • Consider checking the returned n_samples value to ensure sufficient data was available for meaningful analysis
  • For relationships that are monotonic but not linear, or when outliers are a concern, consider Spearman or Kendall rank correlation instead
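A rank-based counterpart can reuse the same NaN-handling pattern. This is a hypothetical variant: the name rank_correlation_significance and its method parameter are not in the original script. It swaps stats.pearsonr for stats.spearmanr or stats.kendalltau, each of which returns a coefficient and a p-value.

```python
import numpy as np
from scipy import stats

def rank_correlation_significance(x, y, method="spearman"):
    """Hypothetical rank-based variant of correlation_significance."""
    # Same pairwise NaN removal and minimum-size rule as the original
    mask = ~(np.isnan(x) | np.isnan(y))
    x_clean = x[mask]
    y_clean = y[mask]
    if len(x_clean) < 3:
        return None, None, 0
    if method == "spearman":
        coef, p_value = stats.spearmanr(x_clean, y_clean)
    elif method == "kendall":
        coef, p_value = stats.kendalltau(x_clean, y_clean)
    else:
        raise ValueError(f"unknown method: {method}")
    return coef, p_value, len(x_clean)

# Monotonic but non-linear data: y = x**3, with one NaN pair at the end
x = np.array([1.0, 2.0, 3.0, 4.0, 5.0, np.nan])
y = x**3
rho, p_rho, n = rank_correlation_significance(x, y, "spearman")
tau, p_tau, _ = rank_correlation_significance(x, y, "kendall")
r, p_r = stats.pearsonr(x[:5], y[:5])
print(f"Spearman rho={rho:.3f}, Kendall tau={tau:.3f}, Pearson r={r:.3f}")
```

Both rank coefficients equal 1 for this perfectly monotonic data. Pearson r stays below 1 because the relationship is not a straight line.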

Similar Components

AI-powered semantic similarity - components with related functionality:

  • function calculate_correlations 64.4% similar

    Calculates both Pearson and Spearman correlation coefficients between Eimeria variables and performance variables, filtering out missing values and identifying statistically significant relationships.

    From: /tf/active/vicechatdev/vice_ai/smartstat_scripts/5a059cb7-3903-4020-8519-14198d1f39c9/analysis_1.py
  • function create_correlation_heatmap 52.8% similar

    Generates and saves a correlation heatmap visualizing the relationships between Eimeria infection indicators and performance measures from a pandas DataFrame.

    From: /tf/active/vicechatdev/vice_ai/smartstat_scripts/5a059cb7-3903-4020-8519-14198d1f39c9/analysis_1.py
  • function grouped_correlation_analysis 49.8% similar

    Performs Pearson correlation analysis between Eimeria-related variables and performance variables, grouped by specified categorical variables (e.g., treatment, challenge groups).

    From: /tf/active/vicechatdev/vice_ai/smartstat_scripts/5a059cb7-3903-4020-8519-14198d1f39c9/analysis_1.py
  • function calculate_cv_v1 47.7% similar

    Calculates the Coefficient of Variation (CV) for a dataset, expressed as a percentage. CV measures relative variability by dividing standard deviation by mean.

    From: /tf/active/vicechatdev/vice_ai/smartstat_scripts/d1e252f5-950c-4ad7-b425-86b4b02c3c62/analysis_4.py
  • function export_results 47.6% similar

    Exports correlation analysis results to multiple CSV files, including overall correlations, grouped correlations, and significant findings.

    From: /tf/active/vicechatdev/vice_ai/smartstat_scripts/5a059cb7-3903-4020-8519-14198d1f39c9/analysis_1.py