correlation_significance - Code Extractor

function correlation_significance

Maturity: 44

Calculates Pearson correlation coefficient and statistical significance (p-value) between two numeric arrays, handling NaN values automatically.

File:
/tf/active/vicechatdev/vice_ai/smartstat_scripts/d1e252f5-950c-4ad7-b425-86b4b02c3c62/analysis_7.py

Lines:
348 - 359

Complexity:
simple

Purpose

This function computes the Pearson correlation coefficient, which measures the strength of the linear relationship between two variables, together with the p-value used to assess its statistical significance. It drops every pair in which either value is NaN and requires at least 3 remaining pairs to perform the calculation. If fewer than 3 valid pairs remain, it returns (None, None, 0).

Source Code

def correlation_significance(x, y):
    """Calculate correlation and p-value"""
    # Remove NaN values
    mask = ~(np.isnan(x) | np.isnan(y))
    x_clean = x[mask]
    y_clean = y[mask]
    
    if len(x_clean) < 3:
        return None, None, 0
    
    corr, p_value = stats.pearsonr(x_clean, y_clean)
    return corr, p_value, len(x_clean)

Parameters

Name	Type	Default	Kind
`x`	-	-	positional_or_keyword
`y`	-	-	positional_or_keyword

Parameter Details

x: First numeric array (NumPy array or pandas Series). Can contain NaN values, which are removed automatically. Should be numeric data representing one variable in the correlation analysis. Plain Python lists should be converted first (for example with np.asarray), because the function indexes its inputs with a boolean mask, which lists do not support.

y: Second numeric array (NumPy array or pandas Series). Must have the same length as x. Can contain NaN values, which are removed automatically. Should be numeric data representing the second variable in the correlation analysis. The same note about plain Python lists applies.
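The function applies a boolean mask directly to its inputs (`x[mask]`), which works for NumPy arrays but fails for plain Python lists. The sketch below shows one way to coerce inputs and check their lengths before calling it. The helper name `as_float_arrays` is hypothetical and is not part of the original script.

```python
import numpy as np

def as_float_arrays(x, y):
    """Hypothetical helper: coerce two array-likes to 1-D float arrays of equal length."""
    x_arr = np.asarray(x, dtype=float).ravel()
    y_arr = np.asarray(y, dtype=float).ravel()
    if x_arr.shape != y_arr.shape:
        raise ValueError(
            f"x and y must have the same length, got {x_arr.size} and {y_arr.size}"
        )
    return x_arr, y_arr

# Plain lists go in, NumPy arrays come out.
x_arr, y_arr = as_float_arrays([1, 2, float("nan"), 4], [2.0, float("nan"), 6.0, 8.0])

# Same pairwise mask the documented function builds internally.
mask = ~(np.isnan(x_arr) | np.isnan(y_arr))
print(x_arr[mask], y_arr[mask])
```

Only indices 0 and 3 survive the mask here, since each of the other two positions has a NaN in one of the arrays.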

Return Value

Returns a tuple of three values: (corr, p_value, n_samples). 'corr' is the Pearson correlation coefficient (float between -1 and 1, or None if insufficient data). 'p_value' is the two-tailed p-value for the null hypothesis of no linear correlation (float between 0 and 1, or None if insufficient data). 'n_samples' is the number of valid (non-NaN) paired observations used in the calculation: an integer of at least 3 when the correlation is computed, and 0 whenever fewer than 3 valid pairs remain, even if 1 or 2 valid pairs exist.
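Because 'corr' and 'p_value' can be None, callers need to check for that case before formatting or comparing the values. The following is a minimal sketch of such a guard. The helper `describe` and its `alpha` parameter are hypothetical and only illustrate the pattern; the function body is repeated so the example runs on its own.

```python
import numpy as np
from scipy import stats

def correlation_significance(x, y):
    """Same logic as the documented function, repeated so the sketch runs standalone."""
    mask = ~(np.isnan(x) | np.isnan(y))
    x_clean, y_clean = x[mask], y[mask]
    if len(x_clean) < 3:
        return None, None, 0
    corr, p_value = stats.pearsonr(x_clean, y_clean)
    return corr, p_value, len(x_clean)

def describe(x, y, alpha=0.05):
    """Hypothetical helper: format the result, guarding against the None case."""
    corr, p_value, n = correlation_significance(x, y)
    if corr is None:
        return "insufficient data (fewer than 3 valid pairs)"
    verdict = "significant" if p_value < alpha else "not significant"
    return f"r={corr:.3f}, p={p_value:.3f}, n={n} ({verdict} at alpha={alpha})"

# Only one complete pair survives the NaN filter, so the guard branch is taken.
few = describe(np.array([1.0, np.nan, 3.0]), np.array([2.0, 4.0, np.nan]))
# Five complete pairs, so a correlation is computed.
enough = describe(np.array([1.0, 2.0, 3.0, 4.0, 5.0]), np.array([2.0, 4.0, 5.0, 4.0, 5.0]))
print(few)
print(enough)
```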

Dependencies

numpy
scipy

Required Imports

import numpy as np
from scipy import stats

Usage Example

import numpy as np
from scipy import stats

def correlation_significance(x, y):
    mask = ~(np.isnan(x) | np.isnan(y))
    x_clean = x[mask]
    y_clean = y[mask]
    if len(x_clean) < 3:
        return None, None, 0
    corr, p_value = stats.pearsonr(x_clean, y_clean)
    return corr, p_value, len(x_clean)

# Example usage
x = np.array([1, 2, 3, 4, 5, np.nan, 7])
y = np.array([2, 4, 5, 4, 5, 6, np.nan])
corr, p_val, n = correlation_significance(x, y)
print(f"Correlation: {corr:.3f}, P-value: {p_val:.3f}, N: {n}")
# Output: Correlation: 0.775, P-value: 0.124, N: 5
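For the five complete pairs in this example, the coefficient and p-value can be recomputed from their definitions as a sanity check. The sketch below assumes the standard t-test form of the two-tailed p-value, t = r * sqrt((n - 2) / (1 - r^2)) with n - 2 degrees of freedom, and compares the result against `stats.pearsonr`.

```python
import numpy as np
from scipy import stats

# The five complete pairs left after dropping the positions that contain NaN.
x = np.array([1.0, 2.0, 3.0, 4.0, 5.0])
y = np.array([2.0, 4.0, 5.0, 4.0, 5.0])
n = len(x)

# Pearson r from its definition: co-deviation over the product of the spreads.
dx, dy = x - x.mean(), y - y.mean()
r = (dx * dy).sum() / np.sqrt((dx**2).sum() * (dy**2).sum())

# Under the null hypothesis of no correlation, t follows a t-distribution
# with n - 2 degrees of freedom; the two-tailed p-value doubles the upper tail.
t = r * np.sqrt((n - 2) / (1 - r**2))
p = 2 * stats.t.sf(abs(t), df=n - 2)

r_scipy, p_scipy = stats.pearsonr(x, y)
print(f"manual: r={r:.4f}, p={p:.4f}")
print(f"scipy:  r={r_scipy:.4f}, p={p_scipy:.4f}")
```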

Best Practices

Ensure both input arrays have the same length before calling the function
The function requires at least 3 valid (non-NaN) paired observations to calculate correlation; otherwise it returns (None, None, 0)
NaN values are automatically removed pairwise - if either x[i] or y[i] is NaN, both values at index i are excluded
The p-value tests the null hypothesis that there is no linear correlation between the variables
Pearson correlation assumes linear relationships and is sensitive to outliers
Consider checking the returned n_samples value to ensure sufficient data was available for meaningful analysis
For non-linear relationships, consider using Spearman or Kendall correlation instead
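
The last point can be illustrated with a relationship that is perfectly monotonic but not linear. This sketch uses made-up data and SciPy's `stats.spearmanr` and `stats.kendalltau` alongside `stats.pearsonr`; it is not taken from the original script.

```python
import numpy as np
from scipy import stats

# Illustrative data: y grows monotonically but non-linearly with x.
x = np.arange(1, 11, dtype=float)
y = np.exp(x)

pearson_r, pearson_p = stats.pearsonr(x, y)
spearman_rho, spearman_p = stats.spearmanr(x, y)
kendall_tau, kendall_p = stats.kendalltau(x, y)

print(f"Pearson r:    {pearson_r:.3f}")
print(f"Spearman rho: {spearman_rho:.3f}")
print(f"Kendall tau:  {kendall_tau:.3f}")
```

The rank-based coefficients reach 1 because the ordering of y matches the ordering of x exactly, while Pearson's r stays below 1 because the points do not lie on a straight line.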

Similar Components

AI-powered semantic similarity - components with related functionality:

function calculate_correlations 64.4% similar

Calculates both Pearson and Spearman correlation coefficients between Eimeria variables and performance variables, filtering out missing values and identifying statistically significant relationships.
From: /tf/active/vicechatdev/vice_ai/smartstat_scripts/5a059cb7-3903-4020-8519-14198d1f39c9/analysis_1.py

function create_correlation_heatmap 52.8% similar

Generates and saves a correlation heatmap visualizing the relationships between Eimeria infection indicators and performance measures from a pandas DataFrame.
From: /tf/active/vicechatdev/vice_ai/smartstat_scripts/5a059cb7-3903-4020-8519-14198d1f39c9/analysis_1.py

function grouped_correlation_analysis 49.8% similar

Performs Pearson correlation analysis between Eimeria-related variables and performance variables, grouped by specified categorical variables (e.g., treatment, challenge groups).
From: /tf/active/vicechatdev/vice_ai/smartstat_scripts/5a059cb7-3903-4020-8519-14198d1f39c9/analysis_1.py

function calculate_cv_v1 47.7% similar

Calculates the Coefficient of Variation (CV) for a dataset, expressed as a percentage. CV measures relative variability by dividing standard deviation by mean.
From: /tf/active/vicechatdev/vice_ai/smartstat_scripts/d1e252f5-950c-4ad7-b425-86b4b02c3c62/analysis_4.py

function export_results 47.6% similar

Exports correlation analysis results to multiple CSV files, including overall correlations, grouped correlations, and significant findings.
From: /tf/active/vicechatdev/vice_ai/smartstat_scripts/5a059cb7-3903-4020-8519-14198d1f39c9/analysis_1.py
