function calculate_cv

Maturity: 45

Calculates the coefficient of variation (CV) for a dataset: the standard deviation expressed as a percentage of the absolute mean.

File: /tf/active/vicechatdev/vice_ai/smartstat_scripts/d48d7789-9627-4e96-9f48-f90b687cd07d/analysis_1.py
Lines: 42 - 51
Complexity: simple

Purpose

This function computes the coefficient of variation, a standardized measure of dispersion that allows variability to be compared between datasets with different units or scales, even when their means differ drastically. It drops missing values before calculating, and returns NaN when fewer than two values remain or when the mean is exactly zero.
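The unit-independence claim can be checked directly. The sketch below re-declares calculate_cv with the same logic as the Source Code section and uses made-up weights (illustrative values, not taken from the original analysis) to show that changing the unit leaves the CV unchanged, while shifting every value by a constant does not.

```python
import numpy as np
import pandas as pd

def calculate_cv(data):
    """Same logic as the documented function: sample std / |mean| * 100."""
    data_clean = data.dropna()
    if len(data_clean) < 2:
        return np.nan
    mean_val = data_clean.mean()
    std_val = data_clean.std()
    if mean_val == 0:
        return np.nan
    return (std_val / abs(mean_val)) * 100

# Illustrative, made-up weights in kilograms (assumption, not source data)
weights_kg = pd.Series([2.1, 2.4, 1.9, 2.6, 2.3])
weights_g = weights_kg * 1000   # same measurements, different unit
shifted = weights_kg + 100      # same spread, different origin

cv_kg = calculate_cv(weights_kg)
cv_g = calculate_cv(weights_g)
cv_shifted = calculate_cv(shifted)

print(f"CV in kg:      {cv_kg:.2f}%")
print(f"CV in g:       {cv_g:.2f}%")         # matches the kg value
print(f"CV after +100: {cv_shifted:.2f}%")   # smaller: the mean grew, the std did not
```

Because a shift changes the result, the CV is best reserved for quantities measured from a true zero.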

Source Code

def calculate_cv(data):
    """Calculate coefficient of variation as (std/mean) * 100"""
    data_clean = data.dropna()
    if len(data_clean) < 2:
        return np.nan
    mean_val = data_clean.mean()
    std_val = data_clean.std()
    if mean_val == 0:
        return np.nan
    return (std_val / abs(mean_val)) * 100

Parameters

Name    Type    Default    Kind
data    -       -          positional_or_keyword

Parameter Details

data: A pandas Series (for example, a single DataFrame column) containing numeric data. NaN values are dropped before calculation, and at least 2 non-null values must remain for the CV to be defined.
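As a rough illustration of passing a DataFrame column, the sketch below applies the function to one column as a whole and per group. The table, its column names ("group", "value") and its contents are hypothetical, and coercing strings with pd.to_numeric is one possible way to satisfy the numeric-input requirement, not something the original script is known to do.

```python
import numpy as np
import pandas as pd

def calculate_cv(data):
    """Same logic as the documented function: sample std / |mean| * 100."""
    data_clean = data.dropna()
    if len(data_clean) < 2:
        return np.nan
    mean_val = data_clean.mean()
    std_val = data_clean.std()
    if mean_val == 0:
        return np.nan
    return (std_val / abs(mean_val)) * 100

# Hypothetical table; names and values are made up for illustration
df = pd.DataFrame({
    "group": ["a", "a", "a", "b", "b", "b"],
    "value": ["10", "12", "n/a", "20", "30", "25"],  # strings, one unparseable
})

# Coerce to numeric first: unparseable entries become NaN,
# which calculate_cv then removes via dropna()
df["value"] = pd.to_numeric(df["value"], errors="coerce")

overall_cv = calculate_cv(df["value"])                           # one column
cv_by_group = df.groupby("group")["value"].apply(calculate_cv)   # one CV per group

print(f"Overall CV: {overall_cv:.2f}%")
print(cv_by_group)
```

Each group is passed to the function as its own Series, so the NaN and minimum-size rules apply per group.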

Return Value

Returns a float representing the coefficient of variation as a percentage: the standard deviation divided by the absolute value of the mean, multiplied by 100. The standard deviation comes from pandas' Series.std(), which by default is the sample standard deviation (ddof=1). Returns np.nan if (1) fewer than 2 non-null values exist in the data, or (2) the mean equals exactly zero, which would otherwise cause a division by zero. Using the absolute value of the mean keeps the CV non-negative whether the mean is positive or negative.
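The return-value rules above can be verified against a hand-written formula. This is a minimal sketch; the variable names (manual, negated, single, zero_mean) are introduced here for illustration only.

```python
import numpy as np
import pandas as pd

def calculate_cv(data):
    """Same logic as the documented function: sample std / |mean| * 100."""
    data_clean = data.dropna()
    if len(data_clean) < 2:
        return np.nan
    mean_val = data_clean.mean()
    std_val = data_clean.std()
    if mean_val == 0:
        return np.nan
    return (std_val / abs(mean_val)) * 100

series = pd.Series([10, 12, 15, 11, 13])

cv = calculate_cv(series)
# Explicit formula with the sample standard deviation (ddof=1)
manual = series.std(ddof=1) / abs(series.mean()) * 100
# Flipping the sign of every value flips the mean but not the CV
negated = calculate_cv(-series)
# The two NaN cases: too few values, and a mean of exactly zero
single = calculate_cv(pd.Series([10]))
zero_mean = calculate_cv(pd.Series([-5, 0, 5]))

print(f"function: {cv:.4f}%  manual: {manual:.4f}%  negated: {negated:.4f}%")
print(f"single value: {single}  zero mean: {zero_mean}")
```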

Dependencies

  • pandas
  • numpy

Required Imports

import pandas as pd
import numpy as np

Usage Example

import pandas as pd
import numpy as np

def calculate_cv(data):
    """Calculate coefficient of variation as (std/mean) * 100"""
    data_clean = data.dropna()
    if len(data_clean) < 2:
        return np.nan
    mean_val = data_clean.mean()
    std_val = data_clean.std()
    if mean_val == 0:
        return np.nan
    return (std_val / abs(mean_val)) * 100

# Example 1: Basic usage with pandas Series
data = pd.Series([10, 12, 15, 11, 13])
cv = calculate_cv(data)
print(f"Coefficient of Variation: {cv:.2f}%")

# Example 2: Data with missing values
data_with_nan = pd.Series([10, np.nan, 15, 11, np.nan, 13])
cv = calculate_cv(data_with_nan)
print(f"CV with NaN handling: {cv:.2f}%")

# Example 3: Insufficient data returns NaN
insufficient_data = pd.Series([10])
cv = calculate_cv(insufficient_data)
print(f"CV with insufficient data: {cv}")

# Example 4: Zero mean returns NaN
zero_mean_data = pd.Series([-5, 0, 5])
cv = calculate_cv(zero_mean_data)
print(f"CV with zero mean: {cv}")

Best Practices

  • Always check whether the returned value is NaN before using it in further calculations; test with pd.isna() or np.isnan(), because a comparison such as cv == np.nan is always False
  • The function uses absolute value of the mean to handle negative means, which may not be appropriate for all use cases - consider if this behavior suits your needs
  • Ensure input data is numeric; non-numeric data will cause errors
  • The function requires at least 2 data points for a valid calculation; single values or empty series return np.nan
  • CV is most meaningful for ratio-scale data (data with a true zero point) and may not be appropriate for interval-scale data
  • As a rough rule of thumb, CV values above 100% indicate high variability relative to the mean, while values below 10% indicate low variability
  • Consider the context when interpreting CV: a CV of 20% might be acceptable in some fields but unacceptable in others
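The first practice above, checking for NaN before further use, might look like the following sketch. The helper format_cv is hypothetical and not part of the original script.

```python
import numpy as np
import pandas as pd

def calculate_cv(data):
    """Same logic as the documented function: sample std / |mean| * 100."""
    data_clean = data.dropna()
    if len(data_clean) < 2:
        return np.nan
    mean_val = data_clean.mean()
    std_val = data_clean.std()
    if mean_val == 0:
        return np.nan
    return (std_val / abs(mean_val)) * 100

cv = calculate_cv(pd.Series([10]))   # too few values, so NaN

print(cv == np.nan)   # False: NaN never compares equal, even to itself
print(pd.isna(cv))    # True: the reliable way to detect it

def format_cv(data, label):
    """Hypothetical helper: report the CV, or say that it is undefined."""
    value = calculate_cv(data)
    if pd.isna(value):
        return f"{label}: CV undefined"
    return f"{label}: CV = {value:.2f}%"

print(format_cv(pd.Series([10]), "single value"))
print(format_cv(pd.Series([20, 30, 25]), "three values"))
```

Handling the undefined case explicitly keeps NaN from silently propagating into downstream sums or comparisons.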

Similar Components

AI-powered semantic similarity - components with related functionality:

  • function calculate_cv_v1 95.3% similar

    Calculates the Coefficient of Variation (CV) for a dataset, expressed as a percentage. CV measures relative variability by dividing standard deviation by mean.

    From: /tf/active/vicechatdev/vice_ai/smartstat_scripts/d1e252f5-950c-4ad7-b425-86b4b02c3c62/analysis_4.py
  • function calculate_cv_v2 92.5% similar

    Calculates the coefficient of variation (CV) for a group of numerical values, expressed as a percentage.

    From: /tf/active/vicechatdev/vice_ai/smartstat_scripts/f5da873e-41e6-4f34-b3e4-f7443d4d213b/analysis_5.py
  • function correlation_significance 45.8% similar

    Calculates Pearson correlation coefficient and statistical significance (p-value) between two numeric arrays, handling NaN values automatically.

    From: /tf/active/vicechatdev/vice_ai/smartstat_scripts/d1e252f5-950c-4ad7-b425-86b4b02c3c62/analysis_7.py
  • function calculate_correlations 45.7% similar

    Calculates both Pearson and Spearman correlation coefficients between Eimeria variables and performance variables, filtering out missing values and identifying statistically significant relationships.

    From: /tf/active/vicechatdev/vice_ai/smartstat_scripts/5a059cb7-3903-4020-8519-14198d1f39c9/analysis_1.py
  • function main_v54 44.9% similar

    Performs statistical analysis to determine the correlation between antibiotic use frequency and vaccination modes (in-ovo vs non-in-ovo), generating visualizations and saving results to files.

    From: /tf/active/vicechatdev/smartstat/output/b7a013ae-a461-4aca-abae-9ed243119494/analysis_6cdbc6c8/analysis.py