
function calculate_cv_v1

Maturity: 45

Calculates the Coefficient of Variation (CV) for a dataset, expressed as a percentage. CV measures relative variability by dividing the standard deviation by the mean.

File:
/tf/active/vicechatdev/vice_ai/smartstat_scripts/d1e252f5-950c-4ad7-b425-86b4b02c3c62/analysis_4.py
Lines:
46 - 56
Complexity:
simple

Purpose

This function computes the Coefficient of Variation, a unitless measure of dispersion that allows variability to be compared between datasets with different units, scales, or means. The function drops NaN entries before computing and returns NaN for invalid cases (no values left after dropping NaN, or a mean of exactly zero).
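To illustrate the scale independence described above, the sketch below applies a local copy of calculate_cv to the same hypothetical measurements recorded once in kilograms and once in grams. The data values and variable names are invented for illustration; only the function body comes from the source.

```python
import numpy as np
import pandas as pd


def calculate_cv(data):
    """CV = (std / mean) * 100, copied from the documented function."""
    data_clean = data.dropna()
    if len(data_clean) > 0 and data_clean.mean() != 0:
        return (data_clean.std() / data_clean.mean()) * 100
    return np.nan


# Hypothetical measurements of the same samples, recorded in two units.
weights_kg = pd.Series([1.2, 1.5, 1.1, 1.4, 1.3])
weights_g = weights_kg * 1000  # identical data, 1000x the scale

cv_kg = calculate_cv(weights_kg)
cv_g = calculate_cv(weights_g)

# The standard deviations differ by a factor of 1000; the CVs do not.
print(f"std (kg): {weights_kg.std():.4f}, std (g): {weights_g.std():.1f}")
print(f"CV (kg): {cv_kg:.2f}%, CV (g): {cv_g:.2f}%")
```

Because both the standard deviation and the mean scale by the same factor, the ratio is unchanged, which is what makes CV comparable across units.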

Source Code

def calculate_cv(data):
    """
    Calculate Coefficient of Variation (CV) = (std / mean) * 100
    Returns CV as percentage
    """
    data_clean = data.dropna()
    if len(data_clean) > 0 and data_clean.mean() != 0:
        cv = (data_clean.std() / data_clean.mean()) * 100
        return cv
    else:
        return np.nan
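As a worked check of the formula, the sketch below computes the CV of the sample data used in the usage examples by hand with Python's standard statistics module and compares it with the pandas calculation the function performs. statistics.stdev uses the sample formula (n - 1 denominator), which matches the pandas default ddof=1.

```python
import statistics

import pandas as pd

values = [10, 12, 15, 11, 13]

# Manual calculation with the standard library.
mean = statistics.mean(values)          # 61 / 5 = 12.2
sample_std = statistics.stdev(values)   # sqrt(14.8 / 4) = sqrt(3.7)
cv_manual = sample_std / mean * 100

# The same steps with pandas, as calculate_cv performs them.
series = pd.Series(values)
cv_pandas = series.std() / series.mean() * 100

print(f"manual: {cv_manual:.2f}%, pandas: {cv_pandas:.2f}%")
# → manual: 15.77%, pandas: 15.77%
```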

Parameters

Name   Type     Default   Kind
data   (none)   (none)    positional_or_keyword

Parameter Details

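data: A pandas Series of numerical values. The function calls data.dropna(), data.mean() and data.std(), so the input must provide those methods: a plain Python list or NumPy array has no dropna() and raises AttributeError, and a DataFrame fails because mean() then returns one value per column, which cannot be used in the function's if condition. NaN values are removed automatically before the calculation.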
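The sketch below shows the practical consequence for inputs that are not already a Series: wrapping them in pd.Series makes them compatible. The input values and variable names are hypothetical; the function body is copied from the source.

```python
import numpy as np
import pandas as pd


def calculate_cv(data):
    data_clean = data.dropna()
    if len(data_clean) > 0 and data_clean.mean() != 0:
        return (data_clean.std() / data_clean.mean()) * 100
    return np.nan


raw_list = [10, 12, None, 15, 11]
raw_array = np.array([10.0, 12.0, np.nan, 15.0, 11.0])

# An ndarray (like a plain list) has no .dropna() method.
rejected = False
try:
    calculate_cv(raw_array)
except AttributeError:
    rejected = True

# Wrapping the input in a Series makes it compatible; None and NaN are
# both treated as missing values and dropped.
cv_from_list = calculate_cv(pd.Series(raw_list, dtype="float64"))
cv_from_array = calculate_cv(pd.Series(raw_array))
print(rejected, f"{cv_from_list:.2f}%", f"{cv_from_array:.2f}%")
```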

Return Value

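Returns a float (typically numpy.float64) representing the Coefficient of Variation as a percentage, CV = (standard deviation / mean) * 100. Returns np.nan if no values remain after removing NaN, or if the mean is exactly zero (to avoid division by zero). A single remaining value also yields NaN, because the sample standard deviation (ddof=1) is undefined for one observation. For data with a positive mean the result is non-negative and unbounded above, with lower values indicating less relative variability. Because the function divides by the mean rather than its absolute value, data with a negative mean produce a negative CV.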
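The sign behaviour and the NaN cases can be demonstrated directly. The sketch below also defines calculate_cv_abs, a hypothetical variant (not part of the original module) that divides by the absolute value of the mean, for cases where only the magnitude of relative spread matters.

```python
import numpy as np
import pandas as pd


def calculate_cv(data):
    data_clean = data.dropna()
    if len(data_clean) > 0 and data_clean.mean() != 0:
        return (data_clean.std() / data_clean.mean()) * 100
    return np.nan


def calculate_cv_abs(data):
    """Hypothetical variant: divides by |mean| so negative data give a positive CV."""
    data_clean = data.dropna()
    if len(data_clean) > 0 and data_clean.mean() != 0:
        return (data_clean.std() / abs(data_clean.mean())) * 100
    return np.nan


negative = pd.Series([-10, -12, -15, -11, -13])

cv_signed = calculate_cv(negative)         # negative, because the mean is negative
cv_magnitude = calculate_cv_abs(negative)  # positive, same magnitude
cv_single = calculate_cv(pd.Series([42.0]))                # NaN: std undefined for n = 1
cv_empty = calculate_cv(pd.Series([], dtype="float64"))    # NaN: nothing left to average
print(f"{cv_signed:.2f}% vs {cv_magnitude:.2f}%", cv_single, cv_empty)
```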

Dependencies

  • pandas
  • numpy

Required Imports

import pandas as pd
import numpy as np

Usage Example

import pandas as pd
import numpy as np

def calculate_cv(data):
    data_clean = data.dropna()
    if len(data_clean) > 0 and data_clean.mean() != 0:
        cv = (data_clean.std() / data_clean.mean()) * 100
        return cv
    else:
        return np.nan

# Example 1: Basic usage with pandas Series
data = pd.Series([10, 12, 15, 11, 13])
cv = calculate_cv(data)
print(f"CV: {cv:.2f}%")  # Output: CV: 15.77%

# Example 2: Data with NaN values
data_with_nan = pd.Series([10, 12, np.nan, 15, 11])
cv = calculate_cv(data_with_nan)
print(f"CV: {cv:.2f}%")  # Output: CV: 18.00% (the NaN is dropped before calculating)

# Example 3: Edge case - zero mean returns NaN
zero_mean_data = pd.Series([-5, 0, 5])
cv = calculate_cv(zero_mean_data)
print(f"CV: {cv}")  # Output: nan

# Example 4: Empty or all-NaN data returns NaN
empty_data = pd.Series([np.nan, np.nan])
cv = calculate_cv(empty_data)
print(f"CV: {cv}")  # Output: nan
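Since CV is meant for comparing relative variability across datasets, a common pattern is to compute it per group. The sketch below assumes a hypothetical long-format DataFrame with "group" and "value" columns; SeriesGroupBy.agg passes each group's values to calculate_cv as a Series.

```python
import numpy as np
import pandas as pd


def calculate_cv(data):
    data_clean = data.dropna()
    if len(data_clean) > 0 and data_clean.mean() != 0:
        return (data_clean.std() / data_clean.mean()) * 100
    return np.nan


# Hypothetical long-format data: one row per measurement, labelled by group.
df = pd.DataFrame({
    "group": ["A"] * 5 + ["B"] * 5,
    "value": [10, 12, 15, 11, 13, 100, 105, 98, 102, 95],
})

# One CV per group; B has the larger mean but the smaller relative spread.
cv_by_group = df.groupby("group")["value"].agg(calculate_cv)
print(cv_by_group.round(2))
```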

Best Practices

  • Always check if the returned value is NaN before using it in further calculations
  • The function assumes the input is a pandas Series; convert lists or NumPy arrays with pd.Series(...) first, since they lack dropna()
  • CV is only meaningful for ratio scale data (data with a true zero point) and should not be used with interval scale data
  • Be aware that CV becomes very large and unstable when the mean is close to zero, even if it is not exactly zero
  • For datasets containing negative values, CV may not be interpretable: the mean can be near zero, and a negative mean yields a negative CV
  • The function uses pandas default ddof=1 (sample standard deviation) for std() calculation
  • Consider validating that input data is numeric before passing to this function to avoid unexpected errors
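Several of the practices above can be folded into one helper. The sketch below is a hypothetical hardened variant, safe_cv, that is not part of the original module: it coerces input to numeric, accepts any array-like, lets the caller choose ddof, and treats a mean within min_abs_mean of zero as invalid. The default tolerance of 1e-9 is an arbitrary assumption and should be chosen to suit the scale of the data.

```python
import numpy as np
import pandas as pd


def safe_cv(data, ddof=1, min_abs_mean=1e-9):
    """Hypothetical hardened variant of calculate_cv (not in the original module).

    - Accepts any array-like and coerces it to numeric; entries that cannot
      be parsed become NaN and are dropped.
    - ddof=1 gives the sample std (pandas' default); ddof=0 the population std.
    - Returns NaN when |mean| < min_abs_mean, not only when it is exactly zero.
    """
    values = pd.to_numeric(pd.Series(data), errors="coerce").dropna()
    if len(values) <= ddof:
        return np.nan  # too few points for the requested std estimate
    mean = values.mean()
    if abs(mean) < min_abs_mean:
        return np.nan
    return values.std(ddof=ddof) / mean * 100


mixed = ["10", 12, "n/a", 15, 11, 13]
print(safe_cv(mixed))            # sample CV of 10, 12, 15, 11, 13
print(safe_cv(mixed, ddof=0))    # population CV, slightly smaller
print(safe_cv([-1e-12, 2e-12]))  # nan: mean too close to zero
```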

Similar Components

AI-powered semantic similarity - components with related functionality:

  • function calculate_cv 95.3% similar

    Calculates the coefficient of variation (CV) for a dataset, expressed as a percentage of the standard deviation relative to the mean.

    From: /tf/active/vicechatdev/vice_ai/smartstat_scripts/d48d7789-9627-4e96-9f48-f90b687cd07d/analysis_1.py
  • function calculate_cv_v2 90.5% similar

    Calculates the coefficient of variation (CV) for a group of numerical values, expressed as a percentage.

    From: /tf/active/vicechatdev/vice_ai/smartstat_scripts/f5da873e-41e6-4f34-b3e4-f7443d4d213b/analysis_5.py
  • function correlation_significance 47.7% similar

    Calculates Pearson correlation coefficient and statistical significance (p-value) between two numeric arrays, handling NaN values automatically.

    From: /tf/active/vicechatdev/vice_ai/smartstat_scripts/d1e252f5-950c-4ad7-b425-86b4b02c3c62/analysis_7.py
  • function main_v54 46.9% similar

    Performs statistical analysis to determine the correlation between antibiotic use frequency and vaccination modes (in-ovo vs non-in-ovo), generating visualizations and saving results to files.

    From: /tf/active/vicechatdev/smartstat/output/b7a013ae-a461-4aca-abae-9ed243119494/analysis_6cdbc6c8/analysis.py
  • function main_v53 46.7% similar

    Performs statistical analysis on antibiotic usage data, comparing distribution patterns between vaccinated and non-vaccinated groups, and generates visualization plots, summary tables, and written conclusions.

    From: /tf/active/vicechatdev/smartstat/output/b7a013ae-a461-4aca-abae-9ed243119494/analysis_70ac0517/analysis.py