function calculate_cv
Calculates the coefficient of variation (CV) of a dataset: the standard deviation expressed as a percentage of the absolute value of the mean.
/tf/active/vicechatdev/vice_ai/smartstat_scripts/d48d7789-9627-4e96-9f48-f90b687cd07d/analysis_1.py
42 - 51
simple
Purpose
This function computes the coefficient of variation, a standardized measure of dispersion that allows comparison of variability between datasets with different units or scales. It handles edge cases including missing values, insufficient data points, and zero means. The CV is useful in statistics for comparing the degree of variation from one data series to another, even if the means are drastically different.
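As a minimal sketch of the scale-independence described above, the snippet below reproduces the function documented on this page and applies it to the same measurements in two units. The series `lengths_m` and `lengths_mm` are hypothetical illustration data, not taken from the original script.

```python
import numpy as np
import pandas as pd


def calculate_cv(data):
    """Coefficient of variation in percent (copy of the documented function)."""
    data_clean = data.dropna()
    if len(data_clean) < 2:
        return np.nan
    mean_val = data_clean.mean()
    std_val = data_clean.std()
    if mean_val == 0:
        return np.nan
    return (std_val / abs(mean_val)) * 100


# Hypothetical data: the same lengths recorded in metres and in millimetres.
lengths_m = pd.Series([1.0, 2.0, 3.0, 4.0])
lengths_mm = lengths_m * 1000

# The standard deviations differ by the unit factor of 1000 ...
print(lengths_m.std(), lengths_mm.std())
# ... while the CV is the same for both series.
print(calculate_cv(lengths_m), calculate_cv(lengths_mm))
```

Because the unit factor scales the standard deviation and the mean equally, it cancels in the ratio, which is why the two CV values agree.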
Source Code
def calculate_cv(data):
    """Calculate coefficient of variation as (std/mean) * 100"""
    data_clean = data.dropna()
    if len(data_clean) < 2:
        return np.nan
    mean_val = data_clean.mean()
    std_val = data_clean.std()
    if mean_val == 0:
        return np.nan
    return (std_val / abs(mean_val)) * 100
Parameters
| Name | Type | Default | Kind |
|---|---|---|---|
| data | - | - | positional_or_keyword |
Parameter Details
data: A pandas Series (for example, a single DataFrame column) containing numeric data. NaN values are dropped before the calculation. At least 2 non-null values are required for a valid CV; with fewer, the function returns np.nan.
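One possible way to compute the CV for several columns at once is `DataFrame.apply`, which passes each column to the function as a Series. This is a sketch under assumptions: the DataFrame `df` and its column names are hypothetical, and the function body is copied from this page.

```python
import numpy as np
import pandas as pd


def calculate_cv(data):
    """Coefficient of variation in percent (copy of the documented function)."""
    data_clean = data.dropna()
    if len(data_clean) < 2:
        return np.nan
    mean_val = data_clean.mean()
    std_val = data_clean.std()
    if mean_val == 0:
        return np.nan
    return (std_val / abs(mean_val)) * 100


# Hypothetical table with missing values in every column.
df = pd.DataFrame({
    "height_cm": [170.0, 165.0, np.nan, 180.0],
    "weight_kg": [70.0, 60.0, 80.0, np.nan],
    "single": [1.0, np.nan, np.nan, np.nan],  # only one non-null value
})

# apply() with the default axis=0 calls calculate_cv once per column.
cv_per_column = df.apply(calculate_cv)
print(cv_per_column)
```

The `single` column has only one non-null value, so its entry in the result is NaN rather than a number.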
Return Value
Returns a float: the coefficient of variation as a percentage (standard deviation divided by the absolute value of the mean, multiplied by 100). The standard deviation comes from the pandas std() method, which by default computes the sample standard deviation (ddof=1). Returns np.nan if (1) fewer than 2 non-null values exist in the data, or (2) the mean equals zero (to avoid division by zero). Because the absolute value of the mean is used, the CV is non-negative whether the mean is negative or positive.
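The choice of standard deviation matters when comparing this function's output with other tools. The sketch below contrasts the pandas default (sample standard deviation, `ddof=1`), which is what `data_clean.std()` uses, with the NumPy default (population standard deviation, `ddof=0`). The variable names are illustrative only.

```python
import numpy as np
import pandas as pd

values = pd.Series([10, 12, 15, 11, 13])
mean_abs = abs(values.mean())

# pandas Series.std() defaults to ddof=1 (sample standard deviation).
sample_std = values.std()
# numpy.std() defaults to ddof=0 (population standard deviation).
population_std = np.std(values.to_numpy())

cv_sample = sample_std / mean_abs * 100
cv_population = population_std / mean_abs * 100

print(f"CV with ddof=1: {cv_sample:.2f}%")
print(f"CV with ddof=0: {cv_population:.2f}%")
```

For small samples the two values differ noticeably, so a CV computed elsewhere with `ddof=0` will be smaller than the one this function returns.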
Dependencies
- pandas
- numpy
Required Imports
import pandas as pd
import numpy as np
Usage Example
import pandas as pd
import numpy as np

def calculate_cv(data):
    """Calculate coefficient of variation as (std/mean) * 100"""
    data_clean = data.dropna()
    if len(data_clean) < 2:
        return np.nan
    mean_val = data_clean.mean()
    std_val = data_clean.std()
    if mean_val == 0:
        return np.nan
    return (std_val / abs(mean_val)) * 100

# Example 1: Basic usage with pandas Series
data = pd.Series([10, 12, 15, 11, 13])
cv = calculate_cv(data)
print(f"Coefficient of Variation: {cv:.2f}%")

# Example 2: Data with missing values
data_with_nan = pd.Series([10, np.nan, 15, 11, np.nan, 13])
cv = calculate_cv(data_with_nan)
print(f"CV with NaN handling: {cv:.2f}%")

# Example 3: Insufficient data returns NaN
insufficient_data = pd.Series([10])
cv = calculate_cv(insufficient_data)
print(f"CV with insufficient data: {cv}")

# Example 4: Zero mean returns NaN
zero_mean_data = pd.Series([-5, 0, 5])
cv = calculate_cv(zero_mean_data)
print(f"CV with zero mean: {cv}")
Best Practices
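- Always check whether the returned value is NaN before using it in further calculations. Use pd.isna or np.isnan for the check; a comparison such as cv == np.nan is always False
- The function divides by the absolute value of the mean, so a negative mean still yields a positive CV. This may not be appropriate for every use case; consider whether the behavior suits your needs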
- Ensure input data is numeric; non-numeric data will cause errors
- The function requires at least 2 data points for a valid calculation; single values or empty series return np.nan
- CV is most meaningful for ratio-scale data (data with a true zero point) and may not be appropriate for interval-scale data
- High CV values (>100%) indicate high variability relative to the mean, while low CV values (<10%) indicate low variability
- Consider the context when interpreting CV: a CV of 20% might be acceptable in some fields but unacceptable in others
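The first and third practices above can be combined in a small wrapper. This is a sketch only: `describe_cv` is a hypothetical helper, not part of the original script, and coercing non-numeric entries to NaN with `pd.to_numeric(..., errors="coerce")` is one possible policy among several.

```python
import numpy as np
import pandas as pd


def calculate_cv(data):
    """Coefficient of variation in percent (copy of the documented function)."""
    data_clean = data.dropna()
    if len(data_clean) < 2:
        return np.nan
    mean_val = data_clean.mean()
    std_val = data_clean.std()
    if mean_val == 0:
        return np.nan
    return (std_val / abs(mean_val)) * 100


def describe_cv(raw):
    """Hypothetical wrapper: coerce to numeric, compute the CV, report NaN explicitly."""
    # Entries that cannot be parsed as numbers become NaN and are dropped later.
    numeric = pd.to_numeric(raw, errors="coerce")
    cv = calculate_cv(numeric)
    # pd.isna is used because `cv == np.nan` is always False.
    if pd.isna(cv):
        return "CV undefined"
    return f"CV = {cv:.2f}%"


print(describe_cv(pd.Series(["10", "12", "n/a", "15"])))  # strings with one bad entry
print(describe_cv(pd.Series([10])))                       # too few values
print(describe_cv(pd.Series([-5, 0, 5])))                 # zero mean
```

Silently coercing bad entries hides data-quality problems, so in stricter pipelines it may be preferable to let `pd.to_numeric` raise instead.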
Similar Components
AI-powered semantic similarity - components with related functionality:
- function calculate_cv_v1 (95.3% similar)
- function calculate_cv_v2 (92.5% similar)
- function correlation_significance (45.8% similar)
- function calculate_correlations (45.7% similar)
- function main_v54 (44.9% similar)