function calculate_cv_v1
Calculates the Coefficient of Variation (CV) for a dataset, expressed as a percentage. CV measures relative variability by dividing standard deviation by mean.
/tf/active/vicechatdev/vice_ai/smartstat_scripts/d1e252f5-950c-4ad7-b425-86b4b02c3c62/analysis_4.py
46 - 56
simple
Purpose
This function computes the Coefficient of Variation, a standardized measure of dispersion that allows comparison of variability between datasets with different units or scales. Because the standard deviation is divided by the mean, the result is unitless, which makes it useful for comparing datasets whose means differ. The function handles missing values by dropping NaN entries and returns NaN for invalid cases (empty data or zero mean).
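To illustrate the scale-independence described above, here is a minimal sketch that applies the same logic to one set of measurements expressed in two units. The height values are hypothetical, and the function body is repeated only so the snippet runs on its own.

```python
import numpy as np
import pandas as pd


def calculate_cv(data):
    """Same logic as the documented function, repeated so the sketch is self-contained."""
    data_clean = data.dropna()
    if len(data_clean) > 0 and data_clean.mean() != 0:
        return (data_clean.std() / data_clean.mean()) * 100
    return np.nan


# Hypothetical measurements: the same heights in centimetres and in metres.
heights_cm = pd.Series([150.0, 160.0, 170.0, 180.0])
heights_m = heights_cm / 100

cv_cm = calculate_cv(heights_cm)
cv_m = calculate_cv(heights_m)

# The standard deviations differ by a factor of 100, but the CVs agree.
print(f"std (cm): {heights_cm.std():.3f}, std (m): {heights_m.std():.3f}")
print(f"CV (cm): {cv_cm:.2f}%, CV (m): {cv_m:.2f}%")
```

Rescaling multiplies both the standard deviation and the mean by the same factor, so the ratio is unchanged.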
Source Code
def calculate_cv(data):
    """
    Calculate Coefficient of Variation (CV) = (std / mean) * 100
    Returns CV as percentage
    """
    data_clean = data.dropna()
    if len(data_clean) > 0 and data_clean.mean() != 0:
        cv = (data_clean.std() / data_clean.mean()) * 100
        return cv
    else:
        return np.nan
Parameters
| Name | Type | Default | Kind |
|---|---|---|---|
| data | - | - | positional_or_keyword |
Parameter Details
data: A pandas Series containing numerical data, or another object that provides dropna(), mean(), and std() methods. Plain Python lists and NumPy arrays lack dropna() and will raise AttributeError, so wrap them in pd.Series first. NaN values are allowed and are removed before the calculation.
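Since the function calls data.dropna(), inputs that are not pandas objects need converting first. The sketch below shows one way to do that; the sample values are hypothetical, and the function body is repeated so the snippet is self-contained.

```python
import numpy as np
import pandas as pd


def calculate_cv(data):
    """Same logic as the documented function, repeated so the sketch is self-contained."""
    data_clean = data.dropna()
    if len(data_clean) > 0 and data_clean.mean() != 0:
        return (data_clean.std() / data_clean.mean()) * 100
    return np.nan


values = [10, 12, None, 15, 11]                   # plain list with a missing entry
arr = np.array([10.0, 12.0, np.nan, 15.0, 11.0])  # NumPy array with a NaN

# A plain list has no dropna() method, so passing it directly fails.
try:
    calculate_cv(values)
    list_failed = False
except AttributeError:
    list_failed = True

# Wrapping in pd.Series gives the function the methods it relies on.
cv_from_list = calculate_cv(pd.Series(values, dtype="float64"))
cv_from_array = calculate_cv(pd.Series(arr))

print(f"list raised AttributeError: {list_failed}")
print(f"CV from list: {cv_from_list:.2f}%, CV from array: {cv_from_array:.2f}%")
```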
Return Value
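Returns a float representing the Coefficient of Variation as a percentage (CV = (standard deviation / mean) * 100). Returns np.nan if the input data is empty after removing NaN values, or if the mean equals zero (to avoid division by zero). The result is also NaN when only one non-NaN value remains, because the sample standard deviation (ddof=1) is undefined for a single observation. For data with a positive mean the value is non-negative and unbounded above, with lower values indicating less relative variability; a negative mean produces a negative CV.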
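Two return cases follow from the code but are easy to overlook: a negative mean yields a negative percentage, and a single observation yields NaN because pandas computes the sample standard deviation with ddof=1. The following sketch checks both with hypothetical inputs; the function body is repeated so it runs on its own.

```python
import numpy as np
import pandas as pd


def calculate_cv(data):
    """Same logic as the documented function, repeated so the sketch is self-contained."""
    data_clean = data.dropna()
    if len(data_clean) > 0 and data_clean.mean() != 0:
        return (data_clean.std() / data_clean.mean()) * 100
    return np.nan


# Negative mean: std is positive, mean is negative, so the ratio is negative.
negative_cv = calculate_cv(pd.Series([-10, -12, -15, -11, -13]))

# Single observation: Series.std() with ddof=1 is NaN, and NaN propagates.
single_cv = calculate_cv(pd.Series([5.0]))

print(f"negative-mean CV: {negative_cv:.2f}%")
print(f"single-value CV is NaN: {pd.isna(single_cv)}")
```

Use pd.isna() or np.isnan() to test the result, since NaN never compares equal to itself.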
Dependencies
- pandas
- numpy
Required Imports
import pandas as pd
import numpy as np
Usage Example
import pandas as pd
import numpy as np

def calculate_cv(data):
    data_clean = data.dropna()
    if len(data_clean) > 0 and data_clean.mean() != 0:
        cv = (data_clean.std() / data_clean.mean()) * 100
        return cv
    else:
        return np.nan

# Example 1: Basic usage with pandas Series
data = pd.Series([10, 12, 15, 11, 13])
cv = calculate_cv(data)
print(f"CV: {cv:.2f}%")  # Output: CV: 15.77%

# Example 2: Data with NaN values
data_with_nan = pd.Series([10, 12, np.nan, 15, 11])
cv = calculate_cv(data_with_nan)
print(f"CV: {cv:.2f}%")  # NaN values are automatically removed

# Example 3: Edge case - zero mean returns NaN
zero_mean_data = pd.Series([-5, 0, 5])
cv = calculate_cv(zero_mean_data)
print(f"CV: {cv}")  # Output: nan

# Example 4: Empty or all-NaN data returns NaN
empty_data = pd.Series([np.nan, np.nan])
cv = calculate_cv(empty_data)
print(f"CV: {cv}")  # Output: nan
Best Practices
- Always check if the returned value is NaN before using it in further calculations
- The function assumes input is a pandas Series or compatible array-like object with dropna(), mean(), and std() methods
- CV is only meaningful for ratio scale data (data with a true zero point) and should not be used with interval scale data
- Be aware that CV can be misleading when the mean is close to zero, even if not exactly zero
- For datasets with negative values, CV may not be interpretable as the mean could be near zero or negative
- The function uses pandas default ddof=1 (sample standard deviation) for std() calculation
- Consider validating that input data is numeric before passing to this function to avoid unexpected errors
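Several of the practices above (validating numeric input, guarding against a near-zero mean, and checking for NaN before further use) can be combined in a thin wrapper. The sketch below is one possible approach, not part of the documented component: safe_cv and its min_abs_mean threshold are hypothetical names, and the default threshold is an arbitrary assumption that should be tuned to the data.

```python
import numpy as np
import pandas as pd


def calculate_cv(data):
    """Same logic as the documented function, repeated so the sketch is self-contained."""
    data_clean = data.dropna()
    if len(data_clean) > 0 and data_clean.mean() != 0:
        return (data_clean.std() / data_clean.mean()) * 100
    return np.nan


def safe_cv(data, min_abs_mean=1e-8):
    """Hypothetical wrapper: returns a float, or None when CV is not usable."""
    # Coerce to numeric; non-numeric entries become NaN instead of raising.
    series = pd.to_numeric(pd.Series(data), errors="coerce")
    clean = series.dropna()
    # Treat a mean very close to zero the same as a zero mean (assumed threshold).
    if clean.empty or abs(clean.mean()) < min_abs_mean:
        return None
    cv = calculate_cv(clean)
    # Covers the remaining NaN case, e.g. a single observation.
    return None if pd.isna(cv) else float(cv)


mixed_cv = safe_cv(["10", "12", "n/a", "15", "11"])  # "n/a" is coerced to NaN
near_zero_cv = safe_cv([-1.0, 1.0, 1e-12])           # mean is roughly 3e-13
all_missing_cv = safe_cv([np.nan, np.nan])

print(mixed_cv, near_zero_cv, all_missing_cv)
```

Returning None rather than NaN is a design choice made here so that callers cannot silently propagate NaN through later arithmetic; returning np.nan would stay closer to the original function's contract.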
Similar Components
AI-powered semantic similarity - components with related functionality:
- function calculate_cv (95.3% similar)
- function calculate_cv_v2 (90.5% similar)
- function correlation_significance (47.7% similar)
- function main_v54 (46.9% similar)
- function main_v53 (46.7% similar)