🔍 Code Extractor

function calculate_cv_v2

Maturity: 26

Calculates the coefficient of variation (CV) for a group of numerical values, expressed as a percentage.

File: /tf/active/vicechatdev/vice_ai/smartstat_scripts/f5da873e-41e6-4f34-b3e4-f7443d4d213b/analysis_5.py
Lines: 42 - 43
Complexity: simple

Purpose

This function computes the coefficient of variation, a standardized measure of the dispersion of a probability or frequency distribution. It is defined as the ratio of the standard deviation to the mean, multiplied by 100 to express it as a percentage. Because the ratio is dimensionless, it is useful for comparing the degree of variation between datasets with different units or vastly different means. It is commonly used in statistical analysis, quality control, and data science workflows, particularly with pandas groupby operations.
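
The unit-independence claim can be checked with a small sketch. It uses only the standard library rather than pandas; the helper name cv_percent and the height data are made up for illustration. statistics.stdev computes the sample standard deviation (n - 1 denominator), which matches the pandas default.

```python
import statistics

def cv_percent(values):
    """Sample standard deviation divided by the mean, times 100."""
    return statistics.stdev(values) / statistics.mean(values) * 100

# The same five (hypothetical) heights in two different units.
heights_cm = [150, 160, 170, 180, 190]
heights_m = [1.5, 1.6, 1.7, 1.8, 1.9]

# The standard deviations differ by a factor of 100, the CVs do not.
print(statistics.stdev(heights_cm), statistics.stdev(heights_m))
print(round(cv_percent(heights_cm), 2), round(cv_percent(heights_m), 2))
```

Both CV values come out at roughly 9.3%, even though the raw standard deviations are about 15.8 and 0.158.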

Source Code

def calculate_cv(group):
    return group.std() / group.mean() * 100

Parameters

Name   Type  Default  Kind
group  -     -        positional_or_keyword

Parameter Details

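group: A pandas Series, or another object that provides std() and mean() methods, containing numerical values. Plain Python lists do not have these methods and will raise an AttributeError. This parameter is typically a grouped subset of data when used with pandas groupby operations. The group should have at least 2 non-missing values, because the sample standard deviation that pandas computes by default (ddof=1) is undefined for a single value. Note that NumPy arrays default to the population standard deviation (ddof=0), so the result differs slightly from the pandas result. Identical values are not a problem: they give a standard deviation of 0 and therefore a CV of 0, unless they are all zero. The mean must not be zero, since it is the divisor.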
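
The input conditions above can be illustrated with a short sketch. It assumes pandas defaults (ddof=1, skipna=True); the variable names and sample values are made up for illustration.

```python
import math
import pandas as pd

def calculate_cv(group):
    return group.std() / group.mean() * 100

# One value: the sample standard deviation is undefined, so the CV is NaN.
single = calculate_cv(pd.Series([5.0]))

# Identical non-zero values: the standard deviation is 0, so the CV is 0.
constant = calculate_cv(pd.Series([4.0, 4.0, 4.0]))

# A missing value is skipped by default; only 10 and 12 are used.
with_gap = calculate_cv(pd.Series([10.0, None, 12.0]))

print(math.isnan(single), constant, round(with_gap, 2))
```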

Return Value

Returns the coefficient of variation as a percentage: a float when group is a pandas Series, or a pandas Series of per-column values when group is a DataFrame. Higher values indicate greater variability relative to the mean, while lower values indicate less variability. With pandas inputs, the result is NaN if the group has fewer than 2 non-missing values or if all values are zero, and inf if the mean is zero but the standard deviation is not. When applied through pandas groupby, the result is a Series with the CV for each group.
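
A brief sketch of the return types described above. The DataFrame and its column names are hypothetical, and the zero-mean behavior shown is what NumPy floating-point division produces (a warning and inf rather than an exception), so np.errstate is used here only to silence that warning.

```python
import numpy as np
import pandas as pd

def calculate_cv(group):
    return group.std() / group.mean() * 100

# Series in, scalar out.
series_result = calculate_cv(pd.Series([10, 12, 15, 11, 13]))

# DataFrame in, Series out (one CV per column).
frame = pd.DataFrame({"a": [1.0, 2.0, 3.0], "b": [10.0, 20.0, 30.0]})
frame_result = calculate_cv(frame)

# Mean of exactly zero with non-zero spread: division yields inf.
with np.errstate(divide="ignore", invalid="ignore"):
    zero_mean = calculate_cv(pd.Series([-1.0, 0.0, 1.0]))

print(round(series_result, 2))
print(frame_result)
print(zero_mean)
```

Both columns of the hypothetical frame have a CV of 50%, since b is simply a scaled by 10.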

Dependencies

  • pandas
  • numpy

Required Imports

import pandas as pd
import numpy as np

Usage Example

import pandas as pd
import numpy as np

def calculate_cv(group):
    return group.std() / group.mean() * 100

# Example 1: Calculate CV for a simple array
data = pd.Series([10, 12, 15, 11, 13])
cv = calculate_cv(data)
print(f"CV: {cv:.2f}%")

# Example 2: Use with pandas groupby
df = pd.DataFrame({
    'category': ['A', 'A', 'A', 'B', 'B', 'B'],
    'values': [10, 12, 11, 20, 25, 22]
})
cv_by_category = df.groupby('category')['values'].apply(calculate_cv)
print(cv_by_category)

Best Practices

  • Ensure the input group contains numeric data only; non-numeric values will cause errors
  • Check for zero means before applying this function; with pandas inputs a zero mean does not raise an error but silently produces inf or NaN
  • Be aware that CV is undefined for data with a mean of zero and can be misleading for data with means close to zero
  • CV is most meaningful for ratio scale data (data with a true zero point) and may not be appropriate for interval scale data
  • When using with groupby, ensure groups have sufficient data points (at least 2) for meaningful standard deviation calculation
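  • pandas std() and mean() skip NaN values by default; use dropna() or fillna() beforehand if missing values need different handling, and check that enough non-missing values remain
  • As a rough rule of thumb, CV values above 100% indicate high variability and values below 30% indicate low variability, but appropriate thresholds depend on the field and the measurement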
  • This function assumes the data follows a distribution where mean and standard deviation are appropriate measures
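
Several of the checks listed above can be folded into a guarded variant. The following is a minimal sketch, not part of the original script: the name safe_cv and the parameters min_count and mean_tol are hypothetical, and returning NaN for unusable groups is one possible design choice among several.

```python
import math
import pandas as pd

def safe_cv(group, min_count=2, mean_tol=1e-12):
    """Hypothetical guarded CV: returns NaN instead of inf or an error."""
    # Coerce non-numeric entries to NaN, then drop all missing values.
    values = pd.to_numeric(pd.Series(group), errors="coerce").dropna()
    # The sample standard deviation needs at least two observations.
    if len(values) < min_count:
        return float("nan")
    mean = values.mean()
    # A mean at (or extremely close to) zero makes the ratio meaningless.
    if abs(mean) <= mean_tol:
        return float("nan")
    return float(values.std() / mean * 100)

print(safe_cv(pd.Series([10.0, 12.0])))        # ordinary case
print(safe_cv(pd.Series([-1.0, 0.0, 1.0])))    # zero mean gives nan
print(safe_cv(pd.Series([5.0])))               # too few values gives nan
print(safe_cv(pd.Series([10, 12, "x"])))       # non-numeric entry is ignored
```

It can be passed to groupby in the same way as the original function, for example df.groupby('category')['values'].apply(safe_cv). The tolerance mean_tol only catches means that are numerically zero; deciding when a small mean makes the CV misleading remains a judgment call for the analyst.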

Similar Components

AI-powered semantic similarity - components with related functionality:

  • function calculate_cv 92.5% similar

    Calculates the coefficient of variation (CV) for a dataset, expressed as a percentage of the standard deviation relative to the mean.

    From: /tf/active/vicechatdev/vice_ai/smartstat_scripts/d48d7789-9627-4e96-9f48-f90b687cd07d/analysis_1.py
  • function calculate_cv_v1 90.5% similar

    Calculates the Coefficient of Variation (CV) for a dataset, expressed as a percentage. CV measures relative variability by dividing standard deviation by mean.

    From: /tf/active/vicechatdev/vice_ai/smartstat_scripts/d1e252f5-950c-4ad7-b425-86b4b02c3c62/analysis_4.py
  • function calculate_sample_size_v1 45.3% similar

    Calculates the required sample size per group for a two-group statistical comparison using Cohen's d effect size, significance level, statistical power, and standard deviation.

    From: /tf/active/vicechatdev/vice_ai/smartstat_scripts/e9b7c942-87b5-4a6f-865e-e7a0d62fb0a1/analysis_2.py
  • function calculate_sample_size_v2 45.2% similar

    Calculates the required sample size per group for a two-sample t-test given standard deviation, effect size (Cohen's d), significance level, and statistical power.

    From: /tf/active/vicechatdev/vice_ai/smartstat_scripts/f0a78968-1d2b-4fbe-a0c6-a372da2ce2a4/project_1/analysis.py
  • function main_v56 45.1% similar

    Performs statistical analysis to determine the correlation between antibiotic use frequency and vaccination modes (in-ovo vs non-in-ovo), generating visualizations and saving results to files.

    From: /tf/active/vicechatdev/smartstat/output/b7a013ae-a461-4aca-abae-9ed243119494/analysis_6cdbc6c8/analysis.py