🔍 Code Extractor

function calculate_sample_size_v2

Maturity: 48

Calculates the required sample size per group for a two-sample t-test given standard deviation, effect size (Cohen's d), significance level, and statistical power.

File: /tf/active/vicechatdev/vice_ai/smartstat_scripts/f0a78968-1d2b-4fbe-a0c6-a372da2ce2a4/project_1/analysis.py
Lines: 433-442
Complexity: moderate

Purpose

This function performs a statistical power analysis to determine the minimum number of samples needed in each group for a two-sided, two-sample t-test. It uses the standard normal-approximation formula to compute the sample size that gives adequate power to detect a specified effect size at a given significance level. Common use cases include experimental design, A/B test planning, and clinical trial sizing.
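Written out, the normal-approximation formula evaluated by the code is as follows. Here $\Delta$ is the raw difference in means and $\sigma$ the common standard deviation; this notation is added for illustration and does not appear in the source. When the effect is expressed as Cohen's $d = \Delta/\sigma$, the $\sigma^2$ term cancels.

```latex
n = \left\lceil \frac{2\,\bigl(z_{1-\alpha/2} + z_{1-\beta}\bigr)^{2}\,\sigma^{2}}{\Delta^{2}} \right\rceil
  = \left\lceil \frac{2\,\bigl(z_{1-\alpha/2} + z_{1-\beta}\bigr)^{2}}{d^{2}} \right\rceil,
\qquad d = \frac{\Delta}{\sigma}, \quad 1-\beta = \text{power}
```

For alpha = 0.05 and power = 0.80, the quantiles are about 1.960 and 0.842, so with d = 0.5 the formula gives 2 × 2.8016² / 0.25 ≈ 62.8, rounded up to 63 per group.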

Source Code

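def calculate_sample_size(std, effect_size, alpha=0.05, power=0.80):
    """
    Calculate sample size per group for two-sample t-test
    effect_size: Cohen's d (difference in means / pooled std)
    """
    from scipy.stats import norm
    # Two-sided critical value z_{1-alpha/2} and the quantile matching the target power
    z_alpha = norm.ppf(1 - alpha/2)
    z_beta = norm.ppf(power)
    # Normal-approximation formula. Multiplying by std**2 is only consistent when
    # effect_size is the raw difference in means; for a standardized Cohen's d,
    # pass std=1, otherwise the result is inflated by a factor of std**2.
    n = 2 * ((z_alpha + z_beta) ** 2) * (std ** 2) / (effect_size ** 2)
    # Round up to the next whole observation (relies on a module-level `import numpy as np`)
    return np.ceil(n)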
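The interaction between `std` and `effect_size` can be checked numerically. The sketch below copies the formula from the source function and calls it three ways: a standardized d with `std=1`, the matching raw mean difference with its `std`, and (inconsistently) a standardized d with `std=15`. The first two should agree; the third is inflated by roughly 15². The values of `d` and `sigma` are illustrative only.

```python
import numpy as np
from scipy.stats import norm


def calculate_sample_size(std, effect_size, alpha=0.05, power=0.80):
    # Formula copied from the source function
    z_alpha = norm.ppf(1 - alpha / 2)
    z_beta = norm.ppf(power)
    n = 2 * ((z_alpha + z_beta) ** 2) * (std ** 2) / (effect_size ** 2)
    return np.ceil(n)


d, sigma = 0.5, 15.0

n_standardized = calculate_sample_size(1.0, d)      # standardized d with std=1
n_raw = calculate_sample_size(sigma, d * sigma)     # raw mean difference with its std
n_mixed = calculate_sample_size(sigma, d)           # d with std=15: scales do not match

print(int(n_standardized), int(n_raw))  # 63 63
# Close to 15**2 = 225; not exact because both sizes are rounded up
print(n_mixed / n_standardized)
```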

Parameters

Name Type Default Kind
std - - positional_or_keyword
effect_size - - positional_or_keyword
alpha - 0.05 positional_or_keyword
power - 0.8 positional_or_keyword

Parameter Details

std: Assumed common standard deviation of the outcome in both groups (population value or pooled estimate), expressed in the outcome's units. Must be positive. Estimate it from pilot studies or prior research.

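effect_size: Documented as Cohen's d, the difference in means divided by the pooled standard deviation, with conventional benchmarks of 0.2 (small), 0.5 (medium) and 0.8 (large). Must be positive. Because the formula also multiplies by std², std and effect_size must be on the same scale: pass a standardized d together with std=1, or pass the raw difference in means (d × std) together with std. Passing d with any std other than 1 inflates the result by a factor of std².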

alpha: Significance level (Type I error rate), i.e. the probability of rejecting the null hypothesis when it is true. Default 0.05; common values are 0.01, 0.05 and 0.10. Must lie strictly between 0 and 1. The function uses the two-sided critical value z(1 - alpha/2).

power: Statistical power (1 minus the Type II error rate), i.e. the probability of correctly rejecting the null hypothesis when the alternative is true. Default 0.80; common values are 0.80 and 0.90. Must lie strictly between 0 and 1.
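When d has to be estimated from pilot data, a common choice is the difference in sample means divided by the pooled standard deviation. The helper below, `cohens_d`, is a hypothetical name not present in the source; it is a minimal sketch that assumes two independent samples and uses sample variances (ddof=1). The pilot values are toy numbers chosen to make the arithmetic easy to follow.

```python
import numpy as np


def cohens_d(x, y):
    """Cohen's d for two independent samples, using the pooled standard deviation."""
    x = np.asarray(x, dtype=float)
    y = np.asarray(y, dtype=float)
    n1, n2 = len(x), len(y)
    # Pooled variance: sample variances weighted by their degrees of freedom
    pooled_var = ((n1 - 1) * x.var(ddof=1) + (n2 - 1) * y.var(ddof=1)) / (n1 + n2 - 2)
    return (x.mean() - y.mean()) / np.sqrt(pooled_var)


# Toy pilot data: means 12 and 10, both sample variances 4, so pooled SD = 2
pilot_a = [10.0, 12.0, 14.0]
pilot_b = [8.0, 10.0, 12.0]
print(cohens_d(pilot_a, pilot_b))  # 1.0
```

The result can then be passed as effect_size together with std=1.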

Return Value

Returns a NumPy float64 (for scalar inputs) holding the required sample size per group, rounded up to the next whole number with np.ceil(); use int() to obtain an integer count. Under the normal approximation, this is the minimum number of observations needed in each of the two groups to achieve the specified power at the given significance level for the given effect size.

Dependencies

  • scipy
  • numpy

Required Imports

import numpy as np
from scipy.stats import norm

Usage Example

import numpy as np
from scipy.stats import norm

def calculate_sample_size(std, effect_size, alpha=0.05, power=0.80):
    # Normal-approximation sample size per group for a two-sided test
    z_alpha = norm.ppf(1 - alpha/2)
    z_beta = norm.ppf(power)
    n = 2 * ((z_alpha + z_beta) ** 2) * (std ** 2) / (effect_size ** 2)
    return np.ceil(n)

# Example: calculate sample size for detecting a medium effect (Cohen's d = 0.5).
# The formula multiplies by std**2, so effect_size must be on the same scale
# as std: pass the raw mean difference (d * std) here, or use std=1 with d.
std_dev = 15              # standard deviation of the outcome
d = 0.5                   # Cohen's d (medium effect)
mean_diff = d * std_dev   # raw difference in means (7.5)
sample_size = calculate_sample_size(std_dev, mean_diff, alpha=0.05, power=0.80)
print(f"Required sample size per group: {int(sample_size)}")  # 63

# Example: higher power requirement
sample_size_high_power = calculate_sample_size(std_dev, mean_diff, alpha=0.05, power=0.90)
print(f"Sample size for 90% power: {int(sample_size_high_power)}")  # 85
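Because the function body uses only NumPy-compatible arithmetic, it should also accept arrays, returning an array of sample sizes instead of a single float64. The sketch below relies on that, together with NumPy broadcasting, to tabulate Cohen's benchmark effect sizes (0.2, 0.5, 0.8) against 80% and 90% power. It passes std=1 so that effect_size is a standardized d; the grid layout is an illustration, not something the source does.

```python
import numpy as np
from scipy.stats import norm


def calculate_sample_size(std, effect_size, alpha=0.05, power=0.80):
    # Same normal-approximation formula as the source function
    z_alpha = norm.ppf(1 - alpha / 2)
    z_beta = norm.ppf(power)
    n = 2 * ((z_alpha + z_beta) ** 2) * (std ** 2) / (effect_size ** 2)
    return np.ceil(n)


# Effect sizes as a column (3 x 1) and powers as a row (2,):
# broadcasting produces a 3 x 2 grid of per-group sample sizes.
effects = np.array([[0.2], [0.5], [0.8]])
powers = np.array([0.80, 0.90])
grid = calculate_sample_size(1.0, effects, power=powers).astype(int)
print(grid.tolist())  # [[393, 526], [63, 85], [25, 33]]
```

Each row corresponds to one effect size and each column to one power level, which makes the roughly 1/d² growth of n easy to see.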

Best Practices

  • Ensure the standard deviation (std) is a reasonable estimate from pilot data or literature, as inaccurate estimates will lead to incorrect sample sizes
  • The effect_size parameter should be based on practical significance, not just statistical significance. Consider what difference would be meaningful in your domain
  • This function assumes equal sample sizes in both groups and equal variances (homoscedasticity)
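  • The calculation uses a normal approximation, which ignores the extra uncertainty from estimating the variance. It therefore tends to return a slightly smaller n than an exact noncentral-t calculation (for example 63 rather than 64 per group for d = 0.5, alpha = 0.05, power = 0.80); the gap matters most when the required n is small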
  • Always round up the result (which the function does automatically with np.ceil()) to ensure adequate power
  • Consider using more conservative parameters (higher power like 0.90 or lower alpha like 0.01) for critical studies
  • The function calculates sample size per group, so multiply by 2 for total sample size in a two-group study
  • The function does not validate its inputs: check that std and effect_size are positive and that alpha and power lie strictly between 0 and 1 (for example, power=1.0 makes norm.ppf return infinity)
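Input checks and an exact alternative to the normal approximation can be combined in one helper. The sketch below, `exact_t_sample_size`, is a hypothetical function not found in the source. It validates its arguments and then searches for the smallest equal per-group n whose two-sided power, computed from SciPy's noncentral t distribution (`scipy.stats.nct`), reaches the target. It assumes a standardized d, equal group sizes and equal variances, mirroring the assumptions listed above.

```python
import math
from scipy.stats import nct, t


def exact_t_sample_size(d, alpha=0.05, power=0.80, max_n=100_000):
    """Smallest per-group n whose two-sided two-sample t-test power reaches `power`."""
    # Basic input validation
    if d <= 0:
        raise ValueError("d must be positive")
    if not (0 < alpha < 1 and 0 < power < 1):
        raise ValueError("alpha and power must lie strictly between 0 and 1")

    # Power increases with n, so the first n that reaches the target is the smallest
    for n in range(2, max_n + 1):
        df = 2 * n - 2                        # degrees of freedom for two equal groups
        nc = d * math.sqrt(n / 2)             # noncentrality parameter
        t_crit = t.ppf(1 - alpha / 2, df)     # two-sided critical value
        # Probability of landing in either rejection region under the alternative
        achieved = nct.sf(t_crit, df, nc) + nct.cdf(-t_crit, df, nc)
        if achieved >= power:
            return n
    raise RuntimeError("no n <= max_n reaches the requested power")


print(exact_t_sample_size(0.5))  # 64
```

For d = 0.5 at alpha = 0.05 and 80% power this should return 64, one more than the normal-approximation answer of 63. A linear search keeps the sketch simple; for very small effect sizes a bisection over n would be faster.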

Similar Components

AI-powered semantic similarity - components with related functionality:

  • function calculate_sample_size 92.9% similar

    Calculates the required sample size per group for a two-sample t-test using Cohen's d effect size, significance level, and statistical power.

    From: /tf/active/vicechatdev/vice_ai/smartstat_scripts/1315733d-fb14-4740-a1a4-021696492d5e/analysis_1.py
  • function calculate_sample_size_v1 92.2% similar

    Calculates the required sample size per group for a two-group statistical comparison using Cohen's d effect size, significance level, statistical power, and standard deviation.

    From: /tf/active/vicechatdev/vice_ai/smartstat_scripts/e9b7c942-87b5-4a6f-865e-e7a0d62fb0a1/analysis_2.py
  • function perform_analysis 47.9% similar

    Performs comprehensive statistical analysis on grouped biological/experimental data, including descriptive statistics, correlation analysis, ANOVA testing, and visualization of infection levels and growth performance across different groups.

    From: /tf/active/vicechatdev/vice_ai/smartstat_scripts/e1ecec5f-4ea5-49c5-b4f5-d051ce851294/project_1/analysis.py
  • function calculate_cv_v2 45.2% similar

    Calculates the coefficient of variation (CV) for a group of numerical values, expressed as a percentage.

    From: /tf/active/vicechatdev/vice_ai/smartstat_scripts/f5da873e-41e6-4f34-b3e4-f7443d4d213b/analysis_5.py
  • function calculate_cv 43.4% similar

    Calculates the coefficient of variation (CV) for a dataset, expressed as a percentage of the standard deviation relative to the mean.

    From: /tf/active/vicechatdev/vice_ai/smartstat_scripts/d48d7789-9627-4e96-9f48-f90b687cd07d/analysis_1.py