
function calculate_sample_size

Maturity: 54

Calculates the required sample size per group for a two-sample t-test using Cohen's d effect size, significance level, and statistical power.

File: /tf/active/vicechatdev/vice_ai/smartstat_scripts/1315733d-fb14-4740-a1a4-021696492d5e/analysis_1.py
Lines: 43 - 65
Complexity: moderate

Purpose

This function performs statistical power analysis to determine the minimum number of samples needed in each group for a two-sample t-test. It is commonly used in experimental design and A/B testing to ensure a study has adequate statistical power to detect a meaningful effect. The calculation uses the normal-approximation formula, which combines the z-scores for the specified alpha level (Type I error rate) and power (1 - Type II error rate). Because it uses normal rather than t quantiles, the result can come out slightly below the sample size given by an exact t-based calculation.
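Written out, the expression in the source code corresponds to the formula below. The symbols are notation chosen for this sketch rather than names from the source file: $\sigma$ stands for std_dev, $d$ for effect_size_cohen, and $1 - \beta$ for power.

```latex
n \;=\; \frac{2\,\bigl(z_{1-\alpha/2} + z_{1-\beta}\bigr)^{2}\,\sigma^{2}}{(d\,\sigma)^{2}}
  \;=\; \frac{2\,\bigl(z_{1-\alpha/2} + z_{1-\beta}\bigr)^{2}}{d^{2}}
```

The second form follows because $\sigma^{2}$ appears in both numerator and denominator, so the result depends only on $d$, alpha, and power. The function then rounds $n$ up to the next integer.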

Source Code

def calculate_sample_size(std_dev, effect_size_cohen, alpha=0.05, power=0.80):
    """
    Calculate required sample size per group for a two-sample t-test
    
    Parameters:
    - std_dev: pooled standard deviation
    - effect_size_cohen: Cohen's d (small=0.2, medium=0.5, large=0.8)
    - alpha: significance level (default 0.05)
    - power: statistical power (default 0.80)
    
    Returns:
    - Required sample size per group
    """
    from scipy.stats import norm
    
    # Z-scores for alpha and power
    z_alpha = norm.ppf(1 - alpha/2)  # Two-tailed test
    z_beta = norm.ppf(power)
    
    # Sample size calculation formula for two-sample t-test
    n = 2 * ((z_alpha + z_beta) ** 2) * (std_dev ** 2) / ((effect_size_cohen * std_dev) ** 2)
    
    return int(np.ceil(n))

Parameters

Name               Type  Default  Kind
std_dev            -     -        positional_or_keyword
effect_size_cohen  -     -        positional_or_keyword
alpha              -     0.05     positional_or_keyword
power              -     0.8      positional_or_keyword

Parameter Details

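std_dev: The pooled standard deviation of the two groups being compared, typically estimated from pilot studies or literature. Must be a non-zero numeric value. Note that in the formula as written, std_dev appears squared in both the numerator and the denominator and therefore cancels: for a given Cohen's d, the returned sample size does not depend on it.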
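The effect of std_dev on the result can be checked directly. The snippet below is a minimal sketch, not the documented source: sample_size_normal_approx is a hypothetical name, and it swaps scipy's norm.ppf for the standard library's NormalDist().inv_cdf so that it runs without third-party packages. The formula itself is the one shown in the Source Code section.

```python
from math import ceil
from statistics import NormalDist


def sample_size_normal_approx(std_dev, effect_size_cohen, alpha=0.05, power=0.80):
    """Hypothetical stdlib-only restatement of the documented formula."""
    z_alpha = NormalDist().inv_cdf(1 - alpha / 2)  # two-tailed critical value
    z_beta = NormalDist().inv_cdf(power)           # quantile for the target power
    n = 2 * (z_alpha + z_beta) ** 2 * std_dev ** 2 / (effect_size_cohen * std_dev) ** 2
    return ceil(n)


# Same Cohen's d, very different standard deviations
sizes = [sample_size_normal_approx(sd, 0.5) for sd in (1, 15, 250)]
print(sizes)  # [63, 63, 63]
```

All three calls return the same value because std_dev cancels; only the standardized effect size, alpha, and power drive the result.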

effect_size_cohen: Cohen's d effect size, representing the standardized difference between two means. Common benchmarks: 0.2 (small effect), 0.5 (medium effect), 0.8 (large effect). Must be a positive numeric value. Larger effect sizes require smaller sample sizes to detect.

alpha: The significance level (Type I error rate) for the hypothesis test. Default is 0.05 (5% chance of false positive). Must be between 0 and 1. Common values are 0.05, 0.01, or 0.10. Lower alpha values require larger sample sizes.

power: The statistical power (1 - Type II error rate), representing the probability of detecting a true effect. Default is 0.80 (80% power). Must be between 0 and 1. Common values are 0.80, 0.90, or 0.95. Higher power requires larger sample sizes.
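The constraints listed above are not enforced by the function itself. A validating variant might look like the sketch below. It is an assumption-laden illustration: checked_sample_size is a hypothetical name, std_dev is omitted because it cancels algebraically in the documented formula, and NormalDist().inv_cdf from the standard library stands in for scipy's norm.ppf.

```python
from math import ceil
from statistics import NormalDist


def checked_sample_size(effect_size_cohen, alpha=0.05, power=0.80):
    """Hypothetical validating variant; not part of the documented source."""
    if effect_size_cohen == 0:
        raise ValueError("effect_size_cohen must be non-zero")
    if not 0 < alpha < 1:
        raise ValueError("alpha must be strictly between 0 and 1")
    if not 0 < power < 1:
        raise ValueError("power must be strictly between 0 and 1")
    z_alpha = NormalDist().inv_cdf(1 - alpha / 2)  # two-tailed critical value
    z_beta = NormalDist().inv_cdf(power)           # quantile for the target power
    return ceil(2 * (z_alpha + z_beta) ** 2 / effect_size_cohen ** 2)


# Lower alpha and higher power both push the requirement up
baseline = checked_sample_size(0.5)
stricter_alpha = checked_sample_size(0.5, alpha=0.01)
higher_power = checked_sample_size(0.5, power=0.90)
print(baseline, stricter_alpha, higher_power)
```

Raising ValueError early gives a clearer message than the failure that would otherwise surface from the quantile function or the division.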

Return Value

Returns an integer representing the required sample size per group (not total sample size). The value is always rounded up using ceiling function to ensure adequate power. For example, if the calculation yields 63.2, the function returns 64. The total sample size for the study would be twice this value (one for each group).

Dependencies

  • scipy
  • numpy

Required Imports

import numpy as np
from scipy.stats import norm

Conditional/Optional Imports

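The following import is placed inside the function body rather than at module level:

from scipy.stats import norm

Condition: none. The import executes on every call, so scipy is always required. numpy, by contrast, is not imported inside the function: the body calls np.ceil, so the enclosing module must provide import numpy as np.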

Usage Example

import numpy as np
from scipy.stats import norm

def calculate_sample_size(std_dev, effect_size_cohen, alpha=0.05, power=0.80):
    # Critical value for a two-tailed test at significance level alpha
    z_alpha = norm.ppf(1 - alpha/2)
    # Quantile corresponding to the desired power
    z_beta = norm.ppf(power)
    n = 2 * ((z_alpha + z_beta) ** 2) * (std_dev ** 2) / ((effect_size_cohen * std_dev) ** 2)
    # Round up so the achieved power is not below the target
    return int(np.ceil(n))

# Example 1: Calculate sample size for medium effect
std_deviation = 15
effect_size = 0.5  # medium effect
sample_size = calculate_sample_size(std_deviation, effect_size)
print(f"Required sample size per group: {sample_size}")
# Output: Required sample size per group: 63

# Example 2: Higher power requirement
sample_size_high_power = calculate_sample_size(std_dev=10, effect_size_cohen=0.3, alpha=0.05, power=0.90)
print(f"Sample size with 90% power: {sample_size_high_power}")

# Example 3: Small effect size detection
sample_size_small = calculate_sample_size(std_dev=20, effect_size_cohen=0.2, alpha=0.01, power=0.80)
print(f"Sample size for small effect: {sample_size_small}")

Best Practices

  • Ensure std_dev is non-zero; it cancels out of the formula, so its exact value does not change the result, but a zero value makes the expression 0/0 and the call fails
  • Choose effect_size_cohen based on domain knowledge or pilot studies; don't default to arbitrary values
  • Remember the returned value is per group - multiply by 2 for total sample size in a two-group study
  • Consider using more conservative (higher) power values (0.90 or 0.95) for critical studies
  • The formula assumes equal sample sizes in both groups and normally distributed data
  • For very small effect sizes or high power requirements, sample sizes can become impractically large
  • Validate that the calculated sample size is feasible given budget and time constraints
  • The function uses a two-tailed test assumption (alpha/2); modify if one-tailed test is needed
  • Consider conducting sensitivity analysis by varying parameters to understand robustness of sample size estimate
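
The sensitivity analysis and the one-tailed modification mentioned in the list above could be sketched as follows. This is an illustration under stated assumptions, not the documented function: n_per_group and its two_tailed flag are hypothetical, std_dev is dropped because it cancels in the documented formula, and the standard library's NormalDist().inv_cdf replaces scipy's norm.ppf.

```python
from math import ceil
from statistics import NormalDist


def n_per_group(effect_size_cohen, alpha=0.05, power=0.80, two_tailed=True):
    """Hypothetical helper: normal-approximation sample size per group."""
    tail_alpha = alpha / 2 if two_tailed else alpha  # one-tailed uses the full alpha
    z_alpha = NormalDist().inv_cdf(1 - tail_alpha)
    z_beta = NormalDist().inv_cdf(power)
    return ceil(2 * (z_alpha + z_beta) ** 2 / effect_size_cohen ** 2)


# Sensitivity grid over Cohen's benchmarks and common power targets
grid = {
    (d, power): n_per_group(d, power=power)
    for d in (0.2, 0.5, 0.8)
    for power in (0.80, 0.90, 0.95)
}
for (d, power), n in grid.items():
    print(f"d={d:.1f} power={power:.2f} -> {n} per group, {2 * n} total")

# One-tailed variant for comparison with the two-tailed default
one_tailed = n_per_group(0.5, two_tailed=False)
print(f"one-tailed, d=0.5: {one_tailed} per group")
```

Printing both the per-group and total figures makes it easier to judge feasibility against budget and time constraints.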

Similar Components

AI-powered semantic similarity - components with related functionality:

  • function calculate_sample_size_v2 92.9% similar

    Calculates the required sample size per group for a two-sample t-test given standard deviation, effect size (Cohen's d), significance level, and statistical power.

    From: /tf/active/vicechatdev/vice_ai/smartstat_scripts/f0a78968-1d2b-4fbe-a0c6-a372da2ce2a4/project_1/analysis.py
  • function calculate_sample_size_v1 89.2% similar

    Calculates the required sample size per group for a two-group statistical comparison using Cohen's d effect size, significance level, statistical power, and standard deviation.

    From: /tf/active/vicechatdev/vice_ai/smartstat_scripts/e9b7c942-87b5-4a6f-865e-e7a0d62fb0a1/analysis_2.py
  • function perform_analysis 45.0% similar

    Performs comprehensive statistical analysis on grouped biological/experimental data, including descriptive statistics, correlation analysis, ANOVA testing, and visualization of infection levels and growth performance across different groups.

    From: /tf/active/vicechatdev/vice_ai/smartstat_scripts/e1ecec5f-4ea5-49c5-b4f5-d051ce851294/project_1/analysis.py
  • function calculate_cv_v2 43.4% similar

    Calculates the coefficient of variation (CV) for a group of numerical values, expressed as a percentage.

    From: /tf/active/vicechatdev/vice_ai/smartstat_scripts/f5da873e-41e6-4f34-b3e4-f7443d4d213b/analysis_5.py
  • function calculate_cv 42.5% similar

    Calculates the coefficient of variation (CV) for a dataset, expressed as a percentage of the standard deviation relative to the mean.

    From: /tf/active/vicechatdev/vice_ai/smartstat_scripts/d48d7789-9627-4e96-9f48-f90b687cd07d/analysis_1.py