🔍 Code Extractor

function calculate_sample_size_v1

Maturity: 49

Calculates the required sample size per group for a two-group statistical comparison using Cohen's d effect size, significance level, statistical power, and standard deviation.

File: /tf/active/vicechatdev/vice_ai/smartstat_scripts/e9b7c942-87b5-4a6f-865e-e7a0d62fb0a1/analysis_2.py
Lines: 133 - 149
Complexity: moderate

Purpose

This function performs an a priori statistical power analysis to determine the minimum number of participants needed in each group of a two-group study design. It applies the standard normal-approximation formula, which takes the effect size (Cohen's d), the desired significance level (alpha), the statistical power, and the standard deviation as inputs. Such calculations are common in experimental design and A/B testing, where the sample size must be fixed before the study is run.
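Written out, the calculation in the source code corresponds to the expression below. This is a reading of the code rather than a derivation taken from the original file; the symbols d (effect_size), σ (sd), and the z quantiles of the standard normal distribution are notation introduced here for illustration.

```latex
n \;=\; \frac{2\,\left(z_{1-\alpha/2} + z_{\text{power}}\right)^{2}\,\sigma^{2}}{(d\,\sigma)^{2}}
  \;=\; \frac{2\,\left(z_{1-\alpha/2} + z_{\text{power}}\right)^{2}}{d^{2}}

% Worked example: d = 0.5, alpha = 0.05, power = 0.80
% z_{0.975} \approx 1.960, \quad z_{0.80} \approx 0.842
n \;\approx\; \frac{2\,(1.960 + 0.842)^{2}}{0.5^{2}} \;\approx\; 62.8
\quad\Longrightarrow\quad \lceil n \rceil = 63
```

The σ² factors cancel, so the result depends only on d, alpha, and power.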

Source Code

def calculate_sample_size(effect_size, alpha, power, sd):
    """
    Calculate required sample size for two-group comparison
    effect_size: Cohen's d
    alpha: significance level (commonly 0.05; no default is defined)
    power: statistical power (commonly 0.80; no default is defined)
    sd: standard deviation
    """
    from scipy.stats import norm
    
    z_alpha = norm.ppf(1 - alpha/2)  # Two-tailed test
    z_beta = norm.ppf(power)
    
    # Sample size per group
    n = 2 * ((z_alpha + z_beta) ** 2) * (sd ** 2) / ((effect_size * sd) ** 2)
    
    return np.ceil(n)

Parameters

Name         Type  Default  Kind
effect_size  -     -        positional_or_keyword
alpha        -     -        positional_or_keyword
power        -     -        positional_or_keyword
sd           -     -        positional_or_keyword

Parameter Details

effect_size: Cohen's d effect size - a standardized (unitless) measure of the difference between two group means. Typical values: 0.2 (small), 0.5 (medium), 0.8 (large). Should be a positive number; a value of 0 makes the formula divide by zero.

alpha: Significance level (Type I error rate) for the statistical test. Commonly set to 0.05 (5%). Must be between 0 and 1. The function uses a two-tailed test, so this value is divided by 2 internally.

power: Statistical power (1 - Type II error rate) - the probability of detecting an effect if it exists. Commonly set to 0.80 (80%) or 0.90 (90%). Must be between 0 and 1.

sd: Standard deviation of the population or expected standard deviation of the outcome variable. Must be a positive number. Because Cohen's d is already expressed in standard-deviation units, sd cancels out of the formula and does not change the result.
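The function itself does not check these constraints. A minimal sketch of a validating variant is shown below. The name calculate_sample_size_checked and the error messages are hypothetical and do not appear in the source file. To keep the sketch free of third-party packages, it uses the standard library's statistics.NormalDist().inv_cdf in place of scipy.stats.norm.ppf and math.ceil in place of np.ceil, so it returns a plain int rather than a numpy scalar.

```python
import math
from statistics import NormalDist


def calculate_sample_size_checked(effect_size, alpha, power, sd):
    """Hypothetical validating variant of calculate_sample_size (illustration only)."""
    if not math.isfinite(effect_size) or effect_size == 0:
        raise ValueError("effect_size must be a finite, nonzero number")
    if not 0 < alpha < 1:
        raise ValueError("alpha must be strictly between 0 and 1")
    if not 0 < power < 1:
        raise ValueError("power must be strictly between 0 and 1")
    if not sd > 0:
        raise ValueError("sd must be positive")

    # Standard-normal quantiles; stand-ins for scipy.stats.norm.ppf
    z_alpha = NormalDist().inv_cdf(1 - alpha / 2)  # two-tailed test
    z_beta = NormalDist().inv_cdf(power)

    # sd cancels in the original formula, so it is validated but not used here
    n = 2 * (z_alpha + z_beta) ** 2 / effect_size ** 2
    return math.ceil(n)


if __name__ == "__main__":
    print(calculate_sample_size_checked(0.5, 0.05, 0.80, 1.0))
```

Raising ValueError early is one possible design; the original function would instead fail or return a non-finite value for out-of-range inputs.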

Return Value

Returns a numpy scalar (float64) representing the required sample size per group, rounded up to the nearest whole number using np.ceil(). This is the minimum number of participants needed in each of the two groups to achieve the specified statistical power at the given significance level.

Dependencies

  • scipy
  • numpy

Required Imports

import numpy as np
from scipy.stats import norm

Conditional/Optional Imports

These imports are only needed under specific conditions:

from scipy.stats import norm

Condition: imported inside the function body, so it is required whenever the function is called

Usage Example

import numpy as np
from scipy.stats import norm

def calculate_sample_size(effect_size, alpha, power, sd):
    from scipy.stats import norm
    z_alpha = norm.ppf(1 - alpha/2)
    z_beta = norm.ppf(power)
    n = 2 * ((z_alpha + z_beta) ** 2) * (sd ** 2) / ((effect_size * sd) ** 2)
    return np.ceil(n)

# Example: Calculate sample size for medium effect size
effect_size = 0.5  # Cohen's d (medium effect)
alpha = 0.05  # 5% significance level
power = 0.80  # 80% power
sd = 1.0  # Standard deviation

sample_size = calculate_sample_size(effect_size, alpha, power, sd)
print(f"Required sample size per group: {int(sample_size)}")
# Output: Required sample size per group: 63
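To see how the result scales with the inputs, the same calculation can be tabulated across the conventional effect sizes. This is a sketch only: the helper name n_per_group is hypothetical, the sd argument is dropped because it cancels in the formula, and the standard library's statistics.NormalDist stands in for scipy.stats.norm so the snippet runs without extra packages.

```python
import math
from statistics import NormalDist


def n_per_group(d, alpha=0.05, power=0.80):
    """Hypothetical helper: per-group sample size, two-tailed normal approximation."""
    z_alpha = NormalDist().inv_cdf(1 - alpha / 2)
    z_beta = NormalDist().inv_cdf(power)
    return math.ceil(2 * (z_alpha + z_beta) ** 2 / d ** 2)


# Tabulate small, medium, and large effects at two common power levels
table = {
    (d, power): n_per_group(d, power=power)
    for d in (0.2, 0.5, 0.8)
    for power in (0.80, 0.90)
}

for (d, power), n in table.items():
    print(f"d={d:.1f}  power={power:.2f}  n per group={n}")
```

Because n is proportional to 1/d², halving the effect size roughly quadruples the required sample size.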

Best Practices

  • Pass effect_size as a standardized effect (Cohen's d, the mean difference divided by sd), not as a raw mean difference in the outcome's units
  • The function assumes equal sample sizes in both groups (balanced design)
  • The calculation is based on a two-tailed test; adjust if one-tailed test is needed
  • Always round up the result (which the function does automatically) to ensure adequate power
  • Conventional choices are alpha = 0.05 or 0.01 and power = 0.80 or 0.90; the function does not validate its inputs, so check them before calling
  • The standard deviation (sd) should be estimated from pilot data or literature when possible
  • Note that the formula simplifies because Cohen's d already incorporates sd, so the sd terms cancel out in the calculation
  • Consider using a slightly higher sample size than calculated to account for potential dropouts or missing data
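Two of the points above, the one-tailed adjustment and the allowance for dropouts, can be sketched as follows. The names n_per_group, two_tailed, inflate_for_dropout, and dropout_rate are hypothetical and not part of the source file. Dividing by (1 - dropout_rate) is one common convention, assumed here rather than taken from the original. The standard library's statistics.NormalDist again stands in for scipy.stats.norm.

```python
import math
from statistics import NormalDist


def n_per_group(d, alpha=0.05, power=0.80, two_tailed=True):
    """Hypothetical helper: per-group sample size under the normal approximation."""
    # A one-tailed test puts all of alpha in one tail instead of splitting it
    tail_alpha = alpha / 2 if two_tailed else alpha
    z_alpha = NormalDist().inv_cdf(1 - tail_alpha)
    z_beta = NormalDist().inv_cdf(power)
    return math.ceil(2 * (z_alpha + z_beta) ** 2 / d ** 2)


def inflate_for_dropout(n, dropout_rate):
    """Hypothetical helper: enrolment target so that about n participants remain."""
    if not 0 <= dropout_rate < 1:
        raise ValueError("dropout_rate must be in [0, 1)")
    return math.ceil(n / (1 - dropout_rate))


n_two = n_per_group(0.5)                     # two-tailed, as in the original function
n_one = n_per_group(0.5, two_tailed=False)   # one-tailed variant
n_enroll = inflate_for_dropout(n_two, 0.20)  # assumes 20% expected dropout

print(n_two, n_one, n_enroll)
```

The one-tailed variant needs fewer participants because its critical value is smaller, but it is only appropriate when the direction of the effect is specified in advance.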

Similar Components

AI-powered semantic similarity - components with related functionality:

  • function calculate_sample_size_v2 92.2% similar

    Calculates the required sample size per group for a two-sample t-test given standard deviation, effect size (Cohen's d), significance level, and statistical power.

    From: /tf/active/vicechatdev/vice_ai/smartstat_scripts/f0a78968-1d2b-4fbe-a0c6-a372da2ce2a4/project_1/analysis.py
  • function calculate_sample_size 89.2% similar

    Calculates the required sample size per group for a two-sample t-test using Cohen's d effect size, significance level, and statistical power.

    From: /tf/active/vicechatdev/vice_ai/smartstat_scripts/1315733d-fb14-4740-a1a4-021696492d5e/analysis_1.py
  • function perform_analysis 52.1% similar

    Performs comprehensive statistical analysis on grouped biological/experimental data, including descriptive statistics, correlation analysis, ANOVA testing, and visualization of infection levels and growth performance across different groups.

    From: /tf/active/vicechatdev/vice_ai/smartstat_scripts/e1ecec5f-4ea5-49c5-b4f5-d051ce851294/project_1/analysis.py
  • function create_sample_data_v1 48.6% similar

    Generates a synthetic dataset with 200 samples containing group-based measurements, quality scores, environmental data, and temporal information, then saves it to a CSV file.

    From: /tf/active/vicechatdev/full_smartstat/demo.py
  • function calculate_cv_v2 45.3% similar

    Calculates the coefficient of variation (CV) for a group of numerical values, expressed as a percentage.

    From: /tf/active/vicechatdev/vice_ai/smartstat_scripts/f5da873e-41e6-4f34-b3e4-f7443d4d213b/analysis_5.py