function calculate_sample_size
Calculates the required sample size per group for a two-sample t-test using Cohen's d effect size, significance level, and statistical power.
/tf/active/vicechatdev/vice_ai/smartstat_scripts/1315733d-fb14-4740-a1a4-021696492d5e/analysis_1.py
43 - 65
moderate
Purpose
This function performs a statistical power analysis to determine the minimum number of samples needed in each group for a two-sample t-test. It is commonly used in experimental design and A/B testing to ensure that a study has adequate statistical power to detect a meaningful effect. The calculation uses the standard normal-approximation formula, built from z-scores for the specified alpha level (Type I error rate) and power (1 - Type II error rate), rather than an exact calculation based on the t distribution.
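Written out, the expression in the source code below corresponds to the following formula, where $\sigma$ is `std_dev`, $d$ is `effect_size_cohen`, and $z_p$ denotes the standard normal quantile at probability $p$ (the symbol names are chosen here for illustration and do not appear in the source):

```latex
n \;=\; \frac{2\,\bigl(z_{1-\alpha/2} + z_{\text{power}}\bigr)^{2}\,\sigma^{2}}{(d\,\sigma)^{2}}
  \;=\; \frac{2\,\bigl(z_{1-\alpha/2} + z_{\text{power}}\bigr)^{2}}{d^{2}}
```

Because $\sigma^2$ appears in both the numerator and the denominator, it cancels. With the defaults ($\alpha = 0.05$, power $= 0.80$) and $d = 0.5$, this gives $n \approx 2\,(1.960 + 0.842)^2 / 0.25 \approx 62.8$, which rounds up to 63 per group.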
Source Code
import numpy as np  # required by np.ceil below


def calculate_sample_size(std_dev, effect_size_cohen, alpha=0.05, power=0.80):
    """
    Calculate required sample size per group for a two-sample t-test

    Parameters:
    - std_dev: pooled standard deviation
    - effect_size_cohen: Cohen's d (small=0.2, medium=0.5, large=0.8)
    - alpha: significance level (default 0.05)
    - power: statistical power (default 0.80)

    Returns:
    - Required sample size per group
    """
    from scipy.stats import norm

    # Z-scores for alpha and power
    z_alpha = norm.ppf(1 - alpha/2)  # Two-tailed test
    z_beta = norm.ppf(power)

    # Sample size formula (normal approximation) for a two-sample t-test.
    # Note: std_dev ** 2 appears in numerator and denominator, so it cancels.
    n = 2 * ((z_alpha + z_beta) ** 2) * (std_dev ** 2) / ((effect_size_cohen * std_dev) ** 2)

    # Round up so the requested power is not undershot
    return int(np.ceil(n))
Parameters
| Name | Type | Default | Kind |
|---|---|---|---|
| std_dev | - | - | positional_or_keyword |
| effect_size_cohen | - | - | positional_or_keyword |
| alpha | - | 0.05 | positional_or_keyword |
| power | - | 0.8 | positional_or_keyword |
Parameter Details
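std_dev: The pooled standard deviation of the two groups being compared. Must be a positive numeric value representing the expected variability in the data, typically estimated from pilot studies or the literature. Note that in the formula as implemented, std_dev appears squared in both the numerator and the denominator, so it cancels and does not change the result; a value of 0 causes a division by zero.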
effect_size_cohen: Cohen's d effect size, representing the standardized difference between two means. Common benchmarks: 0.2 (small effect), 0.5 (medium effect), 0.8 (large effect). Must be a positive numeric value. Larger effect sizes require smaller sample sizes to detect.
alpha: The significance level (Type I error rate) for the hypothesis test. Default is 0.05 (5% chance of false positive). Must be between 0 and 1. Common values are 0.05, 0.01, or 0.10. Lower alpha values require larger sample sizes.
power: The statistical power (1 - Type II error rate), representing the probability of detecting a true effect. Default is 0.80 (80% power). Must be between 0 and 1. Common values are 0.80, 0.90, or 0.95. Higher power requires larger sample sizes.
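The constraints listed above are not enforced by the function itself. A minimal sketch of how they could be checked before the calculation is shown here. The wrapper name `validated_sample_size` is hypothetical, and the sketch swaps in `statistics.NormalDist` from the standard library for `scipy.stats.norm` so that it runs without third-party packages.

```python
import math
from statistics import NormalDist


def validated_sample_size(std_dev, effect_size_cohen, alpha=0.05, power=0.80):
    """Hypothetical wrapper: check the documented constraints, then apply the formula."""
    if std_dev <= 0:
        raise ValueError("std_dev must be positive")
    if effect_size_cohen <= 0:
        raise ValueError("effect_size_cohen must be positive")
    if not 0 < alpha < 1:
        raise ValueError("alpha must be strictly between 0 and 1")
    if not 0 < power < 1:
        raise ValueError("power must be strictly between 0 and 1")

    # Standard normal quantiles (stdlib stand-in for scipy.stats.norm.ppf)
    z_alpha = NormalDist().inv_cdf(1 - alpha / 2)  # two-tailed critical value
    z_beta = NormalDist().inv_cdf(power)

    # Same expression as in the documented source code
    n = 2 * (z_alpha + z_beta) ** 2 * std_dev ** 2 / (effect_size_cohen * std_dev) ** 2
    return math.ceil(n)


print(validated_sample_size(15, 0.5))  # medium effect with default alpha and power
```

Raising `ValueError` early gives a clearer message than the division-by-zero or domain errors that invalid inputs would otherwise trigger deeper in the calculation.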
Return Value
Returns an integer representing the required sample size per group (not the total sample size). The value is always rounded up with the ceiling function so that the requested power is not undershot. For example, if the calculation yields 63.2, the function returns 64. The total sample size for the study is twice this value (one group of that size for each arm).
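To make the rounding and the per-group versus total distinction concrete, the following sketch exposes the unrounded value alongside the returned one. The helper name `raw_and_rounded` is hypothetical; it leaves out `std_dev` because that term cancels in the documented formula, and it uses the standard library's `statistics.NormalDist` in place of `scipy.stats.norm`.

```python
import math
from statistics import NormalDist


def raw_and_rounded(effect_size_cohen, alpha=0.05, power=0.80):
    """Hypothetical helper: return (unrounded n, per-group n, total n)."""
    z_alpha = NormalDist().inv_cdf(1 - alpha / 2)  # two-tailed critical value
    z_beta = NormalDist().inv_cdf(power)
    raw_n = 2 * (z_alpha + z_beta) ** 2 / effect_size_cohen ** 2
    per_group = math.ceil(raw_n)  # always round up, never down
    return raw_n, per_group, 2 * per_group


raw_n, per_group, total = raw_and_rounded(0.5)
print(f"raw={raw_n:.2f}, per group={per_group}, total={total}")
# prints: raw=62.79, per group=63, total=126
```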
Dependencies
- scipy
- numpy
Required Imports
import numpy as np
from scipy.stats import norm
Conditional/Optional Imports
This import is performed inside the function body rather than at module level:
from scipy.stats import norm
Condition: always required when the function is called
Usage Example
import numpy as np
from scipy.stats import norm


def calculate_sample_size(std_dev, effect_size_cohen, alpha=0.05, power=0.80):
    z_alpha = norm.ppf(1 - alpha/2)  # two-tailed critical value
    z_beta = norm.ppf(power)
    n = 2 * ((z_alpha + z_beta) ** 2) * (std_dev ** 2) / ((effect_size_cohen * std_dev) ** 2)
    return int(np.ceil(n))


# Example 1: Calculate sample size for medium effect
std_deviation = 15
effect_size = 0.5  # medium effect
sample_size = calculate_sample_size(std_deviation, effect_size)
print(f"Required sample size per group: {sample_size}")
# Output: Required sample size per group: 63

# Example 2: Higher power requirement
sample_size_high_power = calculate_sample_size(std_dev=10, effect_size_cohen=0.3, alpha=0.05, power=0.90)
print(f"Sample size with 90% power: {sample_size_high_power}")

# Example 3: Small effect size detection
sample_size_small = calculate_sample_size(std_dev=20, effect_size_cohen=0.2, alpha=0.01, power=0.80)
print(f"Sample size for small effect: {sample_size_small}")
Best Practices
- Ensure std_dev is positive and represents a realistic estimate of population variability
- Choose effect_size_cohen based on domain knowledge or pilot studies; don't default to arbitrary values
- Remember the returned value is per group - multiply by 2 for total sample size in a two-group study
- Consider using more conservative (higher) power values (0.90 or 0.95) for critical studies
- The formula assumes equal sample sizes in both groups and normally distributed data
- For very small effect sizes or high power requirements, sample sizes can become impractically large
- Validate that the calculated sample size is feasible given budget and time constraints
- The function uses a two-tailed test assumption (alpha/2); modify if one-tailed test is needed
- Consider conducting sensitivity analysis by varying parameters to understand robustness of sample size estimate
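The last two practices can be sketched as follows. This is an illustration under stated assumptions, not part of the documented function: the name `sample_size` and its `two_tailed` flag are hypothetical, `std_dev` is left out because it cancels in the documented formula, and `statistics.NormalDist` from the standard library stands in for `scipy.stats.norm`.

```python
import math
from statistics import NormalDist


def sample_size(effect_size_cohen, alpha=0.05, power=0.80, two_tailed=True):
    """Hypothetical variant of the documented formula with a one-tailed option."""
    # A one-tailed test puts all of alpha in one tail instead of alpha/2 in each
    tail_alpha = alpha / 2 if two_tailed else alpha
    z_alpha = NormalDist().inv_cdf(1 - tail_alpha)
    z_beta = NormalDist().inv_cdf(power)
    return math.ceil(2 * (z_alpha + z_beta) ** 2 / effect_size_cohen ** 2)


# One-tailed versus two-tailed at the same alpha and power
print("two-tailed:", sample_size(0.5))
print("one-tailed:", sample_size(0.5, two_tailed=False))

# Sensitivity analysis: vary effect size and power, keep alpha at 0.05
sensitivity = {
    (d, power): sample_size(d, power=power)
    for d in (0.2, 0.5, 0.8)
    for power in (0.80, 0.90)
}
for (d, power), n in sensitivity.items():
    print(f"d={d}, power={power}: n per group = {n}")
```

Because the effect size enters the formula squared in the denominator, halving the assumed effect size roughly quadruples the required sample size, which is why the sweep over `d` tends to matter more than the sweep over power.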
Similar Components
AI-powered semantic similarity - components with related functionality:
- function calculate_sample_size_v2 (92.9% similar)
- function calculate_sample_size_v1 (89.2% similar)
- function perform_analysis (45.0% similar)
- function calculate_cv_v2 (43.4% similar)
- function calculate_cv (42.5% similar)