function calculate_sample_size_v2
Calculates the required sample size per group for a two-sample t-test given standard deviation, effect size (Cohen's d), significance level, and statistical power.
/tf/active/vicechatdev/vice_ai/smartstat_scripts/f0a78968-1d2b-4fbe-a0c6-a372da2ce2a4/project_1/analysis.py
433 - 442
moderate
Purpose
This function performs statistical power analysis to determine the minimum number of samples needed in each group for a two-sample t-test. It uses the standard formula based on normal distribution approximations to calculate sample size, ensuring adequate statistical power to detect a specified effect size at a given significance level. Common use cases include experimental design, A/B testing planning, and clinical trial sizing.
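Written out, the calculation in the source code is the expression below. The symbols are notation chosen here for illustration and do not come from the source: $\sigma$ stands for `std`, $\Delta$ for `effect_size`, and $z_p$ for the standard normal quantile returned by `norm.ppf(p)`.

```latex
n = \left\lceil \frac{2\,\sigma^{2}\,\bigl(z_{1-\alpha/2} + z_{\text{power}}\bigr)^{2}}{\Delta^{2}} \right\rceil
```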
Source Code
def calculate_sample_size(std, effect_size, alpha=0.05, power=0.80):
    """
    Calculate sample size per group for two-sample t-test
    effect_size: Cohen's d (difference in means / pooled std)
    """
    from scipy.stats import norm
    z_alpha = norm.ppf(1 - alpha/2)
    z_beta = norm.ppf(power)
    n = 2 * ((z_alpha + z_beta) ** 2) * (std ** 2) / (effect_size ** 2)
    return np.ceil(n)
Parameters
| Name | Type | Default | Kind |
|---|---|---|---|
| std | - | - | positional_or_keyword |
| effect_size | - | - | positional_or_keyword |
| alpha | - | 0.05 | positional_or_keyword |
| power | - | 0.8 | positional_or_keyword |
Parameter Details
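std: Standard deviation of the outcome, assumed common to both groups (the pooled standard deviation). Must be a positive number, ideally estimated from pilot studies or prior research. Because the formula multiplies by std ** 2, pass std=1 when effect_size is already a standardized Cohen's d.
effect_size: Described in the docstring as Cohen's d (difference in means divided by the pooled standard deviation), with conventional benchmarks of 0.2 (small), 0.5 (medium), and 0.8 (large). Must be nonzero; positive values are conventional. Note that the formula computes std ** 2 / effect_size ** 2, so it reproduces the usual Cohen's d formula only when std is 1. With any other std, effect_size behaves as the raw difference in means, in the same units as std.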
alpha: Significance level (Type I error rate) for the statistical test. Default is 0.05 (5%). Must be between 0 and 1. Common values are 0.05, 0.01, or 0.10. Represents the probability of rejecting the null hypothesis when it is true.
power: Statistical power (1 - Type II error rate). Default is 0.80 (80%). Must be between 0 and 1. Represents the probability of correctly rejecting the null hypothesis when the alternative is true. Common values are 0.80 or 0.90.
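The interaction between std and effect_size can be checked numerically. The sketch below is a minimal illustration, not the documented source: `n_per_group` is a hypothetical helper that repeats the same formula but swaps `scipy.stats.norm.ppf` for `statistics.NormalDist().inv_cdf` from the standard library, and the input values are made up.

```python
from math import ceil
from statistics import NormalDist


def n_per_group(std, effect_size, alpha=0.05, power=0.80):
    """Hypothetical helper: same formula as the documented function, stdlib quantiles."""
    z_alpha = NormalDist().inv_cdf(1 - alpha / 2)
    z_beta = NormalDist().inv_cdf(power)
    return ceil(2 * (z_alpha + z_beta) ** 2 * std ** 2 / effect_size ** 2)


std = 15.0          # hypothetical standard deviation
raw_diff = 7.5      # hypothetical difference in means, same units as std
d = raw_diff / std  # Cohen's d = 0.5

n_from_raw = n_per_group(std, raw_diff)  # raw difference together with the real std
n_from_d = n_per_group(1.0, d)           # Cohen's d with std fixed at 1
n_mixed = n_per_group(std, d)            # Cohen's d AND std: variance counted twice

print(n_from_raw, n_from_d, n_mixed)
```

The first two calls agree, while the mixed call is larger by a factor of roughly std ** 2, which is why the two parameters need to be on consistent scales.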
Return Value
Returns a numpy float64 value representing the required sample size per group, rounded up to the nearest integer using np.ceil(). This is the minimum number of observations needed in each of the two groups to achieve the specified power and significance level for detecting the given effect size.
Dependencies
- scipy
- numpy
Required Imports
import numpy as np
from scipy.stats import norm
Usage Example
import numpy as np
from scipy.stats import norm

def calculate_sample_size(std, effect_size, alpha=0.05, power=0.80):
    z_alpha = norm.ppf(1 - alpha/2)
    z_beta = norm.ppf(power)
    n = 2 * ((z_alpha + z_beta) ** 2) * (std ** 2) / (effect_size ** 2)
    return np.ceil(n)

# Example: Calculate sample size for detecting a medium effect
# Caution: n scales with (std_dev / effect) ** 2, so with std_dev = 15 the
# value 0.5 acts as a raw mean difference. For a standardized Cohen's d,
# pass std=1 instead.
std_dev = 15  # Standard deviation
effect = 0.5  # Effect size (a medium Cohen's d only if std is 1)
sample_size = calculate_sample_size(std_dev, effect, alpha=0.05, power=0.80)
print(f"Required sample size per group: {int(sample_size)}")

# Example: Higher power requirement
sample_size_high_power = calculate_sample_size(std_dev, effect, alpha=0.05, power=0.90)
print(f"Sample size for 90% power: {int(sample_size_high_power)}")
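To get a feel for how the result moves with effect size and power, the following sketch tabulates the per-group sample size for the conventional Cohen's d benchmarks. It is an illustration under assumptions: `n_per_group` is a hypothetical helper that fixes std at 1 (so the effect is standardized) and uses `statistics.NormalDist` from the standard library instead of scipy.

```python
from math import ceil
from statistics import NormalDist


def n_per_group(d, alpha=0.05, power=0.80):
    """Hypothetical helper: normal-approximation n per group for standardized effect d."""
    z_alpha = NormalDist().inv_cdf(1 - alpha / 2)
    z_beta = NormalDist().inv_cdf(power)
    return ceil(2 * (z_alpha + z_beta) ** 2 / d ** 2)


effects = [0.2, 0.5, 0.8]  # conventional small, medium, large benchmarks
powers = [0.80, 0.90]

# Map each (effect size, power) pair to the required n per group
table = {(d, p): n_per_group(d, power=p) for d in effects for p in powers}

print("d     power  n per group")
for (d, p), n in table.items():
    print(f"{d:<5} {p:<6} {n}")
```

Because n is proportional to 1 / d ** 2, halving the detectable effect roughly quadruples the required sample size, which usually matters more than the choice between 80% and 90% power.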
Best Practices
- Ensure the standard deviation (std) is a reasonable estimate from pilot data or literature, as inaccurate estimates will lead to incorrect sample sizes
- The effect_size parameter should be based on practical significance, not just statistical significance. Consider what difference would be meaningful in your domain
- This function assumes equal sample sizes in both groups and equal variances (homoscedasticity)
- The calculation uses a normal approximation in place of the t distribution, which slightly underestimates the required sample size. The gap is negligible for larger samples; for very small samples, an exact t-based method may be more appropriate
- Always round up the result (which the function does automatically with np.ceil()) to ensure adequate power
- Consider using more conservative parameters (higher power like 0.90 or lower alpha like 0.01) for critical studies
- The function calculates sample size per group, so multiply by 2 for total sample size in a two-group study
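- Validate inputs before calling: the function itself performs no checks. std and effect_size should be positive (an effect_size of 0 makes the formula divide by zero), and alpha and power should lie strictly between 0 and 1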
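The input checks and the per-group versus total distinction from the list above could be sketched as follows. `checked_sample_size` is a hypothetical wrapper, not part of the documented source; it uses `statistics.NormalDist` from the standard library in place of scipy and returns a plain `int` rather than a numpy float.

```python
from math import ceil
from statistics import NormalDist


def checked_sample_size(std, effect_size, alpha=0.05, power=0.80):
    """Hypothetical validating variant of the documented formula."""
    if std <= 0:
        raise ValueError("std must be positive")
    if effect_size <= 0:
        raise ValueError("effect_size must be positive")
    if not 0 < alpha < 1:
        raise ValueError("alpha must be strictly between 0 and 1")
    if not 0 < power < 1:
        raise ValueError("power must be strictly between 0 and 1")

    z_alpha = NormalDist().inv_cdf(1 - alpha / 2)  # two-sided critical value
    z_beta = NormalDist().inv_cdf(power)
    n = 2 * (z_alpha + z_beta) ** 2 * std ** 2 / effect_size ** 2
    return ceil(n)  # round up so the achieved power is not below the target


per_group = checked_sample_size(std=1.0, effect_size=0.5)
total = 2 * per_group  # two equally sized groups
print(f"Per group: {per_group}, total: {total}")
```

Raising `ValueError` early is one possible design; it surfaces bad inputs at the call site instead of letting them produce an infinite or NaN sample size further down.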
Similar Components
AI-powered semantic similarity - components with related functionality:
- function calculate_sample_size 92.9% similar
- function calculate_sample_size_v1 92.2% similar
- function perform_analysis 47.9% similar
- function calculate_cv_v2 45.2% similar
- function calculate_cv 43.4% similar