function calculate_sample_size_v1
Calculates the required sample size per group for a two-group statistical comparison using Cohen's d effect size, significance level, statistical power, and standard deviation.
/tf/active/vicechatdev/vice_ai/smartstat_scripts/e9b7c942-87b5-4a6f-865e-e7a0d62fb0a1/analysis_2.py
133 - 149
moderate
Purpose
This function performs an a priori power analysis to determine the minimum number of participants needed in each group of a two-group study design. It applies the standard normal-approximation formula for sample size, based on effect size (Cohen's d), desired significance level (alpha), statistical power, and standard deviation. This is commonly used in experimental design and A/B testing to ensure adequate statistical power before conducting a study.
Source Code
def calculate_sample_size(effect_size, alpha, power, sd):
    """
    Calculate required sample size for two-group comparison
    effect_size: Cohen's d
    alpha: significance level (default 0.05)
    power: statistical power (default 0.80)
    sd: standard deviation
    """
    from scipy.stats import norm
    z_alpha = norm.ppf(1 - alpha/2)  # Two-tailed test
    z_beta = norm.ppf(power)
    # Sample size per group
    n = 2 * ((z_alpha + z_beta) ** 2) * (sd ** 2) / ((effect_size * sd) ** 2)
    return np.ceil(n)
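Written as a formula, the `n = ...` line in the source corresponds to the expression below. The symbols are notation introduced here, not names from the source: $d$ stands for `effect_size`, $\sigma$ for `sd`, and $z_{1-\alpha/2}$ and $z_{\text{power}}$ for `z_alpha` and `z_beta`.

```latex
n \;=\; \frac{2\,\bigl(z_{1-\alpha/2} + z_{\text{power}}\bigr)^{2}\,\sigma^{2}}{(d\,\sigma)^{2}}
  \;=\; \frac{2\,\bigl(z_{1-\alpha/2} + z_{\text{power}}\bigr)^{2}}{d^{2}}

% Worked case: alpha = 0.05, power = 0.80, d = 0.5
% z_{0.975} \approx 1.960, \quad z_{0.80} \approx 0.842
n \;\approx\; \frac{2\,(1.960 + 0.842)^{2}}{0.5^{2}} \;\approx\; 62.8
\quad\Longrightarrow\quad \lceil n \rceil = 63
```

The second form shows that $\sigma$ cancels, so the result depends only on $d$, alpha, and power.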
Parameters
| Name | Type | Default | Kind |
|---|---|---|---|
| effect_size | - | - | positional_or_keyword |
| alpha | - | - | positional_or_keyword |
| power | - | - | positional_or_keyword |
| sd | - | - | positional_or_keyword |
Parameter Details
effect_size: Cohen's d effect size - a standardized measure of the difference between two groups. Typical values: 0.2 (small), 0.5 (medium), 0.8 (large). Must be a positive number greater than 0.
alpha: Significance level (Type I error rate) for the statistical test. Commonly set to 0.05 (5%). Must be between 0 and 1. The function uses a two-tailed test, so this value is divided by 2 internally.
power: Statistical power (1 - Type II error rate) - the probability of detecting an effect if it exists. Commonly set to 0.80 (80%) or 0.90 (90%). Must be between 0 and 1.
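sd: Standard deviation of the population or expected standard deviation of the outcome variable. Must be a positive number greater than 0. Because effect_size is already standardized (Cohen's d is unitless), sd appears in both the numerator and the denominator of the formula and cancels, so its value does not change the result.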
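The constraints listed above are not enforced by the function itself. A minimal sketch of a checking wrapper follows; the name `calculate_sample_size_checked` and the error messages are hypothetical and do not come from the source file.

```python
import numpy as np
from scipy.stats import norm


def calculate_sample_size_checked(effect_size, alpha, power, sd):
    """Hypothetical wrapper: validate inputs, then apply the documented formula."""
    if effect_size <= 0:
        raise ValueError("effect_size must be greater than 0")
    if not 0 < alpha < 1:
        raise ValueError("alpha must lie strictly between 0 and 1")
    if not 0 < power < 1:
        raise ValueError("power must lie strictly between 0 and 1")
    if sd <= 0:
        raise ValueError("sd must be greater than 0")

    z_alpha = norm.ppf(1 - alpha / 2)  # two-tailed critical value
    z_beta = norm.ppf(power)           # quantile matching the requested power
    n = 2 * (z_alpha + z_beta) ** 2 * sd ** 2 / (effect_size * sd) ** 2
    return np.ceil(n)


n_ok = calculate_sample_size_checked(0.5, 0.05, 0.80, 1.0)

try:
    calculate_sample_size_checked(0.0, 0.05, 0.80, 1.0)
    rejected_zero_effect = False
except ValueError:
    rejected_zero_effect = True
```

Without such a check, an effect_size of 0 makes the original formula divide by zero.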
Return Value
Returns a numpy scalar (float64) representing the required sample size per group, rounded up to the nearest whole number using np.ceil(). This is the minimum number of participants needed in each of the two groups to achieve the specified statistical power at the given significance level.
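Because the return value is a floating-point numpy scalar rather than a Python int, callers that need a count (for example to size an array or report a total) may want to convert it. The sketch below repeats the function so it runs on its own; the variable names are illustrative only.

```python
import numpy as np
from scipy.stats import norm


def calculate_sample_size(effect_size, alpha, power, sd):
    z_alpha = norm.ppf(1 - alpha / 2)
    z_beta = norm.ppf(power)
    n = 2 * ((z_alpha + z_beta) ** 2) * (sd ** 2) / ((effect_size * sd) ** 2)
    return np.ceil(n)


n_per_group = calculate_sample_size(0.5, 0.05, 0.80, 1.0)

# The result is a numpy floating scalar holding a whole number.
is_numpy_float = isinstance(n_per_group, np.floating)

# Convert to a plain int before using it as a count.
n_int = int(n_per_group)
total_participants = 2 * n_int  # two groups of equal size

print(f"{n_int} per group, {total_participants} in total")
```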
Dependencies
- scipy
- numpy
Required Imports
import numpy as np
from scipy.stats import norm
Conditional/Optional Imports
These imports are only needed under specific conditions:
from scipy.stats import norm
Condition: imported inside the function body, always required when function is called
Required (conditional)
Usage Example
import numpy as np
from scipy.stats import norm

def calculate_sample_size(effect_size, alpha, power, sd):
    from scipy.stats import norm
    z_alpha = norm.ppf(1 - alpha/2)
    z_beta = norm.ppf(power)
    n = 2 * ((z_alpha + z_beta) ** 2) * (sd ** 2) / ((effect_size * sd) ** 2)
    return np.ceil(n)

# Example: Calculate sample size for medium effect size
effect_size = 0.5  # Cohen's d (medium effect)
alpha = 0.05       # 5% significance level
power = 0.80       # 80% power
sd = 1.0           # Standard deviation

sample_size = calculate_sample_size(effect_size, alpha, power, sd)
print(f"Required sample size per group: {int(sample_size)}")
# Output: Required sample size per group: 63
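To see how strongly the result depends on effect size, the same call can be repeated for the small, medium, and large values of Cohen's d mentioned under Parameter Details. This is an illustrative sketch; the `results` dictionary and the loop are not part of the source file.

```python
import numpy as np
from scipy.stats import norm


def calculate_sample_size(effect_size, alpha, power, sd):
    z_alpha = norm.ppf(1 - alpha / 2)
    z_beta = norm.ppf(power)
    n = 2 * ((z_alpha + z_beta) ** 2) * (sd ** 2) / ((effect_size * sd) ** 2)
    return np.ceil(n)


results = {}
for label, d in [("small", 0.2), ("medium", 0.5), ("large", 0.8)]:
    results[label] = int(calculate_sample_size(d, alpha=0.05, power=0.80, sd=1.0))
    print(f"{label:>6} (d={d}): {results[label]} per group")

# Output:
#  small (d=0.2): 393 per group
# medium (d=0.5): 63 per group
#  large (d=0.8): 25 per group
```

The required sample size scales with 1/d², so a smaller effect to detect raises the requirement sharply.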
Best Practices
- Pass effect_size as the standardized difference (Cohen's d), not a raw mean difference; to convert a raw difference, divide it by sd before calling
- The function assumes equal sample sizes in both groups (balanced design)
- The calculation is based on a two-tailed test; adjust if one-tailed test is needed
- Always round up the result (which the function does automatically) to ensure adequate power
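- The function does not validate its inputs; check that alpha and power lie strictly between 0 and 1 before calling. Conventional choices are alpha = 0.05 or 0.01 and power = 0.80 or 0.90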
- The standard deviation (sd) should be estimated from pilot data or literature when possible
- Note that the formula simplifies because Cohen's d already incorporates sd, so the sd terms cancel out in the calculation
- Consider using a slightly higher sample size than calculated to account for potential dropouts or missing data
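Two of the points above, the one-tailed adjustment and the allowance for dropouts, can be sketched as follows. Both helpers are hypothetical additions, not part of the source file: the one-tailed variant assumes the only change needed is replacing `alpha/2` with `alpha`, and the dropout helper assumes a simple inflation by `1 / (1 - dropout_rate)`. The sd argument is omitted because it cancels in the formula.

```python
import numpy as np
from scipy.stats import norm


def sample_size_one_tailed(effect_size, alpha, power):
    """Hypothetical one-tailed variant: uses alpha instead of alpha / 2."""
    z_alpha = norm.ppf(1 - alpha)  # one-tailed critical value
    z_beta = norm.ppf(power)
    return np.ceil(2 * (z_alpha + z_beta) ** 2 / effect_size ** 2)


def inflate_for_dropout(n_per_group, dropout_rate):
    """Hypothetical helper: enrol enough that n_per_group remain after dropout."""
    if not 0 <= dropout_rate < 1:
        raise ValueError("dropout_rate must be in [0, 1)")
    return int(np.ceil(n_per_group / (1 - dropout_rate)))


n_one_tailed = int(sample_size_one_tailed(0.5, 0.05, 0.80))
n_to_enrol = inflate_for_dropout(63, 0.20)

print(n_one_tailed)  # 50
print(n_to_enrol)    # 79
```

With the medium-effect settings from the usage example, the one-tailed requirement is lower than the two-tailed 63 per group, while a 20% expected dropout raises the enrolment target.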
Similar Components
AI-powered semantic similarity - components with related functionality:
- function calculate_sample_size_v2 92.2% similar
- function calculate_sample_size 89.2% similar
- function perform_analysis 52.1% similar
- function create_sample_data_v1 48.6% similar
- function calculate_cv_v2 45.3% similar