Sample size calculation
Calculating an adequate sample size is crucial to ensuring that you enroll enough participants to stand a reasonable chance of arriving at a meaningful result. Specifically, this means enrolling enough participants to be able to reject, or fail to reject, the null hypothesis of no clinically meaningful difference between the study arms with confidence. This is also an ethical consideration, as it may be difficult to justify subjecting a participant to the risks and burdens of a clinical study if there is little chance of a meaningful conclusion.
As it is almost never feasible to study the whole disease population, a sample of participants is selected from it. The sample should adequately represent the population from which it is drawn, so that inferences about the population as a whole can be made from the study results.
Statistically, the goal of sample size calculation is to keep the chances of TYPE I ERROR (alpha) and TYPE II ERROR (beta) acceptably low. These errors represent false positives and false negatives, respectively.
There are several factors that need to be considered when calculating the sample size:
- The type of outcome variable (continuous, categorical, ordinal) and the statistical test that will be used to test the primary outcome.
- The desired POWER (also referred to as ‘1-BETA’; usually 80-90%). This refers to the chance of detecting an effect when a true effect exists. E.g. with 80% power, your study has an 80% chance of picking up an effect of the intervention (if such an effect exists).
- The desired false positive rate (also referred to as ‘ALPHA’; usually 5%), i.e. the chance that your study will show an effect of the intervention when no such effect exists. This value is the same as your threshold for statistical significance, i.e. the cut-off below which a p-value is declared significant (usually 0.05).
- The estimated effect size [the difference between the two groups] and the outcome rate or mean value in the control and treatment groups. Estimates of these values are needed for calculating your sample size. They should be realistic and based on existing evidence as far as possible. The smaller the effect size (or difference between groups) you are wanting to detect, the larger the study will need to be.
- The variability in the primary outcome, most often quantified by the STANDARD DEVIATION. The more variation in your study outcome, the larger your sample size will need to be. This variation may be simple ‘noise’ due to usual interindividual differences, but it may also be influenced by your choice of inclusion and exclusion criteria: a more diverse group of patients (diverse in the sense of being likely to have a wide variety of outcomes or responses to treatment) makes your study more generalizable, potentially at the cost of requiring a larger sample size.
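To illustrate how the factors listed above combine, here is a minimal sketch for one simple case: a two-sided comparison of two means with equal group sizes, using the standard normal-approximation formula n = 2 (z_alpha + z_beta)^2 (SD / difference)^2 per group. The function name and the example values are hypothetical, and a calculator based on the t-distribution will typically return a slightly larger number.

```python
from math import ceil
from statistics import NormalDist


def n_per_group_means(delta, sd, alpha=0.05, power=0.80):
    """Approximate participants per arm for a two-sided comparison of two means.

    Assumptions (illustrative only): equal allocation, a common standard
    deviation `sd` in both arms, and the normal approximation.
    `delta` is the difference in means you want to be able to detect.
    """
    z = NormalDist()
    z_alpha = z.inv_cdf(1 - alpha / 2)  # critical value for the two-sided alpha
    z_beta = z.inv_cdf(power)           # quantile corresponding to 1 - beta
    return ceil(2 * ((z_alpha + z_beta) * sd / delta) ** 2)


# Hypothetical inputs: detect a 5-unit difference when the SD is 10.
print(n_per_group_means(delta=5, sd=10))               # 63 per group
print(n_per_group_means(delta=5, sd=10, power=0.90))   # 85 per group
print(n_per_group_means(delta=2.5, sd=10))             # 252 per group
```

Halving the difference to be detected roughly quadruples the required sample size, and raising the power from 80% to 90% increases it by about a third.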
Estimating the effect size can be challenging, as there are often no previously published studies to indicate what sort of effects to expect (if there were, you might not be doing your study!). The most common approach is to use evidence from related studies or to use what you think is a reasonable MINIMAL CLINICALLY IMPORTANT DIFFERENCE. You might want to consider a pilot study to inform your understanding of what to expect.
Keep in mind also the connection between your study inclusion and exclusion criteria, and the fact that study participants are generally fitter and healthier than the overall population. These factors influence the expected rate of study outcomes (e.g. fewer deaths or infections in a healthier study cohort than in the overall population), the possible loss to follow up, and sometimes the response to the intervention (and therefore the study effect size). This can have an important effect on a study’s success, for example when a study sees fewer outcomes than expected and so lacks the power to demonstrate a difference between the intervention and control. Because the effects of these factors can be challenging to predict, many studies rely on data from a pilot study. Another approach is to design the study as ‘event-driven’, meaning that you plan to continue to recruit participants until the desired number of study outcomes is observed. Adaptive designs are also becoming more common, where the sample size is re-estimated during the study and then adjusted in light of the observed event rates.
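As a rough illustration of why the event rate matters so much, the sketch below uses Schoenfeld's approximation for the number of events needed by a log-rank comparison with 1:1 allocation, and then converts events into participants under an assumed overall probability of having an event during follow-up. The function names, the hazard ratio and the event probabilities are all hypothetical; this is not a substitute for a statistician's calculation.

```python
from math import ceil, log
from statistics import NormalDist


def events_required(hazard_ratio, alpha=0.05, power=0.80):
    """Approximate total events for a two-sided log-rank test (Schoenfeld).

    Assumes 1:1 allocation and proportional hazards (illustrative only).
    """
    z = NormalDist()
    z_alpha = z.inv_cdf(1 - alpha / 2)
    z_beta = z.inv_cdf(power)
    return ceil(4 * (z_alpha + z_beta) ** 2 / log(hazard_ratio) ** 2)


def participants_required(events, event_probability):
    """Total participants needed to observe `events` outcomes, given an
    assumed overall probability that a participant has the outcome."""
    return ceil(events / event_probability)


events = events_required(hazard_ratio=0.75)
print(events)                                  # 380 events
print(participants_required(events, 0.25))     # 1520 participants
print(participants_required(events, 0.125))    # 3040 participants
```

In this sketch the number of events depends only on the hazard ratio, alpha and power, which is the logic behind an event-driven design. If a healthier-than-expected cohort halves the event probability, the number of participants needed to reach the same number of events doubles.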
To begin with, if your primary outcome will be tested using a simple statistical test (e.g. t-test, Chi-squared test, log-rank test), then there are numerous online calculators that can perform the calculation for you. However, if you plan to use more complex analytical methods, or a flexible approach such as an ‘event-driven’ study or adaptive design, then consultation with a statistician is essential.
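For a sense of what such a calculator does for a binary outcome, here is a minimal sketch using the simple unpooled normal-approximation formula for comparing two proportions with equal group sizes. The function name and the outcome rates are hypothetical. Real calculators may use a pooled variance or a continuity correction, so their answers can differ by a few participants.

```python
from math import ceil
from statistics import NormalDist


def n_per_group_proportions(p_control, p_treatment, alpha=0.05, power=0.80):
    """Approximate participants per arm for a two-sided comparison of two
    proportions (unpooled normal approximation, equal allocation)."""
    z = NormalDist()
    z_alpha = z.inv_cdf(1 - alpha / 2)
    z_beta = z.inv_cdf(power)
    # Sum of the binomial variances in the two arms
    variance = p_control * (1 - p_control) + p_treatment * (1 - p_treatment)
    return ceil((z_alpha + z_beta) ** 2 * variance / (p_control - p_treatment) ** 2)


# Hypothetical rates: 30% outcome rate in control, 20% with the intervention.
print(n_per_group_proportions(0.30, 0.20))   # 291 per group
```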