A/B test reliability calculator
Calculate sample size
This calculator will help you determine how many people you’ll need for a reliable result. Use it when testing open rates, click rates, conversion rates, and more.
Current baseline — this is the current average value of the metric (e.g. conversion rate) that you’re looking to improve. You can calculate this using data you’ve previously collected.
Expected uplift — this is the minimum growth that you’d like to achieve in the metric that you’re testing. For example, you may want your current baseline conversion to increase by 3%. If your expected uplift is too low (e.g. 0.1%), you’ll need a lot more people for your test. If this value is too high (e.g. 15%) and your actual uplift turns out to be lower, the sample will be too small to detect the smaller uplift, and the results are unlikely to be statistically significant.
Significance level — often expressed as a ‘confidence level’ (100% minus the significance level). This is how certain you can be that the test will not report a difference between the variants when there is actually none.
Statistical power — this is how likely the test is to detect a difference between the variants when there really is one at least as large as the expected uplift. If you’re not sure about this step, keep the default value.
Sample size — this will show you how many people need to be exposed to each variant to make the test results reliable. This will also help you determine the best timing for the test (i.e., help you ensure you don’t end the test too early or too late).
For example: one of your trigger chains sends out 100 emails per day. The calculator determined that in order to run an experiment on two variants, you’ll need a sample size of 500 people per variant. This means you’ll need to send out a total of 1,000 emails over 10 days.
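To make the inputs above concrete, here is a minimal sketch of how a sample size like this can be estimated. It uses the standard normal-approximation formula for comparing two proportions; it is not necessarily the formula this calculator uses. The function names, the 80% default power, and the choice to treat the uplift as relative to the baseline (a 3% uplift on a 10% baseline means 10.3%) are all assumptions made for illustration.

```python
import math
from statistics import NormalDist


def sample_size_per_variant(baseline, uplift, confidence=0.95, power=0.80):
    """Approximate number of people needed per variant.

    baseline   -- current conversion rate as a fraction (0.10 means 10%)
    uplift     -- relative uplift to detect (0.03 means +3% of baseline);
                  treating uplift as relative is an assumption
    confidence -- confidence level (1 minus the significance level)
    power      -- chance of detecting the uplift if it really exists
    """
    p1 = baseline
    p2 = baseline * (1 + uplift)
    alpha = 1 - confidence
    # Critical values of the standard normal distribution (two-sided test).
    z_alpha = NormalDist().inv_cdf(1 - alpha / 2)
    z_beta = NormalDist().inv_cdf(power)
    variance = p1 * (1 - p1) + p2 * (1 - p2)
    n = (z_alpha + z_beta) ** 2 * variance / (p2 - p1) ** 2
    return math.ceil(n)


def days_needed(per_variant, variants, sends_per_day):
    """Days required to reach the sample size at a fixed daily volume."""
    return math.ceil(per_variant * variants / sends_per_day)


# The example from the text: 500 people per variant, 2 variants,
# 100 emails per day.
print(days_needed(500, 2, 100))  # 10
```

In this sketch, halving the expected uplift roughly quadruples the required sample size, which is why a very low expected uplift calls for so many more people.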
Once the test is complete, use this calculator to find out which variant showed the best results and whether the difference between them can be considered statistically significant.
Conversion — the conversion rate actually observed for each group, before any margin of error is applied.
Confidence interval — this is the range in which the true conversion rate is likely to fall, given the reliability score you entered. The reliability score is 95% by default, so this means that the conversion is 95% likely to be within the confidence interval range.
Reliability — this is how likely the test results are to be correct.
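As an illustration of how a confidence interval relates to the observed conversion, the sketch below computes a simple normal-approximation (Wald) interval. This is one common method, not necessarily the one the calculator uses; the function name and the sample figures are hypothetical.

```python
import math
from statistics import NormalDist


def conversion_interval(conversions, visitors, confidence=0.95):
    """Return (low, high) bounds for the conversion rate as fractions."""
    rate = conversions / visitors
    # Two-sided critical value for the chosen confidence level.
    z = NormalDist().inv_cdf(1 - (1 - confidence) / 2)
    margin = z * math.sqrt(rate * (1 - rate) / visitors)
    # Clamp to the valid range for a proportion.
    return max(0.0, rate - margin), min(1.0, rate + margin)


# Hypothetical group: 50 conversions out of 500 people (10% conversion).
low, high = conversion_interval(50, 500)
print(f"{low:.1%} to {high:.1%}")
```

A higher reliability score widens the interval, and a larger group narrows it.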
Group A outperformed group B — this means that, given the sample size, the difference in conversion is statistically significant. These test results are reliable.
There is no statistically significant difference between groups A and B — the differences in conversion are likely down to chance. This could also be because the sample size is too small to get a statistically significant result. One reason for this may be that the test was ended too soon.
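The two outcomes above can be reproduced with a two-proportion z-test, sketched below. This is a standard way to compare conversion rates, offered as an assumption about the kind of check involved and not as the calculator’s actual implementation. The function name, the verdict strings, and the sample figures are hypothetical.

```python
import math
from statistics import NormalDist


def compare_groups(conv_a, n_a, conv_b, n_b, confidence=0.95):
    """Return (p_value, verdict) for a two-sided two-proportion z-test."""
    no_difference = "no statistically significant difference"
    p_a = conv_a / n_a
    p_b = conv_b / n_b
    # Pooled rate under the assumption that both groups convert equally.
    pooled = (conv_a + conv_b) / (n_a + n_b)
    se = math.sqrt(pooled * (1 - pooled) * (1 / n_a + 1 / n_b))
    if se == 0:
        # Both groups converted at exactly 0% or 100%: nothing to compare.
        return 1.0, no_difference
    z = (p_a - p_b) / se
    p_value = 2 * (1 - NormalDist().cdf(abs(z)))
    if p_value >= 1 - confidence:
        return p_value, no_difference
    winner = "A outperformed B" if p_a > p_b else "B outperformed A"
    return p_value, winner


# Hypothetical results: 500 people per group.
print(compare_groups(90, 500, 50, 500)[1])  # A outperformed B
print(compare_groups(60, 500, 50, 500)[1])  # no statistically significant difference
```

In the second call, group A still converts better (12% against 10%), but with 500 people per group a gap of that size could easily be down to chance.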
In order to determine how many people you’ll need for your test, use the Sample Size calculator above.