Sample size calculator

Estimate how many visitors you need per variant for an A/B test based on your baseline rate, target uplift, confidence level, power, and traffic.

What these inputs mean

  • Baseline conversion rate: Your current conversion rate before the test starts. If 5 out of every 100 visitors convert, your baseline conversion rate is 5%.
  • Minimum detectable uplift: The smallest improvement you care enough to detect. If your baseline is 5% and you enter 10%, the calculator tests for a lift from 5.0% to 5.5%.
  • Confidence level: How strict you want to be before calling a result real. A higher confidence level means you need more data, but you reduce the chance of acting on noise.
  • Power: How likely the test is to catch a real improvement if that improvement actually exists. Higher power usually means a larger required sample size.
  • Daily visitors across both variants: The total traffic you expect to send into the test each day, split across control and variant. This is only used to estimate how long the test may take.

Why sample size matters

The most common A/B testing mistake isn’t bad statistics — it’s stopping too early. With too few visitors, a test can’t distinguish a real improvement from random noise, and “peeking” at results until one variant happens to look significant almost guarantees false wins.

This calculator answers the planning question before the test starts: how many visitors does each variant need so that, if the improvement you’re hoping for is real, the test will probably detect it?

How the calculation works

The calculator uses the standard two-proportion sample size formula. In plain terms, the required sample per variant is driven by four inputs:

  • Baseline conversion rate — your current rate. Rare events (1–2% conversion) need far more traffic than common ones (20% click-through), because each visitor carries less information.
  • Minimum detectable uplift — the smallest relative improvement worth detecting. Detecting a 5% lift takes roughly four times as much traffic as detecting a 10% lift; halving the effect quadruples the sample.
  • Confidence level — how sure you want to be that a detected difference isn’t a false positive. 95% is the common default.
  • Power — the probability of detecting the uplift if it genuinely exists. 80% is the common default; 90% costs noticeably more traffic.
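
For readers who want to see the arithmetic, here is a minimal Python sketch of one common form of the two-proportion formula. It assumes a two-sided test and the unpooled-variance version of the formula; the calculator's exact variant is not stated here and may differ slightly. The function name and parameter names are illustrative, not part of the calculator.

```python
from math import ceil
from statistics import NormalDist


def sample_size_per_variant(baseline, relative_uplift, confidence=0.95, power=0.80):
    """Visitors needed per variant for a two-sided two-proportion z-test.

    Assumption: unpooled-variance form of the formula. The calculator's
    exact variant is not documented here and may differ slightly.
    """
    p1 = baseline
    p2 = baseline * (1 + relative_uplift)            # 5% with a 10% uplift -> 5.5%
    alpha = 1 - confidence
    z_alpha = NormalDist().inv_cdf(1 - alpha / 2)    # about 1.96 at 95% confidence
    z_power = NormalDist().inv_cdf(power)            # about 0.84 at 80% power
    variance = p1 * (1 - p1) + p2 * (1 - p2)         # sum of the two binomial variances
    n = (z_alpha + z_power) ** 2 * variance / (p2 - p1) ** 2
    return ceil(n)


# Baseline 5%, 10% relative uplift, default 95% confidence and 80% power.
print(sample_size_per_variant(0.05, 0.10))
```

With a 5% baseline and a 10% relative uplift this lands at roughly 31,000 visitors per variant. The squared difference in the denominator is why halving the uplift roughly quadruples the sample.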

Add your expected daily traffic and the calculator also estimates how many days the test needs to run.

A worked example

Baseline 5%, minimum detectable uplift 10% (so 5.0% → 5.5%), 95% confidence, 80% power: you need roughly 31,000 visitors per variant, about 62,000 total. At 10,000 daily visitors across both arms, plan for about a week — and at 1,000 daily visitors, about two months, which is why low-traffic sites should test bigger, bolder changes.
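
The duration estimate in this example is simple division. The sketch below assumes traffic is split evenly between the groups and stays constant from day to day; the function name is hypothetical.

```python
from math import ceil


def days_to_run(visitors_per_variant, daily_visitors, groups=2):
    """Whole days needed to collect the full sample across all groups.

    Assumption: daily traffic is constant and split evenly between groups.
    """
    total_needed = visitors_per_variant * groups
    return ceil(total_needed / daily_visitors)


per_variant = 31_000                         # rounded figure from the worked example
print(days_to_run(per_variant, 10_000))      # 7  (about a week)
print(days_to_run(per_variant, 1_000))       # 62 (about two months)
```

In practice it is common to round the result up to whole weeks so that every day of the week is represented equally, although that is a planning habit rather than part of the formula.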

Frequently asked questions

What happens if I run the test with fewer visitors than recommended?

The test is underpowered: a real improvement will often fail to reach significance, and you’ll wrongly conclude the change didn’t work. Underpowered tests also exaggerate the size of the wins they do find.
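
To put a number on "underpowered", the sketch below estimates the power you actually get at a given sample size. It uses the same normal approximation as a two-sided two-proportion z-test and ignores the negligible chance of a significant result in the wrong direction; the function name is illustrative.

```python
from math import sqrt
from statistics import NormalDist


def achieved_power(baseline, relative_uplift, n_per_variant, confidence=0.95):
    """Approximate power of a two-sided two-proportion z-test at a given sample size."""
    p1 = baseline
    p2 = baseline * (1 + relative_uplift)
    z_alpha = NormalDist().inv_cdf(1 - (1 - confidence) / 2)
    # Standard error of the difference between the two observed rates.
    se = sqrt((p1 * (1 - p1) + p2 * (1 - p2)) / n_per_variant)
    # Probability that the observed difference clears the significance threshold.
    return NormalDist().cdf(abs(p2 - p1) / se - z_alpha)


for n in (5_000, 10_000, 31_000):
    print(n, round(achieved_power(0.05, 0.10, n), 2))
```

For a 5% baseline and a 10% relative uplift, 10,000 visitors per variant gives only about a one-in-three chance of detecting the lift, compared with about 80% at roughly 31,000.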

Should I pick a small uplift so the test is more sensitive?

Only if you can afford the traffic. Be honest about the smallest lift that would actually justify shipping the change — if a 2% improvement wouldn’t change your decision, don’t pay the (very large) sample cost of detecting it.
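
To see how steep that cost is, the sketch below prints the per-variant sample size for several uplifts at a 5% baseline. It assumes a two-sided test at 95% confidence and 80% power with the unpooled-variance form of the two-proportion formula, so the figures may differ slightly from the calculator's.

```python
from math import ceil
from statistics import NormalDist


def visitors_needed(baseline, relative_uplift, confidence=0.95, power=0.80):
    """Per-variant sample size (two-sided z-test, unpooled variance; an assumed variant)."""
    p1, p2 = baseline, baseline * (1 + relative_uplift)
    z = NormalDist().inv_cdf(1 - (1 - confidence) / 2) + NormalDist().inv_cdf(power)
    return ceil(z ** 2 * (p1 * (1 - p1) + p2 * (1 - p2)) / (p2 - p1) ** 2)


# Smaller uplifts need disproportionately more traffic.
for uplift in (0.02, 0.05, 0.10, 0.20):
    print(f"{uplift:.0%} uplift: {visitors_needed(0.05, uplift):,} per variant")
```

Under these assumptions a 2% uplift needs on the order of 750,000 visitors per variant, more than twenty times the sample for a 10% uplift.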

Can I stop the test early if results look significant?

Not with this kind of fixed-sample test. Checking repeatedly and stopping at the first significant reading inflates the false-positive rate well beyond the nominal 5% that a 95% confidence level implies. Decide the sample size up front and evaluate once you reach it, or use a sequential testing method designed for early stopping.
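
A small simulation makes the inflation visible. The sketch below runs A/A tests, where both arms share the same true conversion rate so any "win" is a false positive, and compares stopping at the first significant look with evaluating once at the end. The 20% rate, ten looks, and batch size are arbitrary illustrative choices, and the test used is a pooled two-proportion z-test.

```python
import random
from math import sqrt
from statistics import NormalDist

Z_CRIT = NormalDist().inv_cdf(0.975)   # two-sided threshold at 95% confidence


def is_significant(conv_a, conv_b, n):
    """Pooled two-proportion z-test with n visitors in each arm."""
    pooled = (conv_a + conv_b) / (2 * n)
    se = sqrt(2 * pooled * (1 - pooled) / n)
    if se == 0:
        return False
    return abs(conv_a - conv_b) / n / se > Z_CRIT


def false_positive_rates(experiments=1000, looks=10, batch=100, rate=0.2, seed=42):
    """Simulate A/A tests and return (peeking rate, fixed-sample rate)."""
    rng = random.Random(seed)
    peeking_hits = fixed_hits = 0
    for _experiment in range(experiments):
        conv_a = conv_b = n = 0
        ever_significant = False
        significant = False
        for _look in range(looks):
            # Add one batch of visitors to each arm, then check the result.
            conv_a += sum(rng.random() < rate for _ in range(batch))
            conv_b += sum(rng.random() < rate for _ in range(batch))
            n += batch
            significant = is_significant(conv_a, conv_b, n)
            ever_significant = ever_significant or significant
        peeking_hits += ever_significant   # would have stopped at some significant look
        fixed_hits += significant          # judged only once, at the planned sample size
    return peeking_hits / experiments, fixed_hits / experiments


peeking, fixed = false_positive_rates()
print(f"stop at first significant look: {peeking:.1%}")
print(f"evaluate once at the end:       {fixed:.1%}")
```

The fixed-sample rate should sit near 5%, while the peeking rate comes out several times higher, and it keeps growing as you add more looks.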

Does the calculator account for more than one variant?

The result is per variant against a control. Testing control plus two variants means three groups of the stated size, and you should also apply a multiple-comparison correction when you analyse. When you have results, evaluate them in the A/B test calculator.
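
If you already know you will correct for multiple comparisons, you can reflect that at the planning stage too. The sketch below uses a Bonferroni correction, which is one common choice and an assumption here rather than something the calculator is documented to do: it divides the allowed false-positive rate by the number of variant-versus-control comparisons and then applies the usual two-proportion formula (two-sided, unpooled variance).

```python
from math import ceil
from statistics import NormalDist


def sample_size_with_bonferroni(baseline, relative_uplift, variants,
                                confidence=0.95, power=0.80):
    """Return (per-group size, total visitors) for `variants` treatments vs one control.

    Assumption: Bonferroni correction, one comparison per treatment.
    """
    p1 = baseline
    p2 = baseline * (1 + relative_uplift)
    alpha = (1 - confidence) / variants              # split alpha across the comparisons
    z_alpha = NormalDist().inv_cdf(1 - alpha / 2)
    z_power = NormalDist().inv_cdf(power)
    variance = p1 * (1 - p1) + p2 * (1 - p2)
    per_group = ceil((z_alpha + z_power) ** 2 * variance / (p2 - p1) ** 2)
    return per_group, per_group * (variants + 1)     # treatments plus the control


per_group, total = sample_size_with_bonferroni(0.05, 0.10, variants=2)
print(f"{per_group:,} per group, {total:,} in total")
```

With two variants at a 5% baseline and a 10% uplift, the stricter threshold raises the requirement from roughly 31,000 to roughly 38,000 per group, across three groups.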