A/B test calculator

Enter visitors and conversions for a control and one or more variants to calculate rates, lift, confidence intervals, p-values, and approximate statistical significance.


How to read your A/B test results

Enter the visitors and conversions for your control and each variant, and the calculator reports conversion rates, relative lift, confidence intervals, and a p-value for each comparison — plus a conversion-rate graph you can download for reporting.

The numbers it produces:

  • Conversion rate — conversions ÷ visitors for each arm.
  • Lift — the relative improvement of a variant over control. A control at 6.2% and a variant at 7.1% is a lift of about 14.5%.
  • Confidence interval — the plausible range for each true conversion rate. If the control and variant intervals overlap heavily, the test hasn’t separated them yet.
  • p-value — the probability of seeing a difference at least this large if the variants actually performed identically. By convention, p < 0.05 is called statistically significant at 95% confidence.
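The calculator does not say which formulas it uses. The sketch below shows one common way to produce these numbers: a normal-approximation (Wald) interval for each rate and a pooled two-proportion z-test for the p-value. The function name `summarize` and the 10,000-visitor counts are hypothetical. The counts were picked so the rates match the 6.2% vs 7.1% lift example above.

```python
from math import sqrt
from statistics import NormalDist

# Assumption: Wald intervals and a pooled z-test; the calculator may use other methods.
Z95 = NormalDist().inv_cdf(0.975)  # about 1.96 for a two-sided 95% interval


def summarize(control_visitors, control_conv, variant_visitors, variant_conv):
    """Rates, lift, 95% Wald intervals and a two-sided pooled z-test p-value."""
    p_c = control_conv / control_visitors
    p_v = variant_conv / variant_visitors

    # Relative lift of the variant over control
    lift = (p_v - p_c) / p_c

    # Normal-approximation (Wald) interval for one arm's true rate
    def wald(p, n):
        half = Z95 * sqrt(p * (1 - p) / n)
        return (p - half, p + half)

    # Pooled two-proportion z-test: under the null both arms share one rate
    pooled = (control_conv + variant_conv) / (control_visitors + variant_visitors)
    se = sqrt(pooled * (1 - pooled) * (1 / control_visitors + 1 / variant_visitors))
    z = (p_v - p_c) / se
    p_value = 2 * (1 - NormalDist().cdf(abs(z)))

    return {
        "control_rate": p_c,
        "variant_rate": p_v,
        "lift": lift,
        "control_ci": wald(p_c, control_visitors),
        "variant_ci": wald(p_v, variant_visitors),
        "p_value": p_value,
    }


result = summarize(10_000, 620, 10_000, 710)
print(f"lift = {result['lift']:.1%}, p = {result['p_value']:.3f}")
```

With these hypothetical counts the lift comes out at about 14.5%, matching the example. Wald intervals are simple but can behave poorly when rates are near 0% or 100% or samples are small. Tools often switch to Wilson or exact intervals in those cases.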

Significant vs. meaningful

A significant result is not automatically a meaningful one. With enough traffic, a 0.5% relative lift can reach p < 0.05, yet it may not justify engineering work or design churn. Conversely, a large observed lift on a small sample is often noise: the confidence interval is wide, and the p-value is usually high.
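The two situations can be checked numerically with a pooled two-proportion z-test. This test is assumed here and is not necessarily the calculator's method. All visitor and conversion counts are hypothetical. The first case is a 0.5% relative lift (5.000% vs 5.025%) on 10 million visitors per arm. The second is a 50% relative lift (5.0% vs 7.5%) on 200 visitors per arm.

```python
from math import sqrt
from statistics import NormalDist


def two_sided_p(n_a, x_a, n_b, x_b):
    """Pooled two-proportion z-test (normal approximation)."""
    p_a, p_b = x_a / n_a, x_b / n_b
    pooled = (x_a + x_b) / (n_a + n_b)
    se = sqrt(pooled * (1 - pooled) * (1 / n_a + 1 / n_b))
    z = (p_b - p_a) / se
    return 2 * (1 - NormalDist().cdf(abs(z)))


# Tiny effect, huge sample: 5.000% vs 5.025% (0.5% relative lift), 10M visitors per arm
tiny_lift_p = two_sided_p(10_000_000, 500_000, 10_000_000, 502_500)

# Big effect, small sample: 5.0% vs 7.5% (50% relative lift), 200 visitors per arm
big_lift_p = two_sided_p(200, 10, 200, 15)

print(f"0.5% lift on 10M/arm: p = {tiny_lift_p:.3f}")
print(f"50% lift on 200/arm:  p = {big_lift_p:.3f}")
```

The tiny lift clears p < 0.05 and the large one does not. This shows that the p-value alone says little about whether an effect is worth acting on.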

Read the three numbers together: the lift tells you the size of the effect, the confidence interval tells you how precisely you’ve measured it, and the p-value tells you how strongly the data argue against “no difference at all.”
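The calculator reports an interval for each arm. A closely related view, which is an assumption here and not something the calculator is stated to show, is a single interval for the difference between the rates. It captures size and precision together. The sketch uses an unpooled Wald interval. The function name `diff_interval` and the counts are hypothetical.

```python
from math import sqrt
from statistics import NormalDist


def diff_interval(n_c, x_c, n_v, x_v, confidence=0.95):
    """Unpooled Wald interval for (variant rate - control rate)."""
    p_c, p_v = x_c / n_c, x_v / n_v
    z = NormalDist().inv_cdf(0.5 + confidence / 2)
    se = sqrt(p_c * (1 - p_c) / n_c + p_v * (1 - p_v) / n_v)
    diff = p_v - p_c
    return diff - z * se, diff + z * se


low, high = diff_interval(10_000, 620, 10_000, 710)
print(f"difference: {low:+.2%} to {high:+.2%}")
if low > 0 or high < 0:
    print("interval excludes zero")
else:
    print("interval includes zero: 'no difference' is not ruled out")
```

An interval that excludes zero usually agrees with p < 0.05. Near the boundary, though, the two can disagree slightly, because this interval uses an unpooled standard error while the pooled z-test does not.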

Frequently asked questions

What does p < 0.05 actually mean?

It means that if the control and variant truly converted at the same rate, you’d see a gap at least this large less than 5% of the time by chance. It is not the probability that the variant is better, and it says nothing about how much better; that’s what the lift and confidence interval are for.
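One way to see this is to simulate A/A tests, where both arms share the same true rate, and count how often p < 0.05 appears anyway. The sketch assumes a pooled z-test, a hypothetical 10% true rate and 500 visitors per arm. The share should land near 5%.

```python
import random
from math import sqrt
from statistics import NormalDist


def two_sided_p(n_a, x_a, n_b, x_b):
    """Pooled two-proportion z-test (normal approximation)."""
    pooled = (x_a + x_b) / (n_a + n_b)
    if pooled in (0, 1):
        return 1.0  # no variation at all: nothing to test
    se = sqrt(pooled * (1 - pooled) * (1 / n_a + 1 / n_b))
    z = (x_b / n_b - x_a / n_a) / se
    return 2 * (1 - NormalDist().cdf(abs(z)))


def conversions(rng, visitors, rate):
    """Simulate one arm: count how many visitors convert."""
    return sum(rng.random() < rate for _ in range(visitors))


rng = random.Random(42)
TRUE_RATE, VISITORS, RUNS = 0.10, 500, 2_000  # hypothetical settings

false_positives = 0
for run in range(RUNS):
    a = conversions(rng, VISITORS, TRUE_RATE)
    b = conversions(rng, VISITORS, TRUE_RATE)  # same true rate: an A/A test
    if two_sided_p(VISITORS, a, VISITORS, b) < 0.05:
        false_positives += 1

false_positive_rate = false_positives / RUNS
print(f"share of A/A runs with p < 0.05: {false_positive_rate:.1%}")
```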

My test isn’t significant. Is the variant a loser?

Not necessarily — “not significant” means “not enough evidence,” not “no difference.” If the sample is small, the test may simply be underpowered. Check the sample size calculator to see how much traffic the comparison really needs before you call it.
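The sample size calculator's exact formula isn't given here. A widely used normal-approximation formula for a two-sided two-proportion test is sketched below, with the function name `visitors_per_arm` being hypothetical. It uses the 6.2% to 7.1% example from above, 5% significance and 80% power.

```python
from math import ceil
from statistics import NormalDist


def visitors_per_arm(baseline, target, alpha=0.05, power=0.80):
    """Approximate visitors per arm for a two-sided two-proportion z-test."""
    z_alpha = NormalDist().inv_cdf(1 - alpha / 2)  # about 1.96 for alpha = 0.05
    z_beta = NormalDist().inv_cdf(power)            # about 0.84 for 80% power
    variance = baseline * (1 - baseline) + target * (1 - target)
    return ceil((z_alpha + z_beta) ** 2 * variance / (target - baseline) ** 2)


n = visitors_per_arm(0.062, 0.071)
print(f"about {n:,} visitors per arm")
```

This comes to roughly 12,000 visitors per arm. Required traffic grows with the inverse square of the difference, so halving the expected lift roughly quadruples the sample you need.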

Can I keep adding traffic until the result becomes significant?

Repeatedly checking and stopping the moment p dips below 0.05 inflates your false-positive rate badly. Fix the sample size in advance, run to completion, then analyse once. If you need to monitor continuously, use a sequential testing framework built for it.
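A short simulation shows the inflation. Every run is an A/A test with no real difference. The sketch compares flagging a run as significant if any of 10 interim looks gives p < 0.05 against analysing once at the planned end. The pooled z-test, 10% rate and batch sizes are hypothetical settings.

```python
import random
from math import sqrt
from statistics import NormalDist


def two_sided_p(n_a, x_a, n_b, x_b):
    """Pooled two-proportion z-test (normal approximation)."""
    pooled = (x_a + x_b) / (n_a + n_b)
    if pooled in (0, 1):
        return 1.0
    se = sqrt(pooled * (1 - pooled) * (1 / n_a + 1 / n_b))
    z = (x_b / n_b - x_a / n_a) / se
    return 2 * (1 - NormalDist().cdf(abs(z)))


rng = random.Random(7)
RATE, BATCH, LOOKS, RUNS = 0.10, 200, 10, 1_000  # hypothetical settings

peeked_positive = 0  # runs that hit p < 0.05 at any interim look
fixed_positive = 0   # runs with p < 0.05 when analysed once, at the planned end

for run in range(RUNS):
    a = b = n = 0
    hit_at_some_look = False
    for look in range(LOOKS):
        a += sum(rng.random() < RATE for _ in range(BATCH))
        b += sum(rng.random() < RATE for _ in range(BATCH))  # A/A: no real difference
        n += BATCH
        p = two_sided_p(n, a, n, b)
        if p < 0.05:
            hit_at_some_look = True
    peeked_positive += hit_at_some_look
    fixed_positive += p < 0.05  # p from the final look only

peeking_rate = peeked_positive / RUNS
fixed_rate = fixed_positive / RUNS
print(f"peeking after every batch: {peeking_rate:.1%} false positives")
print(f"single analysis at the end: {fixed_rate:.1%} false positives")
```

The single final analysis stays near 5%. Checking after every batch flags several times as many of these no-difference tests.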

How do I handle more than one variant?

Add each variant in the form and the calculator compares every arm against the control. Be aware that each extra comparison adds another chance of a false positive at the 5% level — with several variants, demand stronger evidence (a smaller p-value) before declaring a winner.
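One simple, conservative way to demand stronger evidence is the Bonferroni correction. It divides the 5% level by the number of comparisons. This is a named technique swapped in for illustration, not necessarily what the calculator does. Less conservative options such as the Holm procedure also exist. The function name and p-values below are hypothetical.

```python
def bonferroni_winners(p_values, alpha=0.05):
    """Return variants whose p-value is below alpha / number of comparisons."""
    threshold = alpha / len(p_values)
    return {name: p for name, p in p_values.items() if p < threshold}


# Hypothetical p-values for three variants, each compared against control
p_values = {"B": 0.030, "C": 0.012, "D": 0.004}
print(f"per-comparison threshold: {0.05 / len(p_values):.4f}")
print(bonferroni_winners(p_values))
```

With three variants the threshold drops to about 0.0167. Variant B, at p = 0.030, would pass a single-comparison test but no longer counts as a winner.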

Visitors or sessions or users — which should I enter?

Use the same unit your testing platform randomises on, which is usually users. The critical part is consistency: the visitor and conversion counts for each arm must be measured the same way, over the same period.