Sep 28, 2011

How to get statistical confidence from your tests with small amounts of data.

tl;dr Split-testing doesn’t verify your numbers, it only verifies which option is better. If one of the options tested is a clear winner, you’ll know with small amounts of data. So, I use split-tests to look for big winners. If my test doesn’t show a big winner quickly, I move on quickly.

Myth - Split-testing requires a large sample size to be accurate

Don't get distracted by the numbers. We're used to thinking of statistical significance as only being possible with a large number of trials, known in stats as the sample size. In certain types of statistical tests, your sample size needs to be in the thousands or more, because those tests establish the likelihood that a specific number will come up. When you're split-testing, you're not learning about specific numbers, just which option is better. You only need a large sample size if the difference between the two options is tiny. If the difference is big, you get confidence with a small sample size.
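
To see this concretely, here's a rough sketch with made-up numbers, using Fisher's exact test as a stand-in for whatever your testing tool runs under the hood:

```python
# Rough sketch with made-up numbers: a big difference between variants
# reaches significance on a small sample; a tiny difference doesn't.
from scipy.stats import fisher_exact

def confidence(conversions_a, visitors_a, conversions_b, visitors_b):
    """Confidence (1 - p) that the two variants genuinely differ."""
    table = [[conversions_a, visitors_a - conversions_a],
             [conversions_b, visitors_b - conversions_b]]
    _, p_value = fisher_exact(table)
    return 1 - p_value

# Big difference, small sample: 10% vs 40% over 50 visitors each.
print(confidence(5, 50, 20, 50))   # well above 0.99, with only 100 visitors

# Tiny difference, same sample: 10% vs 12% over 50 visitors each.
print(confidence(5, 50, 6, 50))    # nowhere near significant
```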

What does split-testing actually tell you?

I'll share a split-test I did on Unbounce that will make your brain jump out of your head and slap your face. I started with a waiting-list page for Leancamp. It had a respectable 10% conversion rate, but I had launched the page really quickly and wasn't happy with it, so I made a change and ran it as a split-test.

[Image - Version A: Crappy starting version.]

[Image - Version B: Blatant copy of Buffer version.]
30 visitors later, Unbounce was telling me:
Version   Conversion rate   # of visitors
A         10%               100 visitors
B         25%               30 visitors    - Winner with 99% confidence!
What, you cry out? 99% statistical confidence in just 30 visitors?!
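
You can check the arithmetic yourself. Unbounce doesn't publish its exact method, and the table above only gives rounded rates, so here's a rough sketch assuming A converted 10 of 100 visitors and B roughly 8 of 30, with a one-sided two-proportion z-test standing in for Unbounce's calculation:

```python
# Rough check of the table above. Assumptions: A converted 10 of 100,
# B roughly 8 of 30 (the rates are rounded), and a one-sided
# two-proportion z-test as a stand-in for Unbounce's method.
from math import sqrt
from scipy.stats import norm

conversions_a, visitors_a = 10, 100
conversions_b, visitors_b = 8, 30

rate_a = conversions_a / visitors_a
rate_b = conversions_b / visitors_b
pooled = (conversions_a + conversions_b) / (visitors_a + visitors_b)
se = sqrt(pooled * (1 - pooled) * (1 / visitors_a + 1 / visitors_b))
z = (rate_b - rate_a) / se

confidence = norm.cdf(z)      # probability that B genuinely beats A
print(f"{confidence:.0%}")    # ~99%
```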

Ask yourself, what was it so confident about?

That Option B was better. Maybe only slightly better, but better.

The test did not tell me that Option B would continue to perform at 25%, or that it would stay 15 percentage points ahead of Option A - just that Option B is very likely to outperform Option A in the long run.

Split testing only tells you which option is better, not how much better.

Get it? In a split-test, the only number you can really act on is the statistical confidence in which option is better. The conversion rates, impressions and click-through rates you measure along the way are not reliable as predictions; the winning option is. That's why you don't necessarily need big numbers to get confidence.
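
A quick simulation makes this vivid. Suppose (hypothetically) the true rates really were 10% and 25%, and we reran the 100-visitor vs 30-visitor test above thousands of times:

```python
# Hypothetical simulation: true rates of 10% (A) and 25% (B), rerun with
# 100 and 30 visitors respectively. Which variant wins is stable; B's
# measured conversion rate is all over the place.
import random

random.seed(0)
runs = 10_000
b_measured_rates = []
b_wins = 0
for _ in range(runs):
    conversions_a = sum(random.random() < 0.10 for _ in range(100))
    conversions_b = sum(random.random() < 0.25 for _ in range(30))
    b_measured_rates.append(conversions_b / 30)
    b_wins += (conversions_b / 30) > (conversions_a / 100)

print(f"B won {b_wins / runs:.0%} of reruns")  # a stable, large majority
print(f"B's measured rate ranged from {min(b_measured_rates):.0%} "
      f"to {max(b_measured_rates):.0%}")       # wildly unstable
```

The winner is trustworthy; the measured rate isn't.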

Using split-testing for quick, actionable learning.

Split-testing is a tool to learn and improve quickly, giving you confidence in one option over the other. You can use it to evolve quickly and with confidence.

If you have a big winner on your hands, split-testing will tell you this quickly.  So, especially when I’m starting, I look for big wins quickly.  If my first test, say about a picture or a headline, doesn’t give me statistical confidence after 100-200 visitors, I usually scrap the test.

I would rather quickly abandon a version that might have worked better if I ran the test longer, because I can better invest that time in testing other things that might yield a big win. (There’s a balance to be found with sampling error here, but since I’m testing frequently and moving forward with so many unknowns, I accept false negatives in the interest of speed, and address sampling error when I’ve found a hit.)
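
A back-of-envelope power calculation (my numbers, not a rule) shows why the 100-200 visitor heuristic is sane: a big win shows up within roughly a hundred visitors per variant, while a small one needs thousands.

```python
# Back-of-envelope sample-size check using the standard two-proportion
# power formula (95% significance, 80% power). Numbers are illustrative.
from math import sqrt
from scipy.stats import norm

def visitors_needed(p1, p2, alpha=0.05, power=0.80):
    z_alpha = norm.ppf(1 - alpha / 2)
    z_power = norm.ppf(power)
    p_bar = (p1 + p2) / 2
    n = (z_alpha * sqrt(2 * p_bar * (1 - p_bar))
         + z_power * sqrt(p1 * (1 - p1) + p2 * (1 - p2))) ** 2 / (p1 - p2) ** 2
    return round(n)

print(visitors_needed(0.10, 0.25))  # big win: ~100 visitors per variant
print(visitors_needed(0.10, 0.11))  # tiny win: ~15,000 per variant
```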

This is how split-testing gives you actionable results fast.

Thanks Tendayi Viki and Andreas Klinger for reviewing this post.
