How to correctly measure the impact of email improvements: three ground rules

15 Jun ’16

Good marketing is demonstrable.

You should measure any innovation to understand whether it generates incremental revenue. A/B tests are a simple and affordable way to measure this impact. More and more companies are starting to use them, because A/B tests work really well.

That said, if you ignore a few rules, you’re wasting your time: your test results might be positive, your conversion rates might be rising, but you’re not generating incremental revenue.

Three ground rules

  1. The difference in your results should be statistically significant
  2. Stick to one hypothesis per test
  3. Investments in improvement should pay for themselves

1. Is the difference statistically significant?

Never, ever, ever forget about statistics. A difference in results could just as easily be legitimate as random.

Say, for example, that you ask 10 customers at checkout whether they’ve recently changed their mobile number. It turns out that one person doesn’t use a phone at all. Would you conclude that 10% of your customers don’t use phones? Not likely. You’d chalk it up to chance. While there probably are people who don’t use phones, they’re definitely fewer than 10% of the population. What if you asked 1,000 people, and 100 of them said that they’d stopped using their mobile phone? You’d be mighty surprised, but it would be a lot harder to brush off that 10% figure.
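One way to make this intuition concrete is to compute the margin of error around the observed 10%. Below is a rough sketch using the normal (Wald) approximation, which is unreliable at a sample of 10 but fine for illustrating how the interval shrinks with sample size; the function name is ours, not from any particular tool.

```python
import math

def margin_of_error(p, n, z=1.96):
    """95% normal-approximation margin of error for an observed proportion."""
    return z * math.sqrt(p * (1 - p) / n)

# 1 person out of 10: the 10% estimate is extremely loose
print(margin_of_error(0.10, 10))     # about ±19 percentage points
# 100 out of 1,000: the interval is roughly ten times tighter
print(margin_of_error(0.10, 1000))   # about ±1.9 percentage points
```

With 10 respondents, "10%" is compatible with almost any true rate; with 1,000, it's hard to dismiss.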

There are well-developed methodologies and handy tools to understand when and to what extent you can trust your data. For example, take a look at these A/B Testing Calculators.

How can I apply this in practice?

Let’s say you conduct a test: you send an email with a static image to one group of customers and an email with an animation to another. The conversion rate is 14% for one group and 17% for the other. Looks great: a 21.4% relative lift in conversion! But do you have a large enough sample size (emails or site views)?

Let’s plug our numbers into the Chi-Squared Calculator (set the bottom slider to 95%).


To correctly plan your test before you conduct it, you can use the Sample Size Calculator to find out the minimum number of each email variant you need to send, so that you can immediately get a reliable estimate of your results.

In our example, the calculator shows that you would need to send more than 3,000 emails for each variant before you could trust that the variant with a 17% conversion rate is more successful than the variant with a 14% conversion rate.
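If you’d rather check significance in code than in an online calculator, a pooled two-proportion z-test gives essentially the same answer as a chi-squared test for two variants. A minimal sketch using only the standard library (the numbers match the example above; the function name is ours):

```python
import math

def two_proportion_p_value(conv_a, n_a, conv_b, n_b):
    """Two-sided p-value for the difference between two conversion rates
    (pooled two-proportion z-test; for two variants this is equivalent
    to the chi-squared test without continuity correction)."""
    p_a, p_b = conv_a / n_a, conv_b / n_b
    pooled = (conv_a + conv_b) / (n_a + n_b)
    se = math.sqrt(pooled * (1 - pooled) * (1 / n_a + 1 / n_b))
    z = abs(p_a - p_b) / se
    return math.erfc(z / math.sqrt(2))

# 14% vs 17% with 1,000 emails per variant: p > 0.05, not significant at 95%
print(two_proportion_p_value(140, 1000, 170, 1000))
# The same rates with 3,000 emails per variant: p < 0.05, significant
print(two_proportion_p_value(420, 3000, 510, 3000))
```

The same 3-point difference that looks like noise at 1,000 emails per variant becomes trustworthy at 3,000.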

  • To determine the sample size you need for each variant (how many emails or site views), input two numbers into the Sample Size Calculator: the baseline conversion rate for one of the variants, and the minimum conversion rate you’d like to be able to detect.
  • To understand whether you can trust the difference in conversion rates between two variants, input the absolute values into the Chi-Squared Calculator.
  • To determine whether the average check differs between the two variants, input two columns of data into the T-Test Calculator.
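The sample-size logic behind such calculators can be sketched with the standard normal-approximation formula for comparing two proportions. Exact answers depend on the power level and corrections a given calculator applies, which is why this sketch’s 80%-power figure comes in under the 3,000 quoted above, while 90% power comes in over it; all names here are ours.

```python
import math

# z-scores for a two-sided 95% confidence level and for common power levels
Z_ALPHA = 1.96
Z_BETA = {0.8: 0.8416, 0.9: 1.2816}

def sample_size_per_variant(p1, p2, power=0.8):
    """Normal-approximation sample size to detect a change from p1 to p2."""
    p_bar = (p1 + p2) / 2
    numerator = (Z_ALPHA * math.sqrt(2 * p_bar * (1 - p_bar))
                 + Z_BETA[power] * math.sqrt(p1 * (1 - p1) + p2 * (1 - p2))) ** 2
    return math.ceil(numerator / (p2 - p1) ** 2)

# 14% -> 17%: roughly 2,300 emails per variant at 80% power,
# and over 3,000 at 90% power
print(sample_size_per_variant(0.14, 0.17, power=0.8))
print(sample_size_per_variant(0.14, 0.17, power=0.9))
```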

If you didn’t conduct the test, but were given a report, make sure that it was tested for statistical significance, or else request the data and check for yourself with these calculators. If you want to read more about this, here are the key terms: confidence interval, 95% confidence level, 5% significance level, Chi-Squared, T-Test, and Student’s T-Test.

If you’ve come across a borderline case or can’t figure it out, write us at and we’ll try to help.

2. Are you sure you’re testing the right hypothesis?

How can you design an experiment so that you can measure exactly what you intend to? Seems simple, right? Let’s look at an example.

Hypothesis: product recommendations will increase email revenue

You might think that the A/B test should look like this: the first email variant doesn’t have recommendations, the second does.



But that’s wrong. Here’s why:

When we say we want to measure the impact of recommendations, we mean we want to measure the specific way (algorithm) that we select products. Selection quality is considered the key to raising conversion rates. But what do we do when we add recommendations? In fact, we do two things:

  1. we add a product block to the email
  2. we use a recommendation algorithm to choose which products to put in the block

It turns out that we’ve broken the “one hypothesis per test” rule and are testing two variables at once.

As a result, we can’t understand what actually changed the conversion rate. Was it adding the extra product block, or the algorithm that selected the products?


Now what?

Apply the “one hypothesis per test” rule and add a third variant: an email with a block of randomly selected products.


By comparing pairs of conversion rates, you can determine the contribution of each component.
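With all three variants in hand, the decomposition is simple arithmetic. A sketch using illustrative rates of 15% (no block), 17% (random block), and 18% (recommendations); the function name is ours:

```python
def decompose(no_block, random_block, recommended):
    """Split the total uplift into the block effect and the algorithm effect."""
    return random_block - no_block, recommended - random_block

# Illustrative conversion rates for the three variants
block_effect, algorithm_effect = decompose(0.15, 0.17, 0.18)
print(f"product block: +{block_effect:.1%}, algorithm: +{algorithm_effect:.1%}")
```

Here the mere presence of a product block contributes twice as much as the recommendation algorithm itself, a distinction the original two-variant test could never reveal.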


How many emails do you need to send to get a statistically significant difference? A lot. Comparing the first pair of 15% and 17% in the above graph, the difference will be statistically significant if you send more than 5,000 emails for each variant. For the second pair of 17% and 18%, you’ll already need more than 22,000 emails for each variant.

3. Is the investment paying off?

OK, say we’ve conducted a competent test and the significance test showed that the conversion rate or average check has increased. This is good and should generate incremental revenue.

Now, let’s see how much this incremental revenue cost us and whether the improvement paid off. With product recommendations, for example, we need to consider development (or purchase) costs for the algorithm, plus costs for updates and support.

If the recommendations generate an increment of 1%, is 1 additional order out of 100 “ordinary” orders enough to pay off the costs?
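A back-of-the-envelope payback check can be scripted too. All numbers below are hypothetical placeholders, not figures from the article:

```python
def monthly_payback(orders_per_month, avg_order_value, uplift, monthly_cost):
    """Incremental revenue from the uplift minus what the improvement costs."""
    incremental_revenue = orders_per_month * uplift * avg_order_value
    return incremental_revenue - monthly_cost

# Hypothetical numbers: 10,000 orders a month, a $50 average check,
# a 1% uplift from recommendations, and a $3,000 monthly bill for
# building and supporting the algorithm
profit = monthly_payback(10_000, 50, 0.01, 3_000)
print(profit)  # positive, so the improvement pays off
```

With a smaller order volume or a pricier algorithm, the same statistically significant uplift can still lose money.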


To sum up:

  1. Make sure your results are statistically significant
  2. Break your tests down to their basic parts and apply the “one hypothesis per test” rule
  3. Count your money carefully

If you run a lot of tests and pick the winning variants but your revenue still isn’t growing, this is a sign that you’re doing something wrong. A/B tests should work for your business.

Write us at if you still have questions; we’ll try to help and will add an example to the article.
