A/B Testing


What is A/B testing? It is an experiment with two groups, each receiving a different treatment, to establish which treatment performs better. The treatments could be products, procedures, web pages, and so on. Often one of the two treatments is the existing standard (the status quo) or no treatment at all. The group that receives the standard treatment or no treatment is called the control group.

When the outcome is a numeric measurement, such as time or revenue, we compare the two groups with a two-sample test for means.

Some examples of A/B testing include testing two prices to see which yields the higher sales volume and net profit, testing two soil fertilizers to see which produces more seed germination, or testing two medical therapies to determine which one is more effective.

Website Landing Page Example

A/B tests are common in web design and marketing because it's easy to measure the results. You could measure the average time spent on the landing page per session. Group A uses the default landing page and group B uses the newly designed landing page. Users of the website are randomly assigned to either group A or group B. The data professional will use a t-test to compare the average time spent on each landing page and determine whether the difference between the two sample means is statistically significant. Suppose group B spends more time on the landing page. Is that difference due to chance or is it due to the new design?
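The random assignment step could be sketched as follows. This is a minimal illustration, not a production assignment scheme: the function name assign_groups, the user IDs, and the fixed seed are all assumptions made for the example.

```python
import random

def assign_groups(user_ids, seed=0):
    """Randomly assign each user to group "A" or "B" with equal probability."""
    rng = random.Random(seed)  # fixed seed only so the sketch is reproducible
    return {user_id: rng.choice(["A", "B"]) for user_id in user_ids}

# Hypothetical user IDs, for illustration only.
assignment = assign_groups(range(1000))

# Count how many users landed in each group.
counts = {g: sum(1 for v in assignment.values() if v == g) for g in ("A", "B")}
print(counts)
```

Because each user is assigned independently, the two groups will usually be close to, but not exactly, the same size.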

t and z

Do we know the population standard deviation? Usually we don’t, so we estimate it from the sample and use a t-test. If we knew the population standard deviation we’d use the z-test. The test statistic for the z-test is the Z-score and the test statistic for the t-test is the T-score. The Z-score is based on the normal distribution and the T-score is based on the t-distribution. What’s the difference? The graph of the t-distribution has a bell shape that is similar to the standard normal distribution, but the t-distribution has heavier tails than the standard normal distribution does. The heavier tails reflect the extra uncertainty that comes from estimating the standard deviation from a small sample, which makes extreme values of the test statistic more likely. As the sample size increases, the t-distribution approaches the normal distribution.
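To see the heavier tails numerically, here is a small sketch that compares the density of the t-distribution with the standard normal density at a point far from the center. The helper t_pdf is written for this example from the standard density formula of the t-distribution, and the evaluation point x = 3 and the list of degrees of freedom are arbitrary choices.

```python
import math
from statistics import NormalDist

def t_pdf(x, df):
    """Density of Student's t-distribution with df degrees of freedom."""
    # Work with log-gamma so large df values do not overflow.
    log_coef = (math.lgamma((df + 1) / 2) - math.lgamma(df / 2)
                - 0.5 * math.log(df * math.pi))
    return math.exp(log_coef - (df + 1) / 2 * math.log1p(x * x / df))

normal = NormalDist()  # standard normal: mean 0, standard deviation 1

x = 3.0  # a point far out in the tail
for df in (2, 5, 30, 100):
    print(f"df={df:>3}: t density at {x} = {t_pdf(x, df):.5f}")
print(f"normal density at {x} = {normal.pdf(x):.5f}")
```

The t density at x = 3 shrinks toward the normal density as the degrees of freedom grow, which is the convergence described above.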

Two-Sample T-Test

We have the data from our two groups, A and B, including the mean and standard deviation of each group. We can conduct a two-sample t-test to analyze the data. Version A is the control group and version B is the group that is exposed to the change. What are the steps?

  1. State the null and alternative hypotheses
  2. Choose a significance level (often it is 5%)
  3. Find the p-value
  4. Reject or fail to reject the null hypothesis

The null hypothesis states that there is no difference in the mean time spent on version A and version B. This is assumed to be true unless we find convincing evidence to the contrary. The alternative hypothesis states that there is a difference in the mean time spent on version A and version B.

The significance level is the threshold at which you will consider a result statistically significant. It is the probability of rejecting the null hypothesis when it is true. In other words, it’s the probability of saying “it IS different” when in fact there is no difference and the results are just due to chance. Let’s choose 5% as a significance level.

Now we find the p-value. You will do this on the computer using a programming language like Python. The p-value is the probability of observing a difference in your sample means as extreme as or more extreme than the one you observed, assuming the null hypothesis is true. To find the p-value, first calculate your test statistic. We are conducting a t-test. The inputs of the t-test are the two sample means, the two sample standard deviations and the two sample sizes.

Below is the formula. \bar{X}_1 and \bar{X}_2 are the two sample means, s_1 and s_2 are the two sample standard deviations, and n_1 and n_2 are the two sample sizes.

t = \dfrac{(\bar{X}_1 - \bar{X}_2)} {\sqrt{\left(\dfrac{s_1^2}{n_1} + \dfrac{s_2^2}{n_2}\right)}}
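As a rough sketch, the formula above translates directly into a few lines of Python. The function name t_statistic and the summary statistics passed to it are hypothetical values chosen for illustration, not data from a real test.

```python
import math

def t_statistic(mean1, sd1, n1, mean2, sd2, n2):
    """Two-sample t statistic computed from summary statistics."""
    # Standard error of the difference between the two sample means.
    standard_error = math.sqrt(sd1**2 / n1 + sd2**2 / n2)
    return (mean1 - mean2) / standard_error

# Hypothetical summary statistics (seconds on page), for illustration only.
t = t_statistic(mean1=50.0, sd1=10.0, n1=100, mean2=52.0, sd2=12.0, n2=100)
print(round(t, 3))
```

The sign of t only reflects which group is listed first; a negative value here means the first group has the smaller sample mean.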

Suppose you get a test statistic of t = -1.2. Now we need to find our p-value. Imagine the t-distribution, which looks similar to the normal distribution, with our test statistic of -1.2 in the left tail and its mirror image 1.2 in the right tail. The p-value is the combined area under the curve in the two tails, to the left of -1.2 and to the right of 1.2. Suppose you calculate a p-value of 0.2. That is 20%.
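The two-tail area can be approximated with nothing more than the standard library. The sketch below integrates the t density between -|t| and |t| with Simpson's rule and subtracts the result from 1. The text does not give the degrees of freedom, so the value df = 198 is an assumption made for this example, and the helper names are hypothetical. In practice a statistics library would normally do this calculation for you.

```python
import math

def t_pdf(x, df):
    """Density of Student's t-distribution with df degrees of freedom."""
    log_coef = (math.lgamma((df + 1) / 2) - math.lgamma(df / 2)
                - 0.5 * math.log(df * math.pi))
    return math.exp(log_coef - (df + 1) / 2 * math.log1p(x * x / df))

def two_sided_p_value(t, df, steps=10_000):
    """Area in both tails beyond |t| (steps must be even for Simpson's rule)."""
    a = abs(t)
    if a == 0:
        return 1.0
    h = 2 * a / steps
    total = t_pdf(-a, df) + t_pdf(a, df)
    for i in range(1, steps):
        weight = 4 if i % 2 else 2  # Simpson weights: 4 on odd, 2 on even points
        total += weight * t_pdf(-a + i * h, df)
    central_area = total * h / 3     # area between -|t| and |t|
    return max(0.0, 1.0 - central_area)

# df=198 is an assumed value for illustration; it is not given in the text.
p = two_sided_p_value(-1.2, df=198)
print(round(p, 3))
```

Because the distribution is symmetric, passing 1.2 instead of -1.2 gives the same p-value.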

Draw a Conclusion

To draw a conclusion, compare your p-value to the significance level. If your p-value is less than your significance level, you reject the null hypothesis and conclude that there is a statistically significant difference. If your p-value is greater than the significance level, you fail to reject the null hypothesis. This does not prove that A and B are the same; it means the data do not provide enough evidence of a difference. In our example the p-value of 0.2 is greater than the significance level of 0.05, so we fail to reject the null hypothesis.
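The decision rule is simple enough to write down as code. This is only a sketch; the function name decide and its default significance level of 0.05 are choices made for this example.

```python
def decide(p_value, alpha=0.05):
    """Return the test decision for a given p-value and significance level."""
    if p_value < alpha:
        return "reject the null hypothesis"
    return "fail to reject the null hypothesis"

# The example p-value of 0.2 at the 5% significance level.
print(decide(0.2))  # prints "fail to reject the null hypothesis"
```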
