A deep overview of the Bayesian approach for AB testing
AB testing is an essential tool for deciding whether or not to roll out an incremental feature or a new campaign copy. To perform an AB test, we randomly divide users into groups and serve different content, called variants, to each one.
By using a correct randomization procedure, we can attribute any difference in outcomes (e.g., the conversion rate) between the two groups to the change we are testing.
Before acting on the results, we must understand if any performance differences we observe are merely due to chance rather than the test’s change. For example, it is perfectly possible to obtain different heads/tails ratios between two fair coins if we only conduct a limited number of throws.
At Croct, we use Bayesian methods for this analysis since they provide richer and more informative insights than typical, more simplistic approaches. They help us avoid some common pitfalls of statistical testing and make our analysis easier to understand and communicate to non-technical people.
This post takes a broader look at the Bayesian approach, starting with the metrics used in the analysis.
These are the key metrics for a Bayesian analysis:
Conversion rate
The conversion can be any tracked action taken by users as a business goal. Understanding what percentage of users complete a given action allows gauging the success of the application and identifying improvement points.
Uplift
The amount of relative improvement in conversion rate between a variant and the baseline.
Probability to be best (PBB)
Indicates the long-term probability of each variant outperforming all other variants, given the data collected since the creation or change of any variation included in the test.
Expected loss
The risk of choosing one variant over another; it represents the potential loss in conversion rate.
Frequently, companies use these metrics together with heuristics (e.g., a minimum number of sessions or conversions per variant) to automatically declare the winning variant.
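As a sketch of how two of these metrics can be estimated, the snippet below draws Monte Carlo samples from a Beta posterior for each variant and derives PBB and the expected loss from them. The variant names, counts, and the uniform Beta(1, 1) prior are illustrative assumptions, not a description of any specific engine:

```python
import numpy as np

rng = np.random.default_rng(42)

# Hypothetical results: (conversions, sessions) per variant.
data = {"A": (50, 1000), "B": (65, 1000)}

# Posterior per variant: Beta(1 + conversions, 1 + non-conversions),
# assuming a uniform Beta(1, 1) prior.
samples = {
    name: rng.beta(1 + conv, 1 + sess - conv, size=100_000)
    for name, (conv, sess) in data.items()
}

# Probability to be best: how often each variant's sampled rate is the highest.
stacked = np.column_stack(list(samples.values()))
best = stacked.argmax(axis=1)
pbb = {name: float(np.mean(best == i)) for i, name in enumerate(samples)}

# Expected loss of choosing B: average conversion-rate shortfall
# in the scenarios where A is actually better.
loss_b = float(np.mean(np.maximum(samples["A"] - samples["B"], 0)))

print(pbb)
print(loss_b)
```

With these made-up numbers, B converts better in most sampled scenarios, so its PBB is high and the expected loss of picking it is a small fraction of a percentage point.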
Bayes' theorem is one of the core concepts in probability theory. It describes the likelihood of an event happening conditioned on related evidence and given prior knowledge of its occurrence rate. For example, we can use it to describe the likelihood of a conversion given an initial guess of a 7% occurrence rate (prior knowledge) and conditioned on data from multiple sessions (evidence).
When there's no evidence that something can happen, humans tend to believe that it's impossible.
However, once it has happened before, twice before, and so on, it becomes clear that there's a probability that the event happens again.
Thus, the brain adjusts the neural weights to incorporate the new evidence; updating the brain's neural weights is analogous to using Bayes' theorem to update a probability distribution.
The mathematical definition of Bayes' theorem is:

P(A | B) = P(B | A) × P(A) / P(B)

The symbol "|" means "given". For example, the left-hand side reads "the probability of event A happening given that B happened". In the context of AB testing, the equation is:

P(conversion rate | data) = P(data | conversion rate) × P(conversion rate) / P(data)
It uses new data and past knowledge about the conversion rate to progressively update the left-hand side, the variable of interest (the "Posterior" as explained further below).
The idea is that, evaluated at a single point, the left-hand side gives the probability of one specific conversion rate value. Evaluated over all possible values, it becomes a distribution representing how likely the conversion rate is to assume each of them.
Since the interpretation is not trivial, here's an example describing an interesting use-case of Bayes' theorem.
The probability distribution is at the core of Bayesian AB testing, and it's fundamental to understand the test results. Usually, the distribution is represented by a probability mass function (discrete variables) or a probability density function (continuous variables).
The technical definition is "a mathematical function that gives the probabilities of occurrence of different possible outcomes for an experiment". In other words, the probability distribution describes the likelihood of a random variable assuming a specific value. For example, the conversion rate is likely around 5% for a given data set.
One way to better understand a distribution is to analyze its visual representation. On the x-axis are the values the random variable can assume, while the y-axis measures the density of the distribution at each value. The distribution's peak identifies the highest-density area (i.e., the most likely values).
Visually, the example looks like this:
In this example, we can see that the most likely conversion rate is around 5%, the point of highest density.
We created an example using 1,000 samples, of which 50 converted, for a nominal conversion rate of 5%. Because nothing is certain in statistics, the probability distribution is a great way to represent uncertainty, defining the range of possible values a random variable can assume and the likelihood of it assuming each value.
It's worth noting that the density value is not precisely the likelihood, but they are directly correlated (i.e., a higher density represents a higher likelihood).
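Continuing the example, the peak and spread of such a distribution can be summarized numerically. The sketch below assumes the same Beta posterior as before (a uniform prior plus 50 conversions in 1,000 sessions), which is an assumption made for illustration:

```python
from scipy import stats

# Posterior after 1,000 sessions with 50 conversions, uniform Beta(1, 1) prior.
posterior = stats.beta(1 + 50, 1 + 950)

# The mode of a Beta(a, b) with a, b > 1 is (a - 1) / (a + b - 2),
# i.e., the point of highest density.
mode = (51 - 1) / (51 + 951 - 2)
print(mode)  # exactly 0.05 here

# A 95% credible interval quantifies the uncertainty around that peak.
low, high = posterior.ppf([0.025, 0.975])
print(low, high)
```

The mode lands exactly on the 5% nominal rate, while the credible interval spans roughly two to three percentage points around it.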
The great thing about probability distributions, and how beautifully they connect to Bayesian statistics, is that the uncertainty decreases in light of new evidence (data). To illustrate this, the following distribution is the result of observing 10k samples (instead of 1k):
The interval shrinks to a range of about one percentage point, and the highest-density value is right where it should be, very close to 5%. It's clear that there's now less uncertainty in the distribution, as the values are more concentrated around the mean of ~5%.
This should start giving an intuition on the impact of adding more data to the analysis and why it is crucial to have a minimum number of examples before ending a test.
Prior probability distribution
With the intuition of Bayes' theorem and the probability distribution, it's easier to understand the prior and posterior probability.
The prior probability distribution is a way to incorporate past knowledge to calculate the posterior distribution. It expresses our beliefs about the unknown quantity (e.g., the conversion rate) before considering any piece of evidence.
We can define the prior distribution in multiple ways:
Using information from past experiments
We can use the metrics from previous experiments to construct the prior distribution. For example, if it's known that the last home banner CTA had a 5% conversion rate, it's reasonable to use this as the prior probability. As previously mentioned, the confidence in a probability distribution increases with more samples, which is also true for the prior. For the 5% example, the prior could be constructed with 100 or 10000 examples, for low and high confidence, respectively.
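One way to encode that confidence level is through pseudo-counts: the same 5% belief backed by 100 versus 10,000 past sessions yields priors with the same mean but very different spreads. This is a sketch under the Beta-prior assumption used throughout the examples:

```python
from scipy import stats

# Two priors encoding the same 5% belief with different confidence,
# expressed as pseudo-counts from hypothetical past experiments.
weak = stats.beta(5, 95)         # 5 conversions in 100 sessions
strong = stats.beta(500, 9500)   # 500 conversions in 10,000 sessions

print(weak.mean(), strong.mean())  # both 0.05
print(weak.std(), strong.std())    # the strong prior is far more concentrated
```

A weak prior lets fresh data dominate quickly, while a strong prior takes much more contradicting evidence to move.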
Using a subjective prior
It's also possible to use our opinion as the prior – preferably someone with expertise to make an educated guess rather than a completely blind one. For example, a marketing expert might use 10% as the prior conversion rate for a home banner CTA. It probably isn't 10%, but it's definitely not 0.001% nor 99.9%.
Using an uninformative prior
The uninformative prior is appropriate to reflect a balance between the outcomes (e.g., conversion or not) when no information is available. Most AB testing tools start with an uninformative prior due to the lack of information about the specific experiments of each customer.
It's worth pointing out that after computing the posterior distribution once, it becomes the prior probability for the following computation. Therefore, choosing an initial prior distribution shouldn't require much effort, as it gradually becomes irrelevant with more evidence.
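This chaining is easy to see with conjugate updates: each batch's posterior counts become the next batch's prior counts. The batch sizes below are made up for illustration:

```python
# Sequential updating: each batch's posterior is the next batch's prior.
# Illustrative batches of (conversions, sessions).
batches = [(5, 120), (12, 230), (31, 650)]

alpha, beta = 1.0, 1.0  # uninformative Beta(1, 1) starting prior
for conversions, sessions in batches:
    alpha += conversions
    beta += sessions - conversions

# The result is identical to a single update with all the data pooled,
# which is why the initial prior's influence fades as evidence accumulates.
print(alpha, beta)             # 49.0, 953.0
print(alpha / (alpha + beta))  # posterior mean, about 4.9%
```

After 1,000 total sessions, the two pseudo-counts contributed by the initial prior are negligible next to the observed data.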
Posterior probability distribution
The posterior probability distribution results from applying Bayes' theorem to the new data and past knowledge. As we mentioned before, the posterior for round n becomes the prior for round n + 1, where the system triggers each round after collecting some new data.
This behavior allows us to incrementally improve the conversion rate estimates using new evidence. If the analysis ran indefinitely, the confidence about the conversion rate might reach 99.99%, but it would never be 100% certain about the value, although at that point there's virtually no difference.
The posterior distribution is the most important result because it enables computing every metric to display on reports. Thus, the Bayesian engine's primary goal is to gradually update the posterior to improve the quality of the estimates.
How Croct’s AB testing engine works
Croct provides a Bayesian AB testing engine to analyze experiment results. It calculates metrics in real time, without sampling, as new data enters the system. That's how we ensure results are accurate and avoid common pitfalls of some statistical methods.
Do you have questions about how our platform works? Read the next post in this series where we discuss how our AB testing engine works, or create your free account and explore our platform by yourself.