Combining A/B Testing and Personalization

Personalization is an important component of today's digital experience, helping us serve and convert customers better. But how good is personalization? There are plenty of studies, research, and reports showing significant improvements in all kinds of KPIs after adopting personalization. I do not doubt that. However, can we know whether a single personalization activity improves the desired KPI, and by how much? Can we use A/B testing to verify the effectiveness of personalization?

Usually, A/B testing and personalization are two distinct types of digital experimentation. In Adobe Target, we have to select either A/B testing or personalization (among others) when creating an activity. The default experience in personalization is not like the control experience in A/B testing, which tells us how well the variations perform against the default. The default experience in personalization is simply the experience for visitors we are not targeting. We can still calculate performance and uplift by comparing the targeted experiences against the default experience, but the concept is completely different from A/B testing because 100% of the visitors in a given targeted segment receive the same experience.

So, how can we combine A/B testing and personalization so that we deliver personalized experiences and test the effectiveness of personalization at the same time?

Visualizing A/B testing and personalization

Before bringing them together, let's first build a visual understanding of A/B testing and personalization.

A/B testing randomly allocates each individual in a group to different experiences. The original, baseline experience is usually designated as the control, and all others are test experiences. The performance of A/B testing is the uplift in the KPI conversion rate of each test experience compared to the control.

Personalization is not random. It intentionally delivers a designated experience to a segment of users. We believe we are delivering the best-fitting experience to each corresponding segment. Visitors who are not in any predefined segment receive the default, untargeted experience. We are still able to calculate the performance of each experience by comparing the KPI conversion rates of the targeted experiences against the untargeted experience.

Now comes the question from the very beginning. This uplift does not indicate whether the personalization is working or not. If the conversion rate of Experience B is the best, are we going to deliver Experience B to all users? No.

There is no randomness in personalization and no hint for adjusting either the segmentation or the experience design. Combining A/B testing and personalization seems to be the answer.

Three approaches

Testing-Personalization (TP)

We can build the combination with A/B testing first, followed by personalization. I call it the TP experiment. The first layer is an A/B test with two branches, the control and test groups. Visitors in the control group receive the control, default experience. If a visitor falls into the test group, personalization then comes in to deliver the designated experience to each segment. The performance of the TP experiment is the total KPI conversions divided by the total number of visitors in the test group, compared against the same ratio for the control group. This tells us how big the overall impact of personalization is over no personalization.
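
As a minimal sketch with hypothetical numbers (not pulled from any real Adobe Target report), the calculation looks like this:

// Overall TP performance, as described above (hypothetical numbers)
var upliftTP = (840 / 12000) / (180 / 3000) - 1;  // 0.07 / 0.06 - 1 ~ 0.167, i.e. ~16.7% uplift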

Personalization-Testing (PT)

Another way of combining them is having personalization first, followed by A/B testing: the PT experiment. The personalization assigns visitors to different A/B tests to find the most appropriate experience for each segment. This is a series of A/B tests running independently, where each A/B test can have a different number and design of experiences. However, to make the entire exercise comparable, it is better to have all segments run their own A/B test following the same design or principles, such as testing the same hero banner with different personalization offers for the corresponding segment.

The performance calculation could be complicated and depends on how each A/B test is configured. The most direct performance measure is the uplift in KPI conversion within each A/B test. Then, from each of those winning experiences, we can derive the uplift of the targeted performance over the untargeted experience.

If we apply the same A/B testing design to all segments, experience A.1 is comparable to B.1, C.1, and D.1, and similarly A.2 to B.2, C.2, and D.2. We can calculate the uplift of each matching experience in the targeted segments over the same experience in the untargeted segment, then tabulate the uplifts to understand the overall performance.
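
As a rough sketch in plain JavaScript, with hypothetical segment names and conversion rates, the tabulation could look like this:

// Conversion rate of each experience for the untargeted segment (hypothetical numbers)
var untargeted = { "1": 0.040, "2": 0.045 };
// Conversion rate of the matching experiences for each targeted segment
var segments = {
    A: { "1": 0.050, "2": 0.052 },
    B: { "1": 0.043, "2": 0.048 }
};
for (var seg in segments) {
    for (var exp in segments[seg]) {
        var uplift = segments[seg][exp] / untargeted[exp] - 1;
        console.log(seg + "." + exp + " uplift: " + (uplift * 100).toFixed(1) + "%");  // e.g. A.1 uplift: 25.0%
    }
}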

Testing-Personalization-Testing (TPT)

I like to keep things simple, but there is a third type: the TPT experiment. I won't make it more complicated than necessary; it simply stacks the two ideas above into this final version. Similar to the TP experiment, it starts with an A/B testing split into test and control groups, followed by the personalization of segments, where each segment then runs an A/B test to find the best-performing experience.

The performance calculations should all compare against the KPI conversion of the default experience in the control group. This can test whether personalization is working or not, and also find the best experience for each segment.

One variation not covered by the above three types of experiments is the segmentation design. If the segmentation is not right and does not reflect people's attributes, there may be no significant result. However, identifying segments is the scope of analytics, so I am skipping that part here.

Implementation in Adobe Target

After defining those three types of experiments, we need to realize them. Of course, Adobe Target is the tool.

There are two major considerations when setting up a combined A/B testing and personalization activity in Adobe Target: how to set it up and how to calculate the performance. Ideally, if we can configure the combined experiment as one single Target activity, it is easier to set up and easier to calculate performance, as all experiences are in the same activity. However, because A/B testing and personalization are two distinct types of activity in Adobe Target, there are cases where we need multiple Adobe Target activities working together to run these combined experiments. In that case, we need to calculate the overall performance externally, as described in my previous post.

PT experiment

The PT experiment is the easiest to implement as it is simply a collection of A/B testing activities targeting different segments. The primary consideration is the overlapping of segments. In a basic personalization activity, Adobe Target returns only the first matched experience to the visitor, even if the visitor matches multiple segments in the activity. However, since all A/B testing activities in the PT experiment run simultaneously, Adobe Target will run every activity whose segment definition the visitor matches. We can use activity priority to control the experience, https://experienceleague.adobe.com/en/docs/target/using/activities/priority. However, it can be very difficult to manage and sometimes impossible, such as when modifications are implemented using custom code. So it is best to use non-overlapping segments for the PT experiment to avoid the issue completely.
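
One way to keep segments non-overlapping, assuming the segment can be derived from a single profile attribute (the attribute name below is hypothetical), is a profile script that returns exactly one segment value, which each A/B testing activity then uses in its audience:

// Hypothetical profile script: each visitor gets exactly one segment value,
// so the audiences of the PT A/B testing activities never overlap.
var tier = user.get('loyaltyTier');  // assumed profile attribute already collected
if (tier == 'gold' || tier == 'platinum') {
    return 'segment-loyal';
} else if (tier == 'silver') {
    return 'segment-regular';
} else {
    return 'segment-new';
}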

TP experiment

The TP experiment is more complicated as it cannot be implemented via multiple Adobe Target activities. An initial idea is to run an A/B testing activity first to split visitors into test and control groups, then deliver a personalization activity to the test group. Unfortunately, we cannot sequence the execution of Adobe Target activities. Adobe Target evaluates all activities in one single server call and returns the entitled activities with the allocated experiences.

So we need to convert the two-layer TP experiment into a single Adobe Target activity, specifically a personalization activity. The following diagram shows how the test/control split becomes part of the segment definition.

We need to control the A/B split ourselves as part of the segment definition. Of course, this sacrifices some features of genuine A/B testing in Adobe Target, such as automatically allocating more visitors to the winning experience. That is something we need to accept.

To manually control the A/B split, we can use a profile script like the following, which returns either “test” or “control” with a 0.1 (10%) control allocation. More references on profile scripts can be found in the official documentation, https://experienceleague.adobe.com/en/docs/target/using/audiences/visitor-profiles/use-profile-scripts-to-test-mutually-exclusive-activities. With this profile script, we can create audiences and combine them with the original audience definitions in the personalization activity to put A/B testing and personalization together.

if (!user.get('test-control')) {
    // Only assign a group if the visitor does not have one yet
    var ran_number = Math.floor(Math.random() * 100);
    if (ran_number < 10) {
        // 10% of visitors fall into the control group
        return 'control';
    } else {
        // the remaining 90% fall into the test group
        return 'test';
    }
}

This type of experiment also offers the advantage of directly reporting the winning experience, by setting the correct default experience within Adobe Target, just like a normal personalization activity. To understand the overall effectiveness of personalization over the control group, we need to sum up the numbers of visitors and conversions across all personalized experiences to calculate the conversion of the test group, and compare it against the same for the control group.
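
As a sketch in plain JavaScript, with hypothetical experience names and report numbers, the aggregation could look like this:

// Sum the per-experience rows of the activity report into test-group totals,
// then compare against the control/default experience (hypothetical numbers).
var report = [
    { experience: "control",   visitors: 3000, conversions: 180 },
    { experience: "segment-A", visitors: 4000, conversions: 300 },
    { experience: "segment-B", visitors: 5000, conversions: 330 },
    { experience: "segment-C", visitors: 3000, conversions: 210 }
];
var test = { visitors: 0, conversions: 0 };
var control = { visitors: 0, conversions: 0 };
report.forEach(function (row) {
    var bucket = row.experience === "control" ? control : test;
    bucket.visitors += row.visitors;
    bucket.conversions += row.conversions;
});
var uplift = (test.conversions / test.visitors) / (control.conversions / control.visitors) - 1;  // ~16.7%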

TPT experiment

First, I admit that I didn't try the TPT experiment myself, yet. It is doable by extending either the TP or the PT experiment.

To extend from the PT experiment, it is as simple as adding the test/control group condition to the segment definitions of all involved A/B testing activities. The one thing missing here is the control group experience. Since all A/B testing activities are delivered to test group visitors only, the control group visitors see the true default experience on the webpage without any Adobe Target modification. I say it is a “true” default experience because we can modify the control/default experience in A/B testing and personalization activities, so that one is not a “true” default. Another consideration of this missing control experience in the TPT experiment is the calculation of the result: we need to separately pull the numbers of visitors and conversions from Adobe Analytics where the Adobe Target activity/experience is “Unspecified”, as these represent the control group behaviour and are used as the baseline for the uplift calculation.
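
As a brief sketch with hypothetical numbers, the baseline from those “Unspecified” rows can then be compared against each segment's winning experience:

// Control baseline from Adobe Analytics rows where the Target
// activity/experience is "Unspecified" (hypothetical numbers).
var controlRate = 400 / 8000;                               // conversions / visitors of "Unspecified" rows = 0.05
var winnerRateSegmentA = 260 / 4000;                        // winning experience of segment A's A/B test = 0.065
var upliftSegmentA = winnerRateSegmentA / controlRate - 1;  // 0.30, i.e. 30% uplift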

To extend from the TP experiment, the following diagram gives a good idea. The “X/Y” labels in the diagram indicate that there are X experiences in total and experience Y is assigned to the visitor. We need to control the allocation of experiences by creating the corresponding profile scripts, creating audiences, and using them in a personalization activity. The idea is straightforward, but the execution is not.

The following is a profile script that returns the allocated experience out of a total of three, with an almost equal split of 0.33:0.33:0.34.

if (!user.get("3-experience")) {
    var ran_number = Math.floor(Math.random() * 100);
    if (ran_number < 33) {
        return "exp-a";
    } else if (ran_number < 66) {
        return "exp-b";
    } else {
        return "exp-c";
    }
}

There are a lot of profile scripts and audiences to create, but all experiences, together with the control experience, live in the same personalization activity, which makes the uplift calculation easier.

An unpreferred type of experiment, and a wish

Since it is unpreferred, I also didn't try this one in action 😅.

We can use multiple Adobe Target activities to implement the TP and TPT experiments. The first layer is an A/B testing activity with no visible experience change that only updates a cookie, or a similar mechanism, to indicate which test/control group the visitor belongs to. The second layer is then a single personalization activity for TP, or a set of A/B testing activities for TPT, to deliver the final experiences. Both the personalization and A/B testing activities in the second layer require an audience that matches the test group cookie.
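
For the first layer, each experience can run a small piece of custom code that only records the group. A minimal sketch, assuming a hypothetical cookie name:

// Hypothetical custom code for the first-layer A/B testing activity.
// The experience makes no visible change and only records the visitor's group
// in a cookie that the second-layer activities use in their audience definitions.
document.cookie = 'tp_group=test; path=/; max-age=' + 60 * 60 * 24 * 30;  // the other experience sets 'tp_group=control'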

The reason it is unpreferred is that it requires two Adobe Target server calls. The first call runs the initial A/B testing activity to determine the test/control group, and the second call runs the second layer of activities. The result is that we cannot deliver the full experiment on the first page view. We need the visitor to view the page a second time, when the test/control group cookie is already set, to deliver the full experiment.

The ideal case is Adobe implementing such multiple layers of activities in Target, so that both the setup and the uplift calculation become easier.

