Expert Commentary

Rolling Out Multiarmed Bandits for Fast, Adaptive Experimentation

How to outperform conventional A/B testing when scaling up personalized messages and services.

By Joshua Mabry, Janani Sriram, and Richard Lichtenstein

First published on September 14, 2021



Executive Summary

Marketing teams often lack the ability to quickly run in-market tests and scale them up.
Traditional A/B testing and even multivariate testing fall short for marketing that has frequent customer touches.
Multiarmed bandits, by contrast, dynamically steer traffic toward winning marketing messages, reducing the cost that testing incurs through lost conversions.
Pricing experiments are a particularly useful application, since retailers must balance building a demand model that informs long-term profits against protecting immediate profits.

With third-party cookies on the wane, marketers rely increasingly on first-party data. Most retailers are investing heavily in platforms to capture and unify their customer data. Across the board, they have been reaping value from triggered campaigns, with simple purposes such as reminding customers to return to their abandoned carts or to consider relevant product assortments.

Now, there’s a broader opportunity—namely, to use artificial intelligence (AI) to segment customers and automatically orchestrate aspects of their customer experience, ranging from marketing messages to retention interventions. Yet while many companies talk about creating a deeply personalized experience, few have made good use of AI.

Worse, many have invested in advanced marketing technology stacks, but they cannot take advantage of the personalization capabilities advertised by platform providers. The main constraint: Marketing teams often lack the ability to quickly run in-market tests and scale up these systems through automation.

Enter the multiarmed bandit

We attribute the testing bottleneck to a reliance on traditional A/B testing approaches. These tend to be highly manual to set up, implement, and interpret. Moreover, the insights generated may be ephemeral because of shifting consumer preferences and underlying seasonality in many markets. Companies that send daily messages to customers see steep decay curves as even the highest-performing messages lose effectiveness by the third time someone sees them.

Multivariate testing (MVT), a more powerful approach that can test many variables at once, suffers from the same flaw: the large lift that it generates erodes with frequent customer touches. MVT can, however, work well for marketing touches that occur infrequently for an individual consumer, such as a subscription.

Marketers can gain greater value by adopting adaptive experimentation approaches that more efficiently optimize customer engagement or financial metrics. These highly automated and always-on tools dynamically steer traffic toward winning marketing messages, decreasing the cost of testing caused by lost conversions. We have seen retailers realize double-digit sales increases by setting their ambitions higher and by automating the testing process using these advanced approaches. One of the most effective algorithms is the multiarmed bandit (MAB), which can be applied to use cases ranging from offer optimization to dynamic pricing. Because the MAB is always optimizing, we see persistent lift even for daily customer contacts (see Figure 1).

Figure 1: Multiarmed bandit has several benefits over traditional A/B or multivariate testing

MABs provide a simple, robust solution for sequential decision making during periods of uncertainty. To build an intelligent and automated campaign, a marketer begins with a set of actions (such as which coupons to deliver) and then selects an objective (such as maximizing clickthrough rates or EBITDA for email marketing). The algorithm balances exploration (gathering more data on new actions) with exploitation (selecting actions that perform well). The goal here is to select actions that maximize the payoff and quickly converge on the best set of actions. As market conditions change, the campaigns can easily be reset to discover new winners, or in more sophisticated designs, they can be configured to continue the testing cycle indefinitely.
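To make the exploration and exploitation trade-off concrete, here is a minimal sketch of one common bandit algorithm, Thompson sampling with a Beta-Bernoulli model. It is an illustration under stated assumptions, not the method used in the engagements described here: the reward is assumed to be a yes/no conversion, and the coupon names, conversion rates, and the `simulate` helper are hypothetical.

```python
import random


class BetaBernoulliBandit:
    """Thompson sampling over a fixed set of actions with yes/no rewards."""

    def __init__(self, actions, rng=None):
        self.rng = rng or random.Random()
        # One Beta(1, 1) prior per action, stored as [alpha, beta].
        self.posteriors = {action: [1, 1] for action in actions}

    def select(self):
        # Draw a plausible conversion rate for each action and play the best draw.
        # Uncertain actions sometimes draw high (exploration); actions with a
        # strong track record usually draw high (exploitation).
        draws = {
            action: self.rng.betavariate(alpha, beta)
            for action, (alpha, beta) in self.posteriors.items()
        }
        return max(draws, key=draws.get)

    def update(self, action, converted):
        # A conversion increments alpha; a non-conversion increments beta.
        self.posteriors[action][0 if converted else 1] += 1


def simulate(true_rates, n_visitors, seed=0):
    """Replay n_visitors against assumed true conversion rates (hypothetical)."""
    rng = random.Random(seed)
    bandit = BetaBernoulliBandit(true_rates, rng)
    sends = {action: 0 for action in true_rates}
    conversions = 0
    for _ in range(n_visitors):
        action = bandit.select()
        converted = rng.random() < true_rates[action]
        bandit.update(action, converted)
        sends[action] += 1
        conversions += int(converted)
    return sends, conversions


# Hypothetical coupons and conversion rates, for illustration only.
sends, conversions = simulate({"coupon_10_off": 0.05, "free_shipping": 0.15}, 20_000)
print(sends, conversions)
```

In a run like this, most sends should shift to the better-converting coupon after an initial learning period, which is the sense in which the bandit reduces conversions lost to testing. Resetting a campaign corresponds to re-creating the bandit with fresh priors.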

Pricing experiments are a particularly useful application, since retailers must balance building a demand model that informs long-term profits against protecting immediate profits. With a bandit, they “earn while learning” through in-market tests rather than “learn then earn.” As with any learning algorithm, it is important to be thoughtful about objective functions. For instance, an objective tied to revenue rather than profit may lead the MAB to converge on excessive discounting, because deep discounts can raise revenue even as they erode margin.
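The effect of the objective function can be seen with simple arithmetic. The sketch below uses entirely hypothetical numbers (list price, unit cost, discounts, and conversion rates are all assumptions) to compute the expected reward per message under each objective, which is the quantity a bandit would steer toward.

```python
from dataclasses import dataclass

LIST_PRICE = 100.0  # hypothetical list price
UNIT_COST = 70.0    # hypothetical cost of goods per unit


@dataclass(frozen=True)
class Offer:
    name: str
    discount: float         # fraction off the list price
    conversion_rate: float  # assumed share of recipients who buy


def revenue_per_send(offer):
    # Expected revenue from one message: purchase probability times price paid.
    return offer.conversion_rate * LIST_PRICE * (1 - offer.discount)


def profit_per_send(offer):
    # Expected profit from one message: purchase probability times unit margin.
    return offer.conversion_rate * (LIST_PRICE * (1 - offer.discount) - UNIT_COST)


# Hypothetical offers: deeper discounts are assumed to convert better.
offers = [
    Offer("no_discount", 0.00, 0.020),
    Offer("10_percent_off", 0.10, 0.035),
    Offer("25_percent_off", 0.25, 0.050),
]

best_by_revenue = max(offers, key=revenue_per_send)
best_by_profit = max(offers, key=profit_per_send)
print("revenue objective favors:", best_by_revenue.name)
print("profit objective favors:", best_by_profit.name)
```

With these assumed inputs, the revenue objective favors the deepest discount, while the profit objective favors the moderate one, because the 25 percent offer leaves only a 5-unit margin on each sale.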

Online service applications

Companies often use bandit solutions to speed up experimentation and personalize the experience for users of online services. Such solutions share a few characteristics:

They make several actions, such as unique ads or email messages, available for different users.
Marketers can quickly track user response to the action.
Marketers can adapt the online system at low cost, for example by recommending a different product.

In their most basic form, MABs serve as a more efficient alternative to A/B testing, adaptively allocating traffic to find a winning version of a website, email, advertisement, or other marketing action. In most digital systems, each user interaction also gathers some side information about the user and the action, known as the context. This might be information about the user’s current circumstance (cohort, location, time of day) or historically computed information (past spending, age, gender, shopping history). Contextual bandits extend the MAB framework and learn how to use this additional information to make decisions that optimize a target metric, such as profits or clickthrough rate.
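One simple way to picture a contextual bandit is to keep a separate set of beliefs for each context. The sketch below does exactly that with Thompson sampling, treating the context as a small tuple of segment labels. This is an assumption made for brevity: production systems more often fit a model that generalizes across user features rather than one bandit per segment. The class name, segment labels, offers, and conversion rates are all hypothetical.

```python
import random
from collections import Counter, defaultdict


class SegmentedThompsonBandit:
    """Thompson sampling with one Beta posterior per (context, action) pair."""

    def __init__(self, actions, rng=None):
        self.actions = list(actions)
        self.rng = rng or random.Random()
        # Unseen (context, action) pairs start from a Beta(1, 1) prior.
        self.posteriors = defaultdict(lambda: [1, 1])

    def select(self, context):
        # Sample a conversion rate per action for this context; play the best draw.
        draws = {
            action: self.rng.betavariate(*self.posteriors[(context, action)])
            for action in self.actions
        }
        return max(draws, key=draws.get)

    def update(self, context, action, converted):
        self.posteriors[(context, action)][0 if converted else 1] += 1


def run_demo(n_rounds=6000, seed=1):
    """Simulate two segments that respond to opposite offers (hypothetical rates)."""
    rng = random.Random(seed)
    true_rates = {
        ("new_customer",): {"welcome_offer": 0.20, "loyalty_offer": 0.05},
        ("repeat_customer",): {"welcome_offer": 0.05, "loyalty_offer": 0.20},
    }
    bandit = SegmentedThompsonBandit(["welcome_offer", "loyalty_offer"], rng)
    contexts = list(true_rates)
    plays = Counter()
    for _ in range(n_rounds):
        context = rng.choice(contexts)
        action = bandit.select(context)
        converted = rng.random() < true_rates[context][action]
        bandit.update(context, action, converted)
        plays[(context, action)] += 1
    return plays


for (context, action), count in sorted(run_demo().items()):
    print(context, action, count)
```

A non-contextual bandit would see both offers converting at roughly the same blended rate and could not separate them; conditioning on the segment lets each group receive the offer that works for it.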

Personalization with contextual bandits

Leading digital organizations implement contextual bandits for core services, such as promotional offer selection, in which it’s important to personalize the experience and adjust to fast-changing market conditions. The leaders also generate a steady stream of innovative content to test: new creative, imagery, promotions, and products. Constantly feeding the bandit with new ideas to test helps to avoid getting stuck with less-than-optimal results and generates new insights into customer behavior. Also, because bad ideas fail fast while winners rise to the top, companies can take bigger risks with their marketing ideas than they could in a slower-moving test cycle.

There are a few signals that a company is ready for more advanced approaches such as a contextual bandit:

a robust and fast-moving experimentation program;
customer data that can be accurately matched to historical marketing and transactional records; and
product-focused teams that can optimize high-value customer touchpoints.

Taking on the complexity of a bandit makes sense for an organization already running tests at scale, and the traditional testing generates valuable digital exhaust that can be fed into the bandit algorithm. One typically trains a contextual bandit on logged data stored in a data warehouse or other analytical data storage. Here, a company needs records of marketing actions served (such as which coupon was sent) and the resulting reward metric at the individual customer level, as well as metadata describing both the action and the user history at the time of campaign execution.
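The logged data described above might be represented as one record per customer per marketing action. The following sketch shows one possible shape for such a record and a quick per-action summary; the field names and the `summarize_by_action` helper are assumptions for illustration, not a schema prescribed by any platform.

```python
from collections import defaultdict
from dataclasses import dataclass, field


@dataclass
class LoggedAction:
    customer_id: str
    action: str    # which marketing action was served, e.g. which coupon
    reward: float  # the chosen reward metric for this customer
    action_metadata: dict = field(default_factory=dict)  # describes the action
    user_history: dict = field(default_factory=dict)     # as of campaign execution


def summarize_by_action(log):
    """Return the number of sends and the mean reward for each action."""
    totals = defaultdict(lambda: [0, 0.0])  # action -> [sends, reward sum]
    for row in log:
        totals[row.action][0] += 1
        totals[row.action][1] += row.reward
    return {
        action: {"sends": sends, "mean_reward": reward_sum / sends}
        for action, (sends, reward_sum) in totals.items()
    }


# Hypothetical rows, for illustration only.
log = [
    LoggedAction("c1", "coupon_a", 1.0, {"discount": 0.10}, {"past_orders": 3}),
    LoggedAction("c2", "coupon_a", 0.0, {"discount": 0.10}, {"past_orders": 0}),
    LoggedAction("c3", "coupon_b", 1.0, {"discount": 0.25}, {"past_orders": 1}),
]
summary = summarize_by_action(log)
print(summary)
```

A raw per-action mean ignores which customers were targeted with which action, so it is best treated as a sanity check on the log rather than as a substitute for the trained model.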

With data in hand, a bandit model can be trained on any modern machine learning (ML) platform with model training, versioning, and serving capabilities. Usually, the value is established by building a minimum viable product algorithm operating on batches of data at a cadence that allows for careful validation by data science teams before being put into production. Personalized marketing messages can be served through web, email, or application-specific channels, and often there is some application programming interface (API) development work required to integrate the ML models with these channels. Luckily, most of the channel-specific tools include personalization APIs, which populate the personalized content within a message template, so these integration tasks are relatively straightforward.
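As a rough picture of the batch integration step, the sketch below takes a list of customers, asks a model for an action per customer, and fills a message template with the result. Everything here is hypothetical: the template text, the offer copy, the `build_payloads` helper, and the `choose_action` callable standing in for a trained bandit. A real channel tool would expose its own personalization API and field names.

```python
from string import Template

# Hypothetical message template with two merge fields.
MESSAGE_TEMPLATE = Template("Hi $first_name, here is an offer picked for you: $offer_text")

# Hypothetical mapping from a bandit action to the copy shown to the customer.
OFFER_COPY = {
    "free_shipping": "free shipping on your next order",
    "coupon_10_off": "10% off your next order",
}


def build_payloads(customers, choose_action):
    """Score one batch of customers and render a message body for each."""
    payloads = []
    for customer in customers:
        action = choose_action(customer)  # stand-in for the trained model
        body = MESSAGE_TEMPLATE.substitute(
            first_name=customer["first_name"],
            offer_text=OFFER_COPY[action],
        )
        payloads.append(
            {"customer_id": customer["customer_id"], "action": action, "body": body}
        )
    return payloads


# Hypothetical batch; the lambda is a placeholder policy, not a real model.
batch = [{"customer_id": "c1", "first_name": "Ana", "past_orders": 0}]
payloads = build_payloads(
    batch, lambda c: "free_shipping" if c["past_orders"] == 0 else "coupon_10_off"
)
print(payloads[0]["body"])
```

Keeping the chosen action in the payload alongside the rendered body means the send can later be joined to the customer's response, which is the record the bandit needs for its next training cycle.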

As with any ML/AI system, continuous monitoring and ongoing maintenance remain important, so these systems are most effective in the hands of stable, product-focused marketing teams. Looking ahead, we expect to see broader adoption of AI and adaptive experimentation techniques, through which marketers can more effectively learn from and activate first-party customer data.