When should you use multivariate testing, and when is A/B/n testing best?
The answer is at once simple and complex.
Of course, A/B testing is the default for most people, as it is more common in optimization. But there is a time and a place for multivariate testing (MVT), as well, and it can add a lot of value.
Before we get into the nuances, let’s briefly go over the differences.
What is multivariate testing?
Multivariate testing is, in a sense, a more complex form of testing than A/B testing. A/B testing is fairly straightforward: you split traffic between two versions of a page and measure which one performs better.
You can also measure the performance of three or more variations of a page with A/B/n tests. As Yaniv Navot of Dynamic Yield wrote, “High-traffic sites can use this testing method to evaluate performance of a much broader set of variations and to maximize test time with faster results.”
Conceptually, an A/B/C/D test splits your traffic across four versions of a page instead of two.
A/B testing usually involves fewer combinations with more extreme changes, whereas multivariate tests have a large number of variations that usually have subtle differences.
Lars Nielson of Sitecore described it as the following:
“Multivariate testing, opposes the traditional scientific notion. Multivariate testing is the process of testing more than one component on the web site in a live environment. Essentially, it can be described as running multiple A/B/n tests on the same page, at the same time.”
The Case For A/B/n Tests
Should you use MVT or A/B/n tests?
If you have enough traffic, use both. They both serve different yet important purposes. In general, A/B tests should be your default, though.
With A/B testing:
- You can test more dramatic design changes
- Tests usually take way less time than MVTs
- Advanced analytics can be installed and evaluated for each variation (e.g. mouse tracking info, phone call tracking, analytics integration, etc.)
- Individual elements and interaction effects can still be isolated for learning & customer theory building
- A/B tests typically bring bigger gains (since you often test bigger changes)
A/B testing tends to get meaningful results faster. The changes between pages are more drastic, so it’s easier to tell which page is more effective.
So A/B testing harnesses the power of large changes, not just tweaking colors or headlines as is sometimes the case with MVT. Optimizers usually start all engagements with A/B testing, because that’s where the bigger gains are possible.
Yaniv Navot, Director of Online Marketing at Dynamic Yield, also mentioned that MVT is mainly used for smaller tweaks. He also mentioned that A/B tests are better for multi-page and multi-scenario experiences:
Yaniv Navot:
“Multivariate testing tends to encourages marketers to focus on small elements with little, or no impact at all. Instead, marketers should focus on running programmatic and dynamic A/B tests that enable them to serve segmented experiences to multiple cohorts across the site. This cannot be achieved using traditional multivariate testing.”
Something else to worry about with MVT: the amount of traffic you get.
How Much Traffic Do You Get?
Because of the additional variations, multivariate tests require a lot of traffic. If not high traffic, at least high conversion rates.
For example, a 3×2 test (testing 2 different versions of 3 design elements) would require the same amount of traffic as an A/B test with 8 variations (2^3). A 3×2 test is a typical MVT.
In a full factorial multivariate test, your traffic is divided evenly among all variations, which multiplies the amount of traffic necessary for statistical significance. As Leonid Pekelis, Statistician at Optimizely, said, this results in a longer test run:
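To make the arithmetic concrete, here's a minimal Python sketch of how combinations multiply in a full factorial test. The function names and the 100,000-visitor figure are illustrative assumptions, not part of any testing tool.

```python
from math import prod


def full_factorial_combinations(versions_per_element):
    """Number of page versions when every version of every element is combined."""
    return prod(versions_per_element)


def visitors_per_combination(total_visitors, versions_per_element):
    """Traffic each combination receives when visitors are split evenly."""
    return total_visitors / full_factorial_combinations(versions_per_element)


# Three design elements with two versions each: 2 * 2 * 2 combinations.
print(full_factorial_combinations([2, 2, 2]))
# Two design elements with three versions each: 3 * 3 combinations.
print(full_factorial_combinations([3, 3]))
# Hypothetical 100,000 visitors spread across the first design.
print(visitors_per_combination(100_000, [2, 2, 2]))
```

Two versions of three elements gives 2 × 2 × 2 = 8 combinations, while three versions of two elements gives 3 × 3 = 9. Either way, each combination only sees a small slice of your traffic, which is why the test has to run longer.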
“All together, the main requirement becomes running your multivariate test long enough to get enough visitors to detect many, possibly nuanced interactions.”
Claire Vo, co-founder of Experiment Engine, also said that MVT is more difficult to execute because of the extra traffic and resources it requires:
Claire Vo:
“MVT tests require significantly more investment on the technology, design, setup, and analysis side, and certainly full-factorial MVT testing can burn through significant traffic (if you even have the traffic to support this testing method.) This means MVT testing can be a big burden on your conversion “budget”–whether that’s time, people, resources, or internal support.”
A rule of thumb: if your traffic is under 100,000 uniques/month, you’re probably better off doing A/B testing instead of MVT. The only exception would be the case where you have high-converting (10% to 30% CR) lead gen pages.
In addition, if you’re an early stage startup and you’re still doing customer development, it’s too early for MVT. You may end up with the best performing page, but you won’t learn much. By doing everything at once, you miss out on the ups and downs of understanding the behavior of your audience.
That said, there are definitely some high-impact use cases for MVT.
When Should You Use a Multivariate Test?
Multivariate tests are about measuring interaction effects between independent elements to see which combination works best. As Leonid Pekelis put it:
Leonid Pekelis:
“You’d want to use a multivariate test if you really care about seeing if there are interactions between A/B/n tests. For example, you want to see how visitors react to changes on both your homepage and checkout page, compared to visitors seeing just the homepage change, or just the checkout page, or neither.”
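Here's a rough sketch of what "interaction" means in that homepage and checkout example. The conversion rates are made up for illustration, and the function name is hypothetical. The idea is simply to compare the lift from the homepage change with and without the checkout change in place.

```python
def interaction_effect(neither, homepage_only, checkout_only, both):
    """Difference between the homepage lift with and without the checkout change.

    A value near zero suggests the two changes simply add up;
    a value far from zero suggests they interact.
    """
    homepage_lift_alone = homepage_only - neither
    homepage_lift_with_checkout = both - checkout_only
    return homepage_lift_with_checkout - homepage_lift_alone


# Hypothetical conversion rates for the four groups of visitors.
rates = {
    "neither": 0.050,
    "homepage_only": 0.055,
    "checkout_only": 0.056,
    "both": 0.068,
}
print(interaction_effect(**rates))
```

In this made-up data, the homepage change adds half a percentage point on its own but more than a full point when the checkout change is also live. Two separate A/B tests would never show you that. Whether a difference like this is real or just noise is a separate statistical question.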
Paras Chopra from VWO said he’d use MVT to optimize several variables when he isn’t expecting a huge lift. It’s more for incremental improvements on multiple elements:
Paras Chopra:
“I’d use multivariate test when I’m doing optimization with several variables, not hoping for a wild swing (that we expect in A/B test). I think the right way is to use A/B test for large changes (such as overhauling entire design) and such. A/B test could be followed up with MVT to further optimize headlines, button texts, etc.”
The Benefits of Multivariate Tests
MVT is awesome for follow-up optimization on the winner from an A/B test, once you’ve narrowed the field.
While A/B testing doesn’t tell you anything about the interaction between variables on a single page, MVT does. This can help your redesign efforts by showing you where different page elements will have the most impact.
This is especially useful when designing landing page campaigns, for example, as the data about the impact of a certain element’s design can be applied to future campaigns, even if the context of the element has changed.
Andrew Anderson, Head of Optimization at Malwarebytes, explained that MVT is used to figure out what the most influential item on the page is and then go much deeper on it:
Andrew Anderson:
“It is not about ‘I want to see what happens with 3 pieces of copy, 4 images, and a small CTA.’ The question should be what matters most, the copy, the image, or the CTA, and whatever matters most I am going to test out 10 versions (and learn something important).
“AB Testing can never tell you influence, MVT can when it is done right. ANOVA analysis gives you mathematical influence, or the relative amount one factor influences behavior relative to others.”
So a big goal of multivariate testing is to let you know which elements on your site play the biggest role in achieving your objectives.
ANOVA? A Quick Definition
ANOVA (analysis of variance) is a “collection of statistical models used to analyze the differences among group means and their associated procedures.”
In simple terms, when comparing two samples, we can use the t-test – but ANOVA is used to compare the means of more than two samples.
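For the curious, here's a minimal sketch of the one-way ANOVA F statistic in plain Python. The daily conversion counts are hypothetical, and the function name is ours. A real analysis would normally use a statistics package and also report a p-value.

```python
from statistics import mean


def one_way_anova_f(groups):
    """F statistic: variance between group means relative to variance within groups."""
    k = len(groups)                           # number of groups
    n = sum(len(g) for g in groups)           # total observations
    grand_mean = mean([x for g in groups for x in g])

    # Variation explained by which group an observation belongs to.
    ss_between = sum(len(g) * (mean(g) - grand_mean) ** 2 for g in groups)
    # Variation left over inside each group.
    ss_within = sum((x - mean(g)) ** 2 for g in groups for x in g)

    return (ss_between / (k - 1)) / (ss_within / (n - k))


# Hypothetical daily conversions for three variations over three days.
variation_a = [10, 12, 14]
variation_b = [11, 13, 15]
variation_c = [16, 18, 20]
print(one_way_anova_f([variation_a, variation_b, variation_c]))
```

The larger the F statistic, the more the differences between variations stand out from the day-to-day noise within each one.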
If you’re looking to dive deep into ANOVA, here’s a great video tutorial to learn:
So if there are certain use cases for multivariate tests, then there are certain ways to execute them. What are the conditions and requirements of running successful multivariate tests?
Multivariate Testing: How To Do It Right
The one big condition of running MVT: “Lots and lots of traffic,” according to Paras Chopra. Therefore, running MVT accurately largely comes down to understanding your traffic needs and avoiding false positives.
Common Mistakes with running MVT
Though many of the common mistakes of MVT aren’t unique (many apply to A/B testing as well), some are specific to multivariate methods. But they’re pretty much as you’d guess:
- Not enough traffic.
- Not accounting for increased chance of false positives.
- Not using MVT as a learning tool.
- Not using MVT as a part of a systemized approach to optimization.
1. Not Enough Traffic
We already talked about traffic above, but to reiterate: MVT requires lots of traffic. Fractional factorial methods mitigate this, but there are some questions as to the accuracy of this method.
The increased traffic requirement also presents the question of how long you should expect this test to go. This is especially true if you’re using MVT as a way to throw things at the wall and see what sticks (inefficient).
One thing you should definitely do is estimate the traffic needed for significant results. Use a calculator like this.
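If you want a feel for what such a calculator does, here's a rough sketch using the standard normal approximation for comparing two conversion rates. The function names, the default significance level and power, and the example rates are all assumptions for illustration. Real calculators may use different formulas and give somewhat different numbers.

```python
from math import ceil
from statistics import NormalDist


def visitors_per_variation(baseline_rate, relative_lift, alpha=0.05, power=0.8):
    """Approximate visitors needed in each variation to detect the given lift."""
    p1 = baseline_rate
    p2 = baseline_rate * (1 + relative_lift)
    z_alpha = NormalDist().inv_cdf(1 - alpha / 2)  # two-sided significance level
    z_beta = NormalDist().inv_cdf(power)           # desired statistical power
    variance = p1 * (1 - p1) + p2 * (1 - p2)
    return ceil((z_alpha + z_beta) ** 2 * variance / (p2 - p1) ** 2)


def total_visitors(n_variations, baseline_rate, relative_lift):
    """Total traffic when every variation needs the same sample size."""
    return n_variations * visitors_per_variation(baseline_rate, relative_lift)


# Hypothetical: 5% baseline conversion rate, hoping to detect a 20% relative lift.
print(visitors_per_variation(0.05, 0.20))
print(total_visitors(2, 0.05, 0.20))   # a simple A/B test
print(total_visitors(8, 0.05, 0.20))   # an MVT with 8 combinations
```

Note that this sketch doesn't adjust for the multiple comparisons an MVT involves, so treat the MVT figure as a lower bound.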
Leonid from Optimizely discussed ways to get around the need for crazy amounts of traffic, including the fractional factorial method (we’ll discuss more below):
Leonid Pekelis:
“There’s another approach to reducing the need for more visitors in a multivariate test – examine fewer interactions (e.g. only 2-way interactions). This is where things like fractional factorial designs come in. You can reduce the required number of visitors by quite a lot if you use fractional factorial instead of full factorial, but you only get to see part of the interaction picture. Things get complicated pretty quickly when you look at all the different design methods out there.
There is one other use of multivariate tests if you don’t have tons of traffic: start by running a full factorial just to check that none of your changes interact to break your site, you’ll notice those pretty quickly, and then switch to running A/B/n tests to see which changes outperform their baseline.”
2. Not Accounting for Increased Chance of False Positives
According to Leonid, the most common mistake in running multivariate tests is not accounting for the increased chance of false positives. His thoughts:
Leonid Pekelis:
“You’re essentially running a separate A/B Test for each interaction. If you’ve got 20 interactions to measure, and your testing procedure has a 5% rate of finding false positives for each one, you all of a sudden expect about 1 interaction to be detected significant completely by chance. There are ways to account for this, they’re generally called multiple testing corrections, but again, the cost is you tend to need more visitors to see conclusive results.”
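The numbers in that quote are easy to check. Here's a small sketch that also shows the Bonferroni correction, one of the simplest multiple testing corrections. The function names are ours, and the family-wise calculation assumes the tests are independent, which won't be strictly true in practice.

```python
def expected_false_positives(n_tests, alpha=0.05):
    """Average number of tests that look significant purely by chance."""
    return n_tests * alpha


def familywise_error_rate(n_tests, alpha=0.05):
    """Chance of at least one false positive, assuming independent tests."""
    return 1 - (1 - alpha) ** n_tests


def bonferroni_alpha(n_tests, alpha=0.05):
    """Stricter per-test threshold that keeps the overall rate at or below alpha."""
    return alpha / n_tests


print(expected_false_positives(20))   # about 1, as in the quote
print(familywise_error_rate(20))      # roughly a 64% chance of at least one
print(bonferroni_alpha(20))           # each test must now clear 0.0025
```

The stricter threshold is exactly why corrected tests need more visitors: a smaller alpha means a larger sample before any one interaction can be called significant.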
3. Not Using MVT as a Learning Tool
As we mentioned in a previous article, optimization is really about “gathering information to inform decisions.” MVT is best used as a learning tool. Using it as a way to drive incremental change and throw stuff at the wall is inefficient and takes time away from more impactful A/B tests. Andrew Anderson put it well in an article on his blog:
Andrew Anderson:
“The less you spend to reach a conclusion, the greater the ROI. The faster you move, the faster you can get to the next value as well, also increasing the outcome of your program. What is more important is to focus on the use of multivariate as a learning tool ONLY, one that was used to tell us where to apply resources. One that frees us up to test out as many resources for feasible alternatives on the most valuable or influential factor, while eliminating the equivalent waste on factors that do not have the same impact. The goal is to get the outcome, getting overly caught up in doing it in one massive step as opposed to smaller easier steps, is fool’s gold.”
4. Not Using MVT as Part of a Systemized Approach to Optimization
Similarly, many MVT mistakes come from people not knowing what they want out of the test, or not having a testing plan at all. As Paras Chopra put it:
Paras Chopra:
“The biggest mistake is not knowing what they expect out of an MVT. Are they expecting to see best combination of changes or they want to know which element (headline, button) had the maximum impact?”
Andrew Anderson puts it in perspective, saying that if you’re using either A/B testing or MVT just to throw stuff against the wall or to validate hypotheses, this will only lead to a personal optimum (i.e., ego-fulfillment). He continues, saying that “tools used correctly to maximize results and maximize resource allocation for future efforts leads to organizational and global maximum.”
Now, I mentioned above that there were different statistical methods for MVT. There’s a bit of a debate between them. Does it matter?
Full Factorial, Fractional Factorial…Does it Matter?
There are a few different methods of multivariate testing:
- Full factorial
- Fractional factorial
- Taguchi
There’s a bit of an ideological debate between the methods, as well.
Full Factorial Multivariate Testing
A full factorial experiment is “an experiment whose design consists of two or more factors, each with discrete possible values or “levels”, and whose experimental units take on all possible combinations of these levels across all such factors.”
In other words, full factorial MVT tests all combinations with equal amounts of traffic. That means it:
- is more thorough, statistically.
- requires a ton of traffic.
Paras Chopra wrote in Smashing Magazine a few years ago:
“If there are 16 combinations, each one will receive one-sixteenth of all the website traffic. Because each combination gets the same amount of traffic, this method provides all of the data needed to determine which particular combination and section performed best. You might discover that a certain image had no effect on the conversion rate, while the headline was most influential. Because the full factorial method makes no assumptions with regard to statistics or the mathematics of testing, I recommend it for multivariate testing.”
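As a rough illustration of the 16-combination case, here's a sketch that enumerates every combination of four two-version elements and assigns each visitor to one of them. The element names, version labels, and visitor ID are hypothetical, and hashing the visitor ID is just one common way to get a stable, roughly even split. It isn't how any particular tool is confirmed to work.

```python
import hashlib
from itertools import product

# Hypothetical page elements, each with two versions: 2 * 2 * 2 * 2 = 16 combinations.
ELEMENTS = {
    "headline": ["control", "benefit-led"],
    "hero_image": ["product", "lifestyle"],
    "cta_text": ["Buy now", "Start free trial"],
    "form_length": ["short", "long"],
}


def all_combinations(elements):
    """Every possible pairing of one version per element (full factorial)."""
    names = list(elements)
    return [dict(zip(names, versions)) for versions in product(*elements.values())]


def assign_combination(visitor_id, combinations):
    """Map a visitor to the same combination on every visit via a stable hash."""
    digest = hashlib.sha256(visitor_id.encode("utf-8")).hexdigest()
    return combinations[int(digest, 16) % len(combinations)]


combinations = all_combinations(ELEMENTS)
print(len(combinations))
print(assign_combination("visitor-123", combinations))
```

Because every combination actually gets shown, you can later compare any headline against any other, or any image against any other, without leaning on statistical assumptions about the combinations you skipped.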
Fractional Factorial Multivariate Testing
Fractional factorial designs are “experimental designs consisting of a carefully chosen subset (fraction) of the experimental runs of a full factorial design.”
So fractional factorial experiments show visitors only a carefully chosen subset of the combinations and infer the rest statistically. Because of that, they require less traffic.
That said, an Adobe blog post likened fractional factorial design to a barometer, saying “a barometer measures atmospheric pressure, but its value is not so much in the precise measurement as the notification that there is a directional change in pressure.”
The same article then also said:
“I question how valuable it is to spend 5 months running 1 single test for learnings that may no longer be applicable by the time the test has completed and the data pumped through analysis. Instead, why not take the winnings and learnings of your week-long fractional-factorial multivariate test and then run another test that builds off that new and improved baseline?”
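To show what a "carefully chosen subset" can look like, here's a sketch of one textbook construction: the half-fraction of a two-level design. Each element's two versions are coded as -1 and +1, and we keep only the runs whose codes multiply to +1. This is an illustration of the general idea, not a description of how any specific testing tool picks its combinations.

```python
from itertools import product


def half_fraction(n_factors):
    """Half of a two-level full factorial: runs whose coded levels multiply to +1."""
    runs = []
    for run in product((-1, 1), repeat=n_factors):
        sign = 1
        for level in run:
            sign *= level
        if sign == 1:
            runs.append(run)
    return runs


# Three elements with two versions each: 8 combinations in full, 4 in the half-fraction.
for run in half_fraction(3):
    print(run)
```

Each element still appears in both versions equally often, so you can estimate its individual effect from half the combinations. The cost is that, in this design, each element's effect is mixed up with the interaction of the other two, which is the "part of the interaction picture" you give up.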
Taguchi Multivariate Testing
This is a bit more esoteric, so it’s best not to worry about it. As Paras wrote in Smashing Magazine:
“It’s a set of heuristics, not a theoretically sound method. It was originally used in the manufacturing industry, where specific assumptions were made in order to decrease the number of combinations needing to be tested for QA and other experiments. These assumptions are not applicable to online testing, so you shouldn’t need to do any Taguchi testing. Stick to the other methods.”
So does it matter?
As mentioned above, most of the debate lies in the murkier statistics of the fractional factorial method. Many of the optimizers I talked to said they only recommend full factorial. As Paras explains, “A lot of ‘fractional factorial’ methods out there are pseudo scientific, so unless the MVT method is properly explained and justified, I’d stick to full factorial.”
However, some, like Andrew Anderson, hold that these debates in general are misguided. As he explains:
Andrew Anderson:
“Debating which is better, partial or full factorial, at that point is useless because you are just arguing over what shade of green is one leaf in the large forest. MVT should be used to look for influence and focus future resources, in which case it is just a fit and data accessibility question. Any other use of MVT missed that boat completely and just highlights the lack of discipline and understanding of optimization.”
So does it really matter? I don’t know. If you have enough traffic, I think full factorial is harder to mess up. That said, you’re making business decisions that are time critical, so if a full factorial test will take you 6 months to complete, it’s probably not worth the accuracy.
Conclusion
If you have enough traffic, use both types of tests. Each one has a different and specific impact on your optimization program, and used together, they can help you get the most out of your site. Here’s how:
- Use A/B testing to determine best layouts
- Use MVT to polish the layouts to make sure all the elements interact with each other in the best possible way.
As I said before, you need to get a ton of traffic to the page you’re testing before even considering MVT.
Test major elements like value proposition emphasis, page layout (image vs copy balance, etc), copy length and general eyeflow via A/B testing, and it will probably take you 2-4 test rounds to figure this out. Once you’ve determined the overall picture, now you may want to test interaction effects using MVT.
However, make sure your priorities align with your testing program. Peep once said, “most top agencies that I’ve talked to about this run ~10 A/B tests for every 1 MVT.”
When To Do Multivariate Tests Instead of A/B/n Tests