A/B Testing

Good data analysis is the search for cause: attempting to uncover why something happened. Traffic to the website is low—why? Our email click-through rate is improving—is it because we recently redesigned our email template, or because we’re focusing on more direct calls to action? The best way to find these answers is to rely on the same approach that scientists have used for centuries—experimentation.

As technologist Scott Brinker advises: “Experimentation is the gold standard of causation.” A thoughtfully crafted experiment allows you to zero in on the variables that influence your data. Instead of retroactively analyzing your data, you isolate your assumption and design an experiment that will allow you to test it. These tests start with a hypothesis.

State your hypothesis

A hypothesis is a predictive statement, not an open-ended question. A good A/B testing hypothesis will invite you, through research, to identify a potential solution. Let’s look at an example of an experiment that RJMetrics ran on their website.

In a pricing page experiment, RJMetrics’ hypothesis was informed by qualitative data on how visitors were interacting with the web page. They used Crazy Egg to produce a heat map that showed high and low-activity parts of the page:

[Heat map of the RJMetrics pricing page, showing high- and low-activity areas]

Stephanie Liu, front-end developer at RJMetrics and Optimizely’s Testing Hero of the Year, crafted the following hypothesis:

My hypothesis was that moving the button into the white hot scroll map area would cause the design to have a higher conversion rate as compared to the original pricing page. More people would pay attention to the button simply because their eyes would be lingering there longer.

Here’s her original version:

[Screenshot: the original pricing page]

Here’s her variation:

[Screenshot: the pricing page variation, with the button moved into the high-activity area]

Stephanie’s experiment proved her hypothesis correct: the new pricing page produced a 310% improvement in conversions, a staggering win earned through diligent use of data and a well-formed hypothesis.

The Inspectable Elements of a Hypothesis

Let’s boil down a hypothesis to its individual components. Data fits into the hypothesis framework in a number of areas.

“If _____ [Variable] _____, then _____ [Result] _____, because _____ [Rationale] _____.”

The Variable: A website element that can be modified, added, or taken away to produce a desired outcome.

Use data to isolate a variable on your website that will have an impact on your performance goals. Will you test a call to action, visual media, messaging, forms, or other functionality? Website analytics can help to zero in on low-performing pages in your website funnels.

Result: The predicted outcome. (More email sign-ups, clicks on a call to action, or another KPI or metric you are trying to affect.)

Use data here to determine what you’re hoping to accomplish. How large is the improvement that you’re hoping for? What is your baseline that you’ll measure against? How much traffic will you need to run an A/B test?

Rationale: Demonstrate that you have informed your hypothesis with research: what do you know about your visitors from your qualitative and quantitative research that indicates your hypothesis is correct?

Use data here to inform your prediction: quantitative insights can be very helpful in formulating the “why.” Your understanding of your customer’s intent and frustration can be enhanced with an array of tools like surveys, heat maps (as seen above), and user testing to determine how visitors interact with your website or product.
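One lightweight way to keep hypotheses in this shape is to capture the three components in a small structured record. The Python class and field names below are hypothetical, a minimal sketch rather than a required format:

    from dataclasses import dataclass

    @dataclass
    class Hypothesis:
        """One A/B test hypothesis: the variable, the predicted result, and the rationale."""
        variable: str   # the element you will modify, add, or remove
        result: str     # the predicted outcome (the KPI you expect to move)
        rationale: str  # the qualitative/quantitative evidence behind the prediction

        def statement(self) -> str:
            # Renders the "If [Variable], then [Result], because [Rationale]" template.
            return f"If {self.variable}, then {self.result}, because {self.rationale}."

    # Example: Stephanie's pricing-page hypothesis expressed in the template.
    pricing_test = Hypothesis(
        variable="the pricing CTA button is moved into the high-activity area of the scroll map",
        result="the pricing page's conversion rate will increase",
        rationale="the heat map shows visitors' eyes linger in that region of the page",
    )
    print(pricing_test.statement())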

Strengthening your Hypothesis

Not all hypotheses are created equal. To ensure that your hypothesis is well-composed and actionable, use the tips that follow. First, here are some examples of strong and weak hypotheses:

Strong: “If the call-to-action text is changed to ‘Complete My Order,’ the conversion rate in the checkout will increase, because the copy is more specific and personalized.” This hypothesis is strong because it names a specific variable to modify (the CTA text) and a rationale that shows an understanding of the audience for the page.

Weak: “If the call-to-action is shorter, the conversion rate will increase.” This hypothesis is weak because it is very general and does not include a rationale for why the proposed change would produce an improvement. What would be learned if this hypothesis were proven correct?

Strong: “If the navigation is removed from checkout pages, the conversion rate on each step will increase because our website analytics show that portions of our traffic drop out of the funnel by clicking on these links.” This hypothesis is strong because it is supported by website analytics data that highlight a high-impact opportunity for streamlining the checkout process.

Weak: “If the checkout funnel is shortened to fewer pages, the checkout completion rate will increase.” This hypothesis is weak because it rests on the assumption that a shorter process is better, but does not include any qualitative or quantitative data to support the prediction.
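The checkout examples above lean on funnel analytics. As a rough sketch of how that kind of drop-off check might look, assuming hypothetical step names and counts:

    # Hypothetical checkout-funnel counts pulled from web analytics; the step
    # names and numbers are invented for illustration only.
    funnel = [
        ("Cart", 4200),
        ("Shipping details", 2950),
        ("Payment", 1700),
        ("Order confirmation", 1150),
    ]

    # Step-to-step conversion shows which transition loses the most visitors,
    # pointing to the page worth testing first.
    for (step, visits), (next_step, next_visits) in zip(funnel, funnel[1:]):
        rate = next_visits / visits
        print(f"{step} -> {next_step}: {rate:.1%} continue, {visits - next_visits} drop off")

A transition with an unusually low continue rate is a natural candidate for a hypothesis like the navigation example above.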

A strong hypothesis is:

Testable. Can you take action on the statement and test it? Keep your predictions within the scope of what can be acted upon. Avoid pulling multiple variables into the statement—a more complex hypothesis makes causation more difficult to detect. For instance, don’t change copy on multiple parts of a landing page simultaneously.

A learning opportunity, regardless of outcome. Not every experiment produces an increase in performance, even with a strong hypothesis. Everything you learn through testing is a win, even if all it does is inform future hypotheses.

That brings us to our next tips for using hypotheses:

Hypothesize for every outcome. One of our solutions partners, Blue Acorn, mentioned a hypothesis best practice that we think is fantastic. To ensure that every experiment is a learning opportunity, think one step ahead of your experiment. What will you learn if your hypothesis is proven correct or incorrect, whether the variation wins, loses, or ends in a draw?

Build data into your rationale. You should never be testing just for the sake of testing. Every visitor to your website is a learning opportunity, and that is a valuable resource that shouldn’t be wasted. RJMetrics recently wrote a tutorial on how to use data to choose and prioritize your tests; you can check it out on the Optimizely blog.

Map your experiment outcomes to a high-level goal. If you’re doing a good job choosing tests based on data and prioritizing them for impact, then this step should be easy. You want to make sure that the experiment will produce a meaningful result that helps grow your business. What are your company-wide goals and KPIs? Increasing order value, building a revenue stream from existing customers, or building your brand on social media? If your experiments and hypotheses are oriented towards improving these metrics, you’ll be able to focus your team on delving into your data and building out many strong experiments.

Document your hypotheses. Many website optimization experts document all of the experiments they run on their websites and products. This habit helps to ensure that historical hypotheses serve as a reference for future experiments, and provide a forum for documenting and sharing the context for all tests, past, present, and future.

Now, Build Your Own

A hypothesis is a requirement for anyone running A/B tests and experiments on their website. When you build your own hypotheses, remember to:

  1. Clearly define the problem you’re trying to solve, or the metric you’re looking to improve
  2. Bring quantitative and qualitative data into the hypothesis
  3. Check the hypothesis against the criteria above to make sure it is strong and actionable
  4. Treat every experiment as a learning opportunity

If you need some extra help, check out our ebook, Building your Company’s Data DNA, for more tips on how to build data-driven hypotheses.

Mobile App Price Changes

Getting found in the iOS App Store is a challenge, with more than one million active apps vying for users’ attention. App publishers and developers have a number of obvious marketing tools at their disposal, like advertising and pay-per-download, to get more people to notice their mobile apps. But these are costly and not for everyone.

Beyond the obvious advertising tools, the iOS App Store has another, often overlooked, way to promote discovery: app price changes. When a publisher or developer lowers the price of a paid app, it gets added to Apple and third-party RSS feeds that are distributed to thousands of sites and Twitter feeds focused on promoting apps that have gone on sale or recently become free.

How it works

This marketing tool, more akin to merchandising, requires little to no budget but, according to our analysis of all iOS apps during most of 2013, it has a significant impact on positioning in Apple’s Top Paid and Top Grossing ranks. This directly translates into better visibility, downloads and revenue.

In fact, as can be seen in the graph below, compared to paid apps that never changed their prices, paid apps that made such changes (both increases and decreases) grew the average number of days they were ranked by 21% in Top Paid (+9 days) and 70% in Top Grossing (+16 days). These apps also improved their average rank by 20% in Top Paid (-45 positions) and 19% in Top Grossing (-46 positions). These improvements were not only for the most popular iOS apps, but also for less established new apps and poorly performing apps that have been around for a while.

[Chart: the impact of price changes vs. no change]

The number of price changes, whether increases or decreases (including to $0), also matters. One or two changes during a year provide very limited improvement. But when changes are made once per month (12 in total), rank and the number of days ranked improve healthily. Increase that to once per week (52 in total) or more, and that is when developers see the largest improvements to app ranks, and thus downloads.

Applying this to your app

Here are the key rules that mobile app publishers and developers should follow when developing their price marketing strategy:

Repeat Frequently

All paid apps should look to go on sale, on average, at least once per month. With the corresponding price increase, that makes 24 price changes per year. More experienced app developers and marketers can look to do more to maximize downloads, including intraday changes to target specific countries or types of users, but 1 per month is a good start for most apps.

Allow Settling Time

Price changes can take anywhere from 20 minutes to more than 15 hours to spread throughout iTunes’ storefronts (New Zealand is usually among the first; the change then follows time zones to reach European storefronts and the US). In addition, it can take time for users to discover the new price, either directly or through a third-party site like AppShopper. So unless you are looking to make multiple price changes a day, which can be rewarding but requires constant attention and/or the right tools, most publishers should let their app’s sale breathe for 48 to 72 hours.

Focus on Down Cycles

Given the cyclical nature of downloads and ranks, price changes should generally not be made when the app is experiencing a growth spurt. Instead, the price change should be timed with an app’s slowing downloads or sagging rank.

React to Competition

If your app is a soccer app priced at $2.99 and EA’s FIFA 2014 goes from $4.99 to $0.99, you need to react immediately in order to protect your positioning and sales. If this example does not directly apply to you, remember that your competitors are not just direct competitors. They may also be apps ranked just above you in your genre or category, or those appearing before you in key searches on iTunes.

Avoid Predictability

Varying the times, days of the week, and amounts of your price changes will keep your pattern from becoming predictable enough for competitors and users to game.

Test Often

Every price change should be an opportunity to test a new price and new price steps. That may not always be possible if you are at $0.99 and going free, but even then you should be testing various target prices (the price you go to after a sale). Here are examples of variations in price changes, followed by a rough scheduling sketch:

  • The price of your app is lowered to varying tiers in 1 or 2 steps (e.g. $3.99 -> $0.99 or $3.99 -> $0.99 -> $0.00)
  • Then the price is increased in 1-3 steps (e.g. $0.99 -> $3.99, $0.99 -> $4.99 -> $3.99, $0.99 -> $1.99 -> $3.99)
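To make the cadence concrete, here is a minimal sketch of a randomized schedule along the lines described above: one sale and one restore per month (24 changes a year), with varying days, hours, and price tiers so the pattern stays unpredictable. The tier values and function name are illustrative assumptions, not a prescribed strategy:

    import random

    # Illustrative price points (USD); actual tiers are set in iTunes Connect.
    SALE_TIERS = [0.00, 0.99, 1.99]
    REGULAR_TIERS = [2.99, 3.99, 4.99]

    def monthly_price_schedule(months=12, seed=None):
        """Build one sale/restore pair per month with randomized day, hour, and tiers."""
        rng = random.Random(seed)
        schedule = []
        for month in range(1, months + 1):
            sale_day = rng.randint(1, 24)   # vary the day the sale starts
            settle = rng.randint(2, 3)      # let the sale breathe for 48 to 72 hours
            schedule.append({
                "month": month,
                "drop": {"day": sale_day, "hour": rng.randint(0, 23),
                         "price": rng.choice(SALE_TIERS)},
                "restore": {"day": sale_day + settle, "hour": rng.randint(0, 23),
                            "price": rng.choice(REGULAR_TIERS)},  # test a new target price
            })
        return schedule

    for change in monthly_price_schedule(months=3, seed=7):
        print(change)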

Pricing changes are a simple, effective way to get your app in front of people. You can make these changes yourself, or if you’re looking for some extra assistance, talk to us. The Loadown can help you automate this exact type of optimization.

Mobile

Americans now spend more time interacting with their smartphones than watching TV. And when people are watching TV, there’s a growing tendency to do so with a device in hand. 41% of all screen time is now multi-screen. Finally, and most importantly, the mobile market is growing. By the end of 2014, mobile commerce is expected to be a $114 billion market.


Landscape

It’s not news that businesses that use data get better results. But while we know this works in business, it’s still rare to see analytical rigor applied in an area like charity. This seems short-sighted. If businesses that use data see 33% more revenue and 12x profit growth, what kind of impact could data have in the work of helping humanity?

If you’re a certain kind of data nerd with aspirations of making the world a better place, you create a charity like GiveDirectly. Two friends pursuing advanced degrees in economic development at Harvard and MIT founded the organization in 2008 when they were looking for the best ROI for their own donated money.

GiveDirectly does something pretty radical for a charity: it raises money, and then gives it away to poor people in Kenya and Uganda. No strings. No requirements. Very little overhead.


Beast

Over the past four months I have improved RJMetrics’ landing page conversion rates by as much as 74.9% and increased click through rates on our homepage by 88.2%. Pretty impressive, right? Truth is, I wasn’t always a conversion optimizer. Here’s how I went from a testing newbie to a conversion rate optimization beast.

How I learned my baby was ugly

One of my first tasks at RJMetrics involved building a new landing page. With a furrowed brow, I slaved over my Photoshop file, overhauling the textured backgrounds with flat colors and concentrating on inching images just a fewww more pixels to the left. Proudly, I unveiled my baby to our designer, received the stamp of approval, and began coding.


Note: This post originally appeared as a guest feature on TechCrunch to announce our new Test Significance website.

At RJMetrics, we believe in data-driven decisions and that means we do a lot of testing.  However, one of the most important lessons we’ve learned is this: not all tests are worth running.

In a data-driven organization, it’s very tempting to say things like “let’s settle this argument about changing the button font with an A/B test!”  Yes, you certainly could do that.  And you would likely (eventually) declare a winner.  However, you will also have squandered precious resources in search of the answer to a bike shed question.  Testing is good, but not all tests are.  Conserve your resources.  Stop running stupid tests.

The reason for this comes from how statistical confidence is calculated.  The formulas that govern confidence in hypothesis testing reveal an important truth:

Tests where a larger change is observed require a smaller sample size to reach statistical significance.

(If you’d like to dig into why this is the case, a good place to start is Wikipedia’s articles on hypothesis testing and the binomial distribution.)

In other words, the bigger the impact of your change, the sooner you can be confident that the change is not just statistical noise.  This is intuitive but often ignored.  And the implications for early-stage companies are tremendous.

If your site has millions of visitors per month, this isn’t a big deal.  You have enough traffic to hyper-optimize and test hundreds of small changes per month.  But what if, like most start-ups, you only have a few thousand visitors per month?  In these cases, testing small changes can invoke a form of analysis paralysis that prevents you from acting quickly.

Consider a site that has 10,000 visitors per month and has a 5.0% conversion rate.  The table below shows how long it will take to run a “conclusive” test (95% confidence) based on how much the change impacts conversion rate.

Starting Conversion Rate    New Conversion Rate    Total Participants Required    Test Duration Required
5.00%                       5.20%                  185,926                        1.5 years
5.00%                       5.40%                  47,340                         5 months
5.00%                       5.60%                  21,420                         2 months
5.00%                       5.80%                  12,262                         37 days
5.00%                       6.00%                  7,982                          24 days
5.00%                       6.20%                  5,638                          17 days
5.00%                       6.40%                  4,210                          12 days
5.00%                       6.60%                  3,276                          10 days
5.00%                       6.80%                  2,630                          8 days
5.00%                       6.90%                  2,378                          7 days
5.00%                       7.00%                  2,162                          6.5 days
5.00%                       7.20%                  1,814                          5.4 days
5.00%                       7.40%                  1,548                          4.6 days
5.00%                       7.60%                  1,338                          4.0 days
5.00%                       7.80%                  1,170                          3.5 days
5.00%                       8.00%                  1,034                          3.1 days

(Data assumes a Bernoulli Trial experiment with a two-tailed hypothesis test and all traffic being split 50/50 between the test groups.)
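For readers who want to reproduce the table, here is a minimal sketch under the same assumptions as the note above: a two-tailed test on a Bernoulli conversion metric at 95% confidence (z = 1.96), traffic split 50/50, and no separate power adjustment. It illustrates the math rather than the exact implementation behind the Test Significance tool:

    def required_participants(base_rate, new_rate, z=1.96):
        """Total visitors (both groups, split 50/50) needed before an observed lift
        of this size clears a two-tailed significance threshold (z = 1.96 for 95%)."""
        variance = base_rate * (1 - base_rate) + new_rate * (1 - new_rate)
        per_group = (z ** 2) * variance / (new_rate - base_rate) ** 2
        return 2 * round(per_group)

    def test_duration_days(total_participants, monthly_visitors=10_000):
        # How long a 10,000-visitor-per-month site needs to gather that many participants.
        return total_participants / monthly_visitors * 30

    for new_rate in (0.052, 0.060, 0.080):
        total = required_participants(0.05, new_rate)
        print(f"5.00% -> {new_rate:.2%}: {total:,} participants, "
              f"about {test_duration_days(total):.0f} days at 10,000 visitors per month")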

As you can see, your visitors are precious assets.  Too many start-ups will run that “button font” test, knowing full well that in a best-case scenario it will only impact conversion by a quarter of a percent.  What they don’t appreciate up-front is that this may block their ability to run certain other tests for a year and a half (assuming they don’t end the test prematurely).

When you can’t run many tests, you should test big bets.  A homepage redesign.  A pricing change.  A new “company voice” throughout your copy.  Not only will these tests potentially have a bigger impact, you’ll have confidence sooner if they do.

I found myself making this argument a lot recently here at RJMetrics, so I developed a tool to calculate the required population size for a significant test.  We’ve shared that tool with the world free of charge at Test Significance.  Just input your current conversion rate and your desired confidence interval and it will generate a table like the one above.

[Screenshot: the Test Significance calculator]

We hope this tool helps a few companies out there learn the lessons we have about when to test and what to expect in terms of finding a conclusive result.