The 5 Most Common Data Analysis Mistakes

Data-driven decisions are the backbone of modern online businesses. As too many learn, however, the only thing worse than not using your data is using it incorrectly. Here are five mistakes that we’ve seen companies make (before getting their ecommerce analytics on the right track with RJMetrics).

1. Not Accounting for Age

Aggregate analysis of customer behavior can be extremely misleading. For example, consider a company that has acquired customers through Google Ads for years, but only recently started spending money on Sponsored Tweets. An analysis of “customer lifetime spending by acquisition source” would likely show that Google-acquired customers, on average, have spent more than Twitter-acquired customers. But this doesn’t mean that Twitter is an inferior source of leads; it’s just a misleading way of slicing the data. Of course the average Google customer has spent more: they have been customers longer and have had more opportunities to make repeat purchases.

This is an extremely basic example, but variations on this oversight lead to a surprising amount of confusion in companies of all sizes. To avoid this problem, use cohort analysis to segment customers by common attributes like when they made their first purchase. This allows for apples-to-apples comparisons of customer cohorts that can then be expanded into more insightful and actionable analyses.

A sample cohort analysis
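
The apples-to-apples idea can be sketched in a few lines of Python. This is only an illustration under assumed inputs: the function name, the data layout, and the 90-day window are all hypothetical. It compares acquisition sources on spending within each customer's first 90 days, and skips customers who have not yet been around for a full 90 days.

```python
from collections import defaultdict
from datetime import date


def spend_in_first_days(customers, orders, as_of, window_days=90):
    """Average spend per customer, by source, within their first window_days.

    customers: {customer_id: (source, first_purchase_date)}  (assumed layout)
    orders:    iterable of (customer_id, order_date, amount)  (assumed layout)
    """
    # Keep only customers old enough to have completed the full window.
    eligible = {
        cid: (source, first)
        for cid, (source, first) in customers.items()
        if (as_of - first).days >= window_days
    }

    counts = defaultdict(int)
    for source, _ in eligible.values():
        counts[source] += 1

    totals = defaultdict(float)
    for cid, order_date, amount in orders:
        if cid in eligible:
            source, first = eligible[cid]
            # Count only orders placed inside the customer's own window.
            if 0 <= (order_date - first).days < window_days:
                totals[source] += amount

    return {source: totals[source] / n for source, n in counts.items()}


customers = {
    "g1": ("google", date(2011, 1, 1)),
    "t1": ("twitter", date(2012, 6, 1)),
    "t2": ("twitter", date(2012, 12, 1)),  # too new, excluded
}
orders = [
    ("g1", date(2011, 1, 1), 50.0),
    ("g1", date(2012, 1, 1), 50.0),  # outside g1's first 90 days
    ("t1", date(2012, 6, 1), 60.0),
    ("t2", date(2012, 12, 1), 100.0),
]
result = spend_in_first_days(customers, orders, as_of=date(2012, 12, 31))
```

On lifetime totals the Google customer in this toy data looks more valuable (100 vs. 60), but within a matched 90-day window the Twitter customer has spent more.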

 

2. Not Planning Next Steps

Before you invest time in running an experiment or conducting an analysis, it’s important to understand how the results will impact your behavior. There will always be countless ways to slice and dice your data, and the best way to avoid analysis paralysis is to focus on metrics that will drive you to act.

To determine if a metric is actionable, simply consider the possible results of your analysis and ask yourself how your strategy or behavior will change based on the results. If it won’t, maybe your time would be better spent elsewhere.

 

3. Ignoring Test Significance

Whether you’re running an A/B test or performing ad hoc analysis on your data, remember that the sample size of your data set is key to the significance of your analysis (i.e., how likely it is that your observation is representative of the total population).

Many tests are not worth running, and you can save lots of time by being realistic about when data may or may not hold the answers. Sites like Test Significance can help you determine how costly a given test might be and what it takes to reach significance.
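
To make the sample-size point concrete, here is a minimal Python sketch of a two-proportion z-test. It is a textbook normal approximation, not necessarily what any particular tool uses, and the function and parameter names are made up for illustration.

```python
import math


def two_proportion_p_value(conv_a, n_a, conv_b, n_b):
    """Two-sided p-value for the difference between two conversion rates."""
    rate_a = conv_a / n_a
    rate_b = conv_b / n_b
    # Pooled rate under the null hypothesis that both groups convert equally.
    pooled = (conv_a + conv_b) / (n_a + n_b)
    std_err = math.sqrt(pooled * (1 - pooled) * (1 / n_a + 1 / n_b))
    z = (rate_b - rate_a) / std_err
    # Two-sided tail probability of a standard normal variable.
    return math.erfc(abs(z) / math.sqrt(2))


# Same observed rates (5% vs. 6%), ten times the traffic.
p_small = two_proportion_p_value(50, 1_000, 60, 1_000)
p_large = two_proportion_p_value(500, 10_000, 600, 10_000)
```

With 1,000 visitors per group the difference does not clear the usual 0.05 threshold; with 10,000 per group the same rates do.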

Online Test Significance Tool

 

4. Stopping at the Surface

Imagine that a company is studying which referral sources yield the most valuable customers. If they find that one source is superior, they have discovered a correlation between referral source and Customer Lifetime Value (CLV). However, this does not mean that referral source is causing CLV to be higher for those customers—all it means is that they are linked.

Unseen “lurking” variables could be the true reason for the correlation. Perhaps your online store is not well optimized for a female audience and the best-performing referral source is simply the one that refers you the highest percentage of male shoppers. In this case, blindly shifting all of your spend to that referral source could backfire if their demographics shift.

The smarter move would be to redesign your homepage, which would lift CLV across all channels. The only way to discover this opportunity, however, is to dig deeper within your data. Look at all of the characteristics of your customers – not just the ones tied directly to marketing spend – and you may discover far more meaningful metrics to act upon.
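
A small Python sketch of “digging deeper”: the same customers are averaged first by referral source alone, then by source and gender together. The data and field names are invented purely to mirror the scenario above.

```python
from collections import defaultdict


def average_clv(customers, *keys):
    """Average CLV grouped by any combination of customer attributes."""
    totals = defaultdict(lambda: [0.0, 0])  # group -> [sum, count]
    for customer in customers:
        group = tuple(customer[key] for key in keys)
        totals[group][0] += customer["clv"]
        totals[group][1] += 1
    return {group: total / count for group, (total, count) in totals.items()}


# Hypothetical data: source A sends mostly men, source B mostly women.
customers = (
    [{"source": "A", "gender": "M", "clv": 100.0}] * 3
    + [{"source": "A", "gender": "F", "clv": 40.0}]
    + [{"source": "B", "gender": "M", "clv": 100.0}]
    + [{"source": "B", "gender": "F", "clv": 40.0}] * 3
)

by_source = average_clv(customers, "source")
by_source_and_gender = average_clv(customers, "source", "gender")
```

In this toy data, source A looks better overall (85 vs. 55), yet within each gender the two sources are identical. The gap comes entirely from the audience mix, not from the source itself.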

 

5. Modeling Growth Projections By Percentage

I worked in venture capital prior to starting RJMetrics, and I quickly learned that most company financial models (which forecast future growth) are exercises in fantasy. One of the main reasons is that so many models are driven by inputs like “percentage growth per month.” In these models, a slight change in that parameter can be the difference between a billion-dollar company and a dud.

This blanket “percentage growth” methodology says nothing about what’s driving that growth or what such growth actually means in terms of the number of customers added, where they came from, and how they were monetized.

Rather than an exercise in fantasy, make it a model that demonstrates the fundamental economics of scaling your business. Build a “bottom-up” model by using more granular inputs that are specific to your business model. This will lead to a more productive conversation and help set you apart from the pack.
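
The contrast can be sketched in Python. Everything here is a hypothetical illustration: the input names (ad spend, cost per acquisition, monthly churn, revenue per customer) are one possible set of “granular inputs,” not a prescribed model.

```python
def percentage_model(start_revenue, monthly_growth, months):
    """Top-down: revenue compounds at a single assumed growth rate."""
    return start_revenue * (1 + monthly_growth) ** months


def bottom_up_model(monthly_ad_spend, cost_per_acquisition, monthly_churn,
                    revenue_per_customer, months, starting_customers=0.0):
    """Bottom-up: revenue follows from customers gained, lost, and monetized."""
    customers = starting_customers
    revenue_by_month = []
    for _ in range(months):
        retained = customers * (1 - monthly_churn)
        acquired = monthly_ad_spend / cost_per_acquisition
        customers = retained + acquired
        revenue_by_month.append(customers * revenue_per_customer)
    return revenue_by_month


# Sensitivity of the top-down model to a two-point change in its one input.
ratio = percentage_model(10_000, 0.12, 36) / percentage_model(10_000, 0.10, 36)

# A bottom-up run with made-up inputs and no churn, for easy checking.
revenue = bottom_up_model(monthly_ad_spend=1_000, cost_per_acquisition=50,
                          monthly_churn=0.0, revenue_per_customer=10, months=3)
```

Moving the growth input from 10% to 12% nearly doubles the 36-month figure in the top-down model. In the bottom-up version, each input is something you can measure and argue about on its own.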

 

 

RJMetrics Round-Up (01/31/2013)

It’s been a busy new year at RJMetrics!  Here’s some of what’s been happening behind the scenes:

  • Our co-founders Jake and Bob just returned from a trip to San Francisco, where they attended an investor’s CEO conference and visited with several of our wonderful clients.
  • Our staff is growing! Our new Director of Marketing starts Monday, a new developer has just signed on, and we’ve accepted two Drexel co-ops for the summer session.
  • Our CEO Bob had another popular guest post in TechCrunch last week, which you can read here.
  • Bob is also featured in the new book “Hacking PR” by Sean Blanda. You can learn more about the book here (and use code RJMETRICS for 20% off).
  • Our new microsites Cohort Analysis, Churn Rate, Test Significance, and Query Mongo are educating legions of visitors about key topics related to data and analytics.

Stay tuned to this blog for more product announcements, team news, and original research from RJMetrics.

 

RJMetrics Winter 2013 Hackathon Results

After the last RJMetrics hackathon, I didn’t think our team could possibly cram more innovation into a 24-hour period.  They just did.

The Projects

Team members unveiled some amazing projects, including:

Spotlight search in our new dashboard UI to provide users with fast access to charts, dashboards, and trends.

Customizations to Zendesk to make our customer support exchanges more streamlined.

A customized sales video generator to provide a personal touch to our sales prospects.

Shaun Presents His Video Generator

BallerBoard, a TV display engine that automatically shows stats from RJMetrics, Twitter, and other services in a format that’s easy on the eyes.

An End-User Query Browser to allow advanced users to query their data warehouses directly using SQL syntax.

A deployment system for MySQL stored procedures, which will greatly increase the number of analyses we can run natively in MySQL.  Last Hackathon’s Median/Percentile feature will be deployed using this system.

A trend-line overlay system that will allow users to fit regression lines to their data and forecast future data points based on these models.

A system for self-auditing and approving RJMetrics Trend/Metric definitions through our UI.

A new concept and 3D rendering of a new RJMetrics conference booth.

An improved RJMetrics deployment system.  This is an extension of our new AWESOM-O deployment system and its client application Butters.

Drastic improvements to our physical office environment, including a new reception area and accent walls throughout the office.

Working Capybara integration tests to monitor our UI.

 

The Results

Francis “Buck” Ryan took the crown this time around for his work on trend-line overlays. This was a suggestion that came directly from our feature request page. Existing users can keep an eye out for it in the beta tests of our new dashboard UI.

Buck will enjoy the grand prize: $500 cash to be spent all in one night.

I can’t wait for Spring.

 

 

Why Many A/B Tests Aren’t Worth It

Note: This post originally appeared as a guest feature on TechCrunch to announce our new Test Significance website.

At RJMetrics, we believe in data-driven decisions and that means we do a lot of testing.  However, one of the most important lessons we’ve learned is this: not all tests are worth running.

In a data-driven organization, it’s very tempting to say things like “let’s settle this argument about changing the button font with an A/B test!”  Yes, you certainly could do that.  And you would likely (eventually) declare a winner.  However, you will also have squandered precious resources in search of the answer to a bike shed question.  Testing is good, but not all tests are.  Conserve your resources.  Stop running stupid tests.

The reason for this comes from how statistical confidence is calculated.  The formulas that govern confidence in hypothesis testing reveal an important truth:

Tests where a larger change is observed require a smaller sample size to reach statistical significance.

(If you’d like to dig into why this is the case, a good place to start is Wikipedia’s articles on hypothesis testing and the binomial distribution.)

In other words, the bigger the impact of your change, the sooner you can be confident that the change is not just statistical noise.  This is intuitive but often ignored.  And the implications for early-stage companies are tremendous.

If your site has millions of visitors per month, this isn’t a big deal.  You have enough traffic to hyper-optimize and test hundreds of small changes per month.  But what if, like most start-ups, you only have a few thousand visitors per month?  In these cases, testing small changes can invoke a form of analysis paralysis that prevents you from acting quickly.

Consider a site that has 10,000 visitors per month and has a 5.0% conversion rate.  The table below shows how long it will take to run a “conclusive” test (95% confidence) based on how much the change impacts conversion rate.

Starting Conversion Rate   New Conversion Rate   Total Participants Required   Test Duration Required
5.00%                      5.20%                 185,926                       1.5 years
5.00%                      5.40%                 47,340                        5 months
5.00%                      5.60%                 21,420                        2 months
5.00%                      5.80%                 12,262                        37 days
5.00%                      6.00%                 7,982                         24 days
5.00%                      6.20%                 5,638                         17 days
5.00%                      6.40%                 4,210                         12 days
5.00%                      6.60%                 3,276                         10 days
5.00%                      6.80%                 2,630                         8 days
5.00%                      6.90%                 2,378                         7 days
5.00%                      7.00%                 2,162                         6.5 days
5.00%                      7.20%                 1,814                         5.4 days
5.00%                      7.40%                 1,548                         4.6 days
5.00%                      7.60%                 1,338                         4.0 days
5.00%                      7.80%                 1,170                         3.5 days
5.00%                      8.00%                 1,034                         3.1 days

(Data assumes a Bernoulli Trial experiment with a two-tailed hypothesis test and all traffic being split 50/50 between the test groups.)

As you can see, your visitors are precious assets.  Too many start-ups will run that “button font” test, expecting full well that in a best-case scenario it will only impact conversion by a quarter of a percent.  What they don’t appreciate up-front is that this may block their ability to run certain other tests for a year and a half (assuming they don’t end the test prematurely).

When you can’t run many tests, you should test big bets.  A homepage redesign.  A pricing change.  A new “company voice” throughout your copy.  Not only will these tests potentially have a bigger impact, you’ll have confidence sooner if they do.

I found myself making this argument a lot recently here at RJMetrics, so I developed a tool to calculate the required population size for a significant test.  We’ve shared that tool with the world free of charge at Test Significance.  Just input your current conversion rate and your desired confidence level and it will generate a table like the one above.

We hope this tool helps a few companies out there learn the lessons we have about when to test and what to expect in terms of finding a conclusive result.

 

E-Commerce Churn Rate

Some online businesses, like SaaS companies, spend a lot of time thinking about their churn rate. (This is typically the percentage of customers who unsubscribe from their service in a given month.) For subscription businesses, a low churn rate can be the difference between life and death.

Companies with non-subscription business models, however, typically don’t think about customer behavior in terms of churn. E-commerce players, for example, think about long-term customer relationships in terms of conversion rates and repeat purchase rates. There is rarely a clear distinction between a “current customer” and a “churned customer” in these analyses.

That is, until recently. Every month we’re seeing more and more e-commerce leaders adding “churn rate” into their RJMetrics dashboards.

How can an e-commerce business measure churn if customers never “unsubscribe?” There are a number of approaches that make sense, and typically they depend on how your company plans to act on the data.

One of the most popular methods is to set a cutoff date after which, if a customer has not made a purchase, they are considered to have “churned.” The choice of this date can be arbitrary, or it can be influenced by things like time between orders, repeat purchase probability, and cohort analysis of historical data.

Once you have defined populations of “active” and “churned” customers, you can pursue a number of new analyses and strategies. Here are some tactics we’ve seen:

  • Segment things like referral sources and product categories by percentage of customers churned. This can tell you more about where your most loyal customers come from and what they tend to buy, which can inform marketing and merchandising decisions. This is probably most valuable as part of a cohort analysis.
  • Identify populations of customers who are “about to churn” and send them special promotions or offers to encourage a purchase. While these groupings are definitionally as arbitrary as your churn threshold, instituting this practice on a regular basis will ensure that you are consistently reaching out to new populations of at-risk customers (since customers from previous batches who did not purchase will have churned and those who did purchase will no longer be at-risk).
  • Monitor the health of your business by tracking churn rate over time. Changes in the percentage of customers who move from “active” to “churned” in a given month can be a directional indicator of changes in customer loyalty or behavior.
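
The cutoff-based definition and the first two tactics can be sketched in Python. The 180-day churn threshold, the 30-day warning window, and all names below are hypothetical choices for illustration; as noted above, the right cutoff depends on your own data.

```python
from collections import Counter
from datetime import date


def churn_status(last_purchase, as_of, churn_days=180, warning_days=30):
    """Label a customer by how long they have gone without a purchase."""
    idle_days = (as_of - last_purchase).days
    if idle_days >= churn_days:
        return "churned"
    if idle_days >= churn_days - warning_days:
        return "about_to_churn"  # candidates for a win-back promotion
    return "active"


def churned_share_by_segment(customers, as_of, churn_days=180):
    """customers: iterable of (segment, last_purchase_date) pairs."""
    total = Counter()
    churned = Counter()
    for segment, last_purchase in customers:
        total[segment] += 1
        if churn_status(last_purchase, as_of, churn_days) == "churned":
            churned[segment] += 1
    return {segment: churned[segment] / n for segment, n in total.items()}


today = date(2013, 1, 31)
shares = churned_share_by_segment(
    [("search", date(2013, 1, 1)),
     ("search", date(2012, 6, 1)),
     ("email", date(2013, 1, 1))],
    as_of=today,
)
```

Here the segment could be a referral source or a product category. Running the same function on a schedule would surface each new batch of “about to churn” customers.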

Giving lapsed customers a definitive “churn” event changes the slow, unpredictable fade-away of an e-commerce customer into a sharp step function that can be monitored, quantified, and acted against. It can simplify goals and clarify vision.

For more on churn rate, check out www.churn-rate.com. To give this a try on your own data, you can try RJMetrics free for 30 days.

 

Return of the RJMetrics Hackathon

This Thursday at Noon, we will kick off our second seasonal RJMetrics Hackathon.

Our previous hackathon was an enormous success. It caused major disruption in our development pipeline. Many of that hackathon’s projects are currently being beta tested and will hit our production codebase soon.

It also spawned such live features as:

  • QueryMongo.com, which topped Hacker News and helps dozens of coders (including our team members and customers) every day.
  • A snazzy new sales video, which is being A/B tested on our homepage.
  • And much, much more.

Most importantly, everyone had a great time and couldn’t wait for the next one.

The Prize

Last time, our winners enjoyed a lavish dinner at Del Frisco’s Double Eagle Steakhouse. This time, we’ve upped the ante.

This Hackathon’s prize is inspired by the classic film Brewster’s Millions.

The winning team will be given $500, with the caveat that they must SPEND IT ALL IN ONE NIGHT. You can pick any night you want and spend it however you want, but that money has to be gone by the time the sun comes up. Winners are encouraged to document their shenanigans.

Taking Suggestions

Got a suggestion for something our team should take on at the Hackathon? Let us know by emailing support@rjmetrics.com with your suggestions. They will be passed on to our entire team.

Check back next week for results!

Lifetime Revenue Cohorts

There are a lot of different ways to look at your data in RJMetrics, and we know that interpretation and understanding are just as important as calculation and visualization. So, I’m writing a series of blog posts where I will do a deep dive into some of our analyses and visualizations.

The first in the series is the lifetime revenue cohort analysis.

What does lifetime revenue cohort analysis mean?

This chart shows the cumulative spending per user for a period of time after they are acquired. Cohorts of users are split up by their acquisition month.

For example, the orange line above shows the average for users who were acquired in November 2011. The first data point means that in their first month, users who were acquired in November spent an average of about $200. The second data point means that by the end of their second month, these users had spent an average of about $240. Their average spending in month two was approximately $40 (240 – 200).

The different lines represent different cohorts of users. The green represents the users that were acquired in December, and the blue is users that were acquired in October.

Why is this important?

This kind of cohort analysis can be useful for several different purposes, but the most immediate benefit is often better customer acquisition decisions.

Many companies limit their marketing spend to channels that yield profitability on a customer’s first purchase. These companies will pay to acquire customers through a given channel as long as their average first purchase yields more gross margin than it costs to acquire them. The problem with this approach is that it often results in an underinvestment in growth. If your competitors are marketing based on a deeper understanding of buying behavior, they will outgrow you.

The lifetime revenue cohort analysis helps you to understand the consequences of expanding your customer acquisition spending, and it provides an easy way to convey this to the rest of your team. If future customers behave like existing customers, then acquiring customers for a higher CPA will result in a predictable payback period. Depending on the cash position of the business, you can define what payback period you are comfortable with, find the relevant spot on the chart, and spend accordingly.
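
“Finding the relevant spot on the chart” can be expressed as a short Python sketch. The function name and the example figures are hypothetical, and it assumes, as the paragraph above does, that future customers behave like the cohort the curve came from.

```python
def payback_period(cumulative_revenue_per_user, cpa, gross_margin=1.0):
    """First month (1-based) in which cumulative margin per user covers the CPA.

    Returns None if the curve never reaches the CPA within the data given.
    """
    for month, revenue in enumerate(cumulative_revenue_per_user, start=1):
        if revenue * gross_margin >= cpa:
            return month
    return None


# Made-up cumulative spending per user for one cohort, by month.
curve = [200.0, 240.0, 270.0]
months_to_payback = payback_period(curve, cpa=100.0, gross_margin=0.4)
never = payback_period(curve, cpa=500.0, gross_margin=0.4)
```

With a 40% gross margin, a $100 CPA is not covered by the first purchase alone in this example, but it is paid back by the third month.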

Additionally, you can use this analysis to see if you are getting better at onboarding, engaging, and generating revenue from the users you acquire.  For example, this cohort analysis is a great way to see if a free shipping promotion for new users resulted in repeat buyers or one-time purchasers who never come back.

How will this vary for different business models?

For most businesses, the lifetime revenue cohort analysis chart will show a large amount of spending in the initial period and then increase more slowly over time.

That initial spike is due to the fact that customers are more likely to make their first purchase soon after they are acquired than at any other time. In cases where the acquisition event itself is a purchase, 100% of customers make a purchase in their first period. In cases where registration can happen before purchases, this effect is less drastic. As an example, Groupon would likely have a much lower initial jump than Amazon, because many of the people who sign up for Groupon don’t make a purchase right away.

Unless there are a high number of refunds, this chart will slope up and to the right after the initial jump. The rate of growth tends to decrease over time because customers are usually most active when they first sign up. This causes the average incremental spend per period to drop, because the number of people in the cohort stays constant regardless of how many come back to buy more.

In subscription businesses, the slope will decay less aggressively than in businesses where people make one-off purchases. Occasionally, a subscription business will actually have a slope that increases over time. It is rare to see this, but it is a great signal for the business when it happens. This does not mean that there are zero churning customers, but rather that upgrades for customers that stay more than make up for the customers that leave.

How is this calculated?

There are two simple inputs to this calculation: how many members are in the cohort (which never changes), and how much revenue those members generated in the given period.

To determine the members in the cohort, we count the number of users who were acquired in the period in question. An acquisition can be a first purchase, account creation, newsletter sign up, or some other event.

The revenue calculation is a bit more complicated.  We want to sum revenue for orders that were placed by members of this cohort and took place within a fixed time period from their acquisition date (i.e., the first three months).

Finally, we divide the revenue by the number of members in the cohort for each time period in the chart and add this value cumulatively over time.
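
The steps above can be sketched in Python. This is a simplified illustration rather than the RJMetrics implementation: the data layout, the function name, and the use of fixed 30-day periods measured from each user's acquisition date are all assumptions.

```python
from collections import defaultdict
from datetime import date
from itertools import accumulate


def lifetime_revenue_cohorts(acquired, orders, periods=3, period_days=30):
    """Cumulative revenue per cohort member, keyed by acquisition (year, month).

    acquired: {user_id: acquisition_date}                (assumed layout)
    orders:   iterable of (user_id, order_date, amount)  (assumed layout)
    """
    # Input 1: cohort size, which never changes.
    cohort_size = defaultdict(int)
    for acq_date in acquired.values():
        cohort_size[(acq_date.year, acq_date.month)] += 1

    # Input 2: revenue per period, measured from each user's acquisition date.
    revenue = defaultdict(lambda: [0.0] * periods)
    for user_id, order_date, amount in orders:
        acq_date = acquired[user_id]
        period = (order_date - acq_date).days // period_days
        if 0 <= period < periods:
            revenue[(acq_date.year, acq_date.month)][period] += amount

    # Divide by cohort size for each period, then add up cumulatively.
    return {
        cohort: list(accumulate(total / size for total in revenue[cohort]))
        for cohort, size in cohort_size.items()
    }


acquired = {"a": date(2011, 11, 1), "b": date(2011, 11, 15)}
orders = [
    ("a", date(2011, 11, 1), 200.0),
    ("a", date(2011, 12, 6), 40.0),
    ("b", date(2011, 11, 15), 200.0),
    ("b", date(2011, 12, 25), 40.0),
]
cohorts = lifetime_revenue_cohorts(acquired, orders)
```

In this toy data the November 2011 cohort averages $200 in its first period and reaches $240 by the end of its second. Dropping the cumulative step would give the incremental view instead.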

What are the variations of this chart?

There are many different kinds of useful cohort analyses.  The most common variation is filtering by user acquisition source. For example, you might want to look at this chart for customers who came from organic search, paid search, or an affiliate program.

This will help you understand if the customers from one acquisition source are more loyal or valuable than those from another. Thrillist’s subsidiary JackThreads used this analysis to understand that one of its most expensive acquisition sources was actually its most profitable. After learning this, Thrillist shifted its marketing budget to the more expensive acquisition source and accelerated its growth. Read our case study about Thrillist and JackThreads.

Another way to look at the data is with an incremental, rather than cumulative, data perspective.  This shows the incremental amount that an average user spends in each month after they are acquired.  This is useful for forecasting the amount of repeat purchases you will get from existing users.

We can look at this with other things besides revenue as well.  Some examples include margin as well as non financial metrics like invites, votes, or messages.

Conclusion

Lifetime spending cohorts are a powerful way of looking at your customers’ buying behavior.  Stay tuned for more information on how to use and interpret your metrics.

MySQL to MongoDB Query Translator

A few weeks ago, the first-ever RJMetrics hackathon took place at our Philadelphia headquarters. I decided to throw my hat into the ring with a project I’d been thinking about for a while: a MySQL to MongoDB query translator.

This was a unique challenge because MongoDB and MySQL are very different technologies that store data in very different ways. To some, translating between them might seem like a non-sequitur. However, I knew there was a use case because of my personal experience learning MongoDB. I would often think about queries in terms of SQL syntax, and a translator like this would have greatly softened the learning curve.

The final product is available at our Query Mongo site, and I encourage you to give it a try. It’s not perfect, but we hope it will be a helpful learning tool for the many people who have SQL experience and are getting started with MongoDB.

In this blog post, I’ll provide some insights into how this tool works.

RJMetrics is Hiring a Growth Hacker

We’re hiring someone to help us capture the tremendous opportunity in front of us. This person will play a pivotal role in defining and executing on our growth strategy. We are moving very quickly with this position, so please apply before the end of the day on Friday December 14th if you are interested.

A significant part of this job will be analyzing different marketing channels and strategies to determine best fit and ROI. Day to day responsibilities will include content marketing, quantitative data analysis, paid advertising, funnel optimization, and partner management.

We want this person to be comfortable contributing on all aspects of marketing-related projects and able to work autonomously.

Read the full job description and apply here.

RJMetrics Round-Up (11/28/2012)

The hits keep coming at RJMetrics!  Here’s some of the recent progress we’ve made:

Product Updates:

  • Our new chart builder made its public debut. All users can now participate in beta testing this tool. Read more about it in our documentation.
  • We changed the font throughout our dashboard. Say hello to Proxima Nova.
  • We have expanded our time zone support. Charts with a time frame relative to “right now” will be calculated against your time zone, which administrative users can change in the settings page.
  • Email summaries now include a warning if they contain stale data – that is, data for a time period that RJMetrics has not finished calculating. These email summaries are automatically resent once that data is current.
  • Our “Cohort Analysis 101” guide is now live at www.cohortanalysis.com.
  • We have increased the parallelization of many of our calculations, resulting in significant speed boosts.
  • We have updated our integration to support many of the recent improvements to the Google Analytics API, specifically allowing the visitors trend to be grouped across many more dimensions.
  • There are now fewer constraints on restriction sets for repeat event probability charts.
  • We’ve sped up the dashboard-wide “change all dates” tool.
  • We’ve sped up the auto-complete drop downs in the chart builder.
  • We made major improvements to our file uploader, including auto-detection of the file structure for new uploads.
  • We have begun beta testing a data mapping tool that uses zipcode information as an input – stay tuned for more on this.
  • We finished production of the promotional video that was created at our last hackathon – it can be viewed at https://www.rjmetrics.com/index2

Company News:

  • Our CEO Robert J. Moore has been busy getting the word out:
  • We’re currently interviewing candidates from Drexel University’s Co-Op program to participate in next year’s RJMetrics co-op.