The Best Method for Cohort Analysis in Google Analytics

Here at RJMetrics, we help online businesses to make smarter decisions using their data. Time and time again, we see customers gaining valuable new insights from cohort analysis. We also recognize that just about everyone on the web is using Google Analytics.

So, wouldn’t it be awesome if you could conduct a cohort analysis in Google Analytics? We thought so.

This article outlines the best way to analyze custom cohorts of all sizes in Google Analytics while using only a single custom variable slot.

The Original Hack

A quick search revealed some prior art, such as Dan Hill’s great article on Hacking a Cohort Analysis with Google Analytics. Dan’s method works great: use two of the five custom variables in Google Analytics to store the month and year of that user’s cohort. Then you can build custom filters to only look at cohorts for a specific year or month.

As Dan shows, tagging your users with these details is simple. Just push two extra lines of information to Google Analytics as part of the standard JavaScript call that tracks their pageview.

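Here is a minimal sketch of what those two extra lines could look like with the classic asynchronous ga.js queue. This is our own illustration rather than Dan’s original code: the slot numbers, the variable names “CohortMonth” and “CohortYear,” and the hard-coded date are all assumptions.

```javascript
// Classic ga.js async command queue; the standard snippet defines this before loading ga.js.
var _gaq = _gaq || [];

// Hypothetical cohort date for the current user (normally supplied by your backend).
var cohortDate = new Date(2012, 7, 23); // months are 0-based, so 7 = August

// Zero-pad the month so values filter consistently ("08", not "8").
var cohortMonth = String(cohortDate.getMonth() + 1).padStart(2, "0");
var cohortYear = String(cohortDate.getFullYear());

// _setCustomVar takes (slot, name, value, scope); scope 1 is visitor-level.
// Slots 1 and 2 and both variable names are assumptions for illustration.
_gaq.push(["_setCustomVar", 1, "CohortMonth", cohortMonth, 1]);
_gaq.push(["_setCustomVar", 2, "CohortYear", cohortYear, 1]);

// Custom variables need to be queued before the pageview is tracked.
_gaq.push(["_trackPageview"]);
```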

This works well, but we’ve learned that sometimes a monthly or yearly cohort analysis just isn’t enough. In RJMetrics, we allow our customers to conduct cohort analyses at daily, weekly, monthly, quarterly, and yearly levels. Depending on the data set, any or all of these can prove extremely valuable. We set out to enhance Dan’s hack so that Google Analytics users could have the same level of detailed analysis.

The Enhancement

With only five custom variable slots available, storing enough information to place any given user in up to six cohorts (year/quarter/month/week/day/hour) seemed unrealistic. That is, of course, until we realized two critical facts:

  • Any single custom variable can store up to 128 characters (as we understand the limit, it covers the variable’s name and value combined)
  • Google Analytics allows you to create filters on these fields using regular expressions

In other words, if we could represent all of the necessary cohort data in one long string that followed a predictable pattern, we could later use regular expressions to isolate specific cohorts based on the contents of a single custom variable.

Below, we outline a simple syntax for building this “cohort identifier string.” We decided on this syntax because it will fit in the 128-character limit of the custom variable and is still human-readable. We will be storing the following information in this order:

Y: Year (4 characters)
Q: Quarter (1 character, 1-4)
M: Month (2 characters, 01-12)
WY: Week Year (4 characters)
WM: Week Month (2 characters, 01-12)
WD: Week Day (2 characters, 01-31)
D: Day (2 characters, 01-31)
H: Hour (2 characters, 00-23)

Here is an example of the custom cohort variable string for a customer in the 3pm August 23rd, 2012 cohort:

Y:2012;Q:3;M:08;WY:2012;WM:08;WD:19;D:23;H:15

As we’ll show in a minute, building a regular expression to match any cohort you’d like from a string like this is extremely simple.

(A note on weekly cohorts: the day, month, and year used to represent a week are separate from the day, month, and year used to represent… well… days, months, and years. This is because we define a week based on the calendar date of its first day (traditionally a Sunday, but you could adjust your code to use any weekday you’d like). Since the Sunday of a given week could exist in a different month or year than other days of that week, we can’t rely on the month or year associated with the other cohorts to be the same for the week.)

We build the cohort string on the backend using PHP, although this would be simple enough to implement in any language of your liking.

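Our original PHP is not reproduced here, so the following is a minimal sketch of the same idea in JavaScript (for example, on a Node backend). The function names buildCohortString and pad are our own, and the sketch assumes weeks start on Sunday unless you pass a different weekday.

```javascript
// Left-pad a number with zeros to a fixed width, e.g. pad(8, 2) gives "08".
function pad(value, width) {
  return String(value).padStart(width, "0");
}

// Build the cohort identifier string from a Date, using its local-time fields.
// weekStartDay is the weekday that begins a week (0 = Sunday, 1 = Monday, ...).
function buildCohortString(date, weekStartDay = 0) {
  // Step back to the first day of the week containing this date.
  // The Date constructor rolls over month and year boundaries for us.
  const daysIntoWeek = (date.getDay() - weekStartDay + 7) % 7;
  const weekStart = new Date(
    date.getFullYear(),
    date.getMonth(),
    date.getDate() - daysIntoWeek
  );
  const quarter = Math.floor(date.getMonth() / 3) + 1;

  return [
    "Y:" + pad(date.getFullYear(), 4),
    "Q:" + quarter,
    "M:" + pad(date.getMonth() + 1, 2),
    "WY:" + pad(weekStart.getFullYear(), 4),
    "WM:" + pad(weekStart.getMonth() + 1, 2),
    "WD:" + pad(weekStart.getDate(), 2),
    "D:" + pad(date.getDate(), 2),
    "H:" + pad(date.getHours(), 2),
  ].join(";");
}

// The article's example: 3pm on August 23rd, 2012.
const example = buildCohortString(new Date(2012, 7, 23, 15));
console.log(example); // Y:2012;Q:3;M:08;WY:2012;WM:08;WD:19;D:23;H:15
```

Note how a date such as January 1, 2013 (a Tuesday) lands in a week whose Sunday is December 30, 2012, which is exactly why the week fields carry their own month and year.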

The Analysis

As before, we assign the custom variable in the Google Analytics JavaScript tag by adding one simple line (note that the cohort string is coming from your backend or templating system):

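As a rough sketch, the tag could look like the following with the classic ga.js queue. The slot number and the variable name “Cohort” are assumptions, and the cohort string is hard-coded here only so the example is self-contained; in a real page your backend would print it.

```javascript
// Classic ga.js async command queue; the standard snippet defines this before loading ga.js.
var _gaq = _gaq || [];

// Normally printed into the page by your backend or templating system.
var cohortString = "Y:2012;Q:3;M:08;WY:2012;WM:08;WD:19;D:23;H:15";

// Hypothetical variable name; pick whatever reads well in your reports.
var cohortVarName = "Cohort";

// Conservative guard: keep the name and value together within 128 characters.
if ((cohortVarName + cohortString).length > 128) {
  throw new Error("Cohort custom variable is too long");
}

// The one extra line: slot 1 (an assumption), visitor-level scope (1).
_gaq.push(["_setCustomVar", 1, cohortVarName, cohortString, 1]);
_gaq.push(["_trackPageview"]);
```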

From here, doing the cohort analysis in Google Analytics is a piece of cake. Just click on “advanced segments”, select “custom segments,” and click “new custom segment.”

Create a title for the cohort you wish to create, for example “Week of 07/29/2012 Cohort.” To select these customers, choose the custom cohort variable from the dropdown menu and use the RegExp match option.

 

Here is the general purpose regular expression. In order to filter the customers, just add the desired cohort information within the parentheses after the appropriate colon.

(Y:).*(Q:).*(M:).*(WY:).*(WM:).*(WD:).*(D:).*(H:)

For example, to view the 2012 cohort, you’d simply add a “2012” after the “Y:”:

(Y:2012).*(Q:).*(M:).*(WY:).*(WM:).*(WD:).*(D:).*(H:)

To view the Q1 2012 cohort, you’d also add a “1” after the “Q:”:

(Y:2012).*(Q:1).*(M:).*(WY:).*(WM:).*(WD:).*(D:).*(H:)

To view the cohort that joined on February 14, 2012 you’d use the following:

(Y:2012).*(Q:).*(M:02).*(WY:).*(WM:).*(WD:).*(D:14).*(H:)

To find the cohort for the Week of 07/29/2012, we would use the following:

(Y:).*(Q:).*(M:).*(WY:2012).*(WM:07).*(WD:29).*(D:).*(H:)
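If you would rather generate these expressions than edit them by hand, a small helper can fill in the general-purpose pattern for you. This is a sketch: cohortPattern and FIELD_ORDER are our own hypothetical names, and the JavaScript RegExp check is only a local sanity test, since Google Analytics applies its own regular expression matching.

```javascript
// Field keys in the same order they appear in the cohort identifier string.
const FIELD_ORDER = ["Y", "Q", "M", "WY", "WM", "WD", "D", "H"];

// Build the filter expression; any field left out stays as a bare "KEY:" wildcard.
function cohortPattern(values) {
  return FIELD_ORDER
    .map((key) => "(" + key + ":" + (values[key] || "") + ")")
    .join(".*");
}

// The Q1 2012 cohort and the Week of 07/29/2012 cohort from the examples above.
const q1Pattern = cohortPattern({ Y: "2012", Q: "1" });
const weekPattern = cohortPattern({ WY: "2012", WM: "07", WD: "29" });

// Local sanity check against two cohort strings.
const joinedAug2 = "Y:2012;Q:3;M:08;WY:2012;WM:07;WD:29;D:02;H:09";
const joinedAug23 = "Y:2012;Q:3;M:08;WY:2012;WM:08;WD:19;D:23;H:15";
console.log(new RegExp(weekPattern).test(joinedAug2)); // true
console.log(new RegExp(weekPattern).test(joinedAug23)); // false
```

Because the fields always appear in the same order, the “D:” filter cannot accidentally match inside “WD:”; the expression has already consumed the week fields by the time it looks for the day.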

It’s that easy!

Special Considerations

When choosing what date to use for bucketing a given visitor into a cohort, the choice is up to you. If you’re monitoring web traffic and want to see the rates at which people come back, you can simply use the timestamp of their first visit. If you’re looking to conduct cohort analysis on the actions of registered users, you could assign that user’s registration date. Or, if you’re not sure, you could use a second custom variable slot and do both.

The key, however, is to start now. As with all things Google Analytics, custom tracking will only work for users on a going-forward basis. If you’d like to run cohort analyses that go back throughout the history of your entire business, maybe you should give RJMetrics a spin.

New Google Plus Data Shows Weak User Engagement

Google CEO Larry Page recently announced that Google Plus crossed over the 100 million user mark and continues to see strong user growth.

Despite these strong numbers, however, the service continues to be pummeled in the press. Many outlets have claimed that engagement is poor and that growth is only fueled by Google forcing membership upon users of its other products.

Rather than rely on third-party reports, we decided to pull publicly available data on a random population into an RJMetrics online dashboard and see for ourselves.

Here are some of our most interesting findings:

  • The average post has less than one +1, less than one reply, and less than one re-share.
  • 30% of users who make a public post never make a second one. Even after making five public posts, there is a 15% chance that a user will not post publicly again.
  • Among users who make publicly-viewable posts, there is an average of 12 days between each post.
  • A cohort analysis reveals that, after a member makes a public post, the average number of public posts they make in each subsequent month declines steadily. This trend is not improving in newer cohorts.

How We Did It

We began by selecting a population of 40,000 random Google Plus users. For each user, we downloaded their entire public timeline (which consists of all publicly-visible activities for that user). Only one third of the users in our population had any public activity, so this subset of the population is the main focus of many of our statistics.

Once we had the data, it was a snap to upload it to RJMetrics and pull the insights seen here with just a few clicks.

Since we are looking at public data exclusively, we want to point out that this data is not necessarily reflective of the entire population of users. These are simply insights into the public-facing actions of Google Plus users based on a population that is known to post publicly.

Repeat Posters

Once a user has made one public post, the chances that they will make a second post are quite strong: around 70%. After that, however, Google Plus does not perform as well as other social services that we have analyzed. In charts like these, we typically expect to see the probability of repeat posts shoot up to well north of 90% by the time the user has made several posts. This is basically the “once you’re using it you’re hooked” principle.

With Google Plus, however, this number never crosses the 90% mark. Even after having made five such posts, the chance of making a sixth is only 85%. This means that 15% of people who have made five posts never came back to make a sixth.
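For readers who want to reproduce this kind of metric on their own data, here is a minimal sketch of the arithmetic: the probability of an (n+1)th post is the number of users with at least n+1 posts divided by the number with at least n. The function name and the toy post counts are made up for illustration and are not taken from our study.

```javascript
// postCounts holds the number of public posts made by each user.
// Returns the share of users with at least n posts who went on to make post n + 1,
// or null when nobody reached n posts.
function repeatProbability(postCounts, n) {
  const reachedN = postCounts.filter((count) => count >= n).length;
  const reachedNext = postCounts.filter((count) => count >= n + 1).length;
  return reachedN === 0 ? null : reachedNext / reachedN;
}

// Made-up toy data: eight users and their public post counts.
const toyCounts = [1, 1, 2, 3, 5, 5, 6, 9];

console.log(repeatProbability(toyCounts, 1)); // 0.75 (6 of 8 users posted again)
console.log(repeatProbability(toyCounts, 5)); // 0.5 (2 of 4 users made a sixth post)
```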

Cohort Analysis

The cohort analysis below shows the rate at which new publicly-viewable posts are created by users who made their first post in different months throughout time.

This is a cumulative chart, so we’re basically showing the “average number of total posts made” as it grows over time for users in each cohort.

The decay rate here is very concerning. Users are less and less likely to make additional posts even a few months after initially joining. While it may not be an apples-to-apples comparison, it’s interesting to contrast this with the same chart from our Pinterest Data Analysis, which shows no decay whatsoever.

Time Between Posts

We were surprised by the length of time between public posts among users. On average, a user waits 15 days between making their first public post and making their second. This number declines with each subsequent post, but not drastically. There is an average of 10 days between a user’s fifth and sixth public posts.

The overall average time between any two public posts by the same user is 12 days.
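One way to compute an overall figure like this is to pool every gap between consecutive posts by the same user and take the mean. The sketch below assumes that pooled definition (other weightings are possible), and the function name and dates are made up for illustration.

```javascript
const MS_PER_DAY = 24 * 60 * 60 * 1000;

// timelines is an array of users, each an array of ISO date strings for their public posts.
// Returns the mean number of days between consecutive posts by the same user,
// pooled across all users, or null if no user has two or more posts.
function averageGapDays(timelines) {
  let totalDays = 0;
  let gapCount = 0;
  for (const posts of timelines) {
    const times = posts.map((d) => Date.parse(d)).sort((a, b) => a - b);
    for (let i = 1; i < times.length; i++) {
      totalDays += (times[i] - times[i - 1]) / MS_PER_DAY;
      gapCount += 1;
    }
  }
  return gapCount === 0 ? null : totalDays / gapCount;
}

// Made-up toy data: gaps of 10 and 20 days for one user, 3 days for another.
const toyTimelines = [
  ["2012-01-01", "2012-01-11", "2012-01-31"],
  ["2012-03-01", "2012-03-04"],
];
console.log(averageGapDays(toyTimelines)); // 11
```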

Remember that, since we are only looking at public posts, it is very possible that users are making non-public posts in between the ones that we were able to see. Despite this, however, we were still quite surprised by the large amount of time between public posts.

+1s, Replies, and Sharing

Of all the categories, we feel that this is the least likely to be biased by the fact that we only studied public posts. These public posts will still be visible to each member’s private networks, and actually could attract +1s, shares, and replies from external users as well. If anything, we would expect our numbers here to be higher than in the general population.

Despite that, our population of nearly 70,000 posts yielded the following properties:

  • An average of 0.77 “+1s” per post
  • An average of 0.54 replies per post
  • An average of 0.17 re-shares per post

Conclusion

From what we can see from the outside looking in, Google Plus has a long way to go before it becomes a real threat to the social networking landscape. While user growth is strong, it is unclear how much of that is driven by tie-ins with other Google products.

At the end of the day, Google Plus simply does not show the same level of ravenous user adoption and engagement that we’ve seen in other social networks (see our reports on Pinterest Data and Twitter Data for examples).

Advanced Google Analytics with RJMetrics

Starting today, RJMetrics clients can access their Google Analytics data via the RJMetrics web-based dashboard. This powerful system helps reinforce our core goal of creating a robust, affordable business intelligence solution for web-based businesses of all sizes.

Below are some of the key features associated with this new enhancement.

Easy and Secure Setup

RJMetrics uses the highly secure and widely accepted OAuth standard for authenticating your Google Analytics account. We never require your Google password and all information is sent to and from Google using SSL.

Advanced Data Exploration

Along with basic Google Analytics metrics such as visitors and pageviews, the RJMetrics Chart Wizard allows you to perform advanced segmentations across all metrics and dimensions available in Google Analytics. This currently includes a universe of 51 metrics and 57 dimensions.

Depending on the Google Analytics features your company uses, this can include data on Advertising Campaigns, E-Commerce Tracking, Internal Search, and more.

Automatic Chart and Dashboard Creation

When you first configure Google Analytics, our system will allow you to automatically add pre-populated Google Analytics dashboards to all of your users’ accounts. When applicable, these dashboards include information on web traffic, advertising campaigns, and goal conversions.

These few clicks allow you to generate fully-populated dashboards that are rich with data from Google Analytics:

Composite Chart Compatibility

All charts based on Google Analytics data are fully compatible with our Composite Chart builder, allowing users to build charts that combine traffic information with sales and behavioral data from their backend database. This makes metrics such as “revenue per unique visitor” or “conversion to first time purchasers” attainable with just a few clicks.

How to Configure

To configure Google Analytics, your RJMetrics account must be an administrator for your business. Simply log into your RJMetrics Dashboard and choose “Data Sources” from the Settings Page.

From there, you will be able to add a new Google Analytics connection and will be redirected to a Google webpage where you can grant RJMetrics data access. You’ll then be redirected to RJMetrics, where you can choose which Analytics accounts to add and which dashboards to generate.

Google Analytics API Snags: Malformed Request, The Site Has Not Been Registered

[Follow our blog posts, obsession with data, and original articles on Twitter @RJMetrics]

With the recent public release, we’re excitedly working on an easy interface that will allow our customers to view their Google Analytics data from their RJMetrics business intelligence dashboards. Jake’s been having a fun time putting this together, and today we tag-teamed a bizarre bug that we thought we’d share.

As a quick background, anyone who wants to pull data from Google applications (contacts, calendar, web statistics, etc) needs to first be granted permission to do so by the user who owns or has legitimate access to that data. Google originally built its own “Google Authentication Service” to accomplish this, and has also recently adopted the open API authorization protocol OAuth.

Using the Google Authentication Service is very easy. In fact, a simple sample script can be uploaded to test it out without any modification. In essence, here’s what happens:

  • As a webmaster who wants to use a visitor’s data, you build a custom link that sends your visitor to a special Google page for permission authorization. You also send a variable that tells them where to redirect the user if they say it’s OK.
  • The Google page recognizes the domain name of the redirect URL you sent and asks if the user is comfortable sharing his or her information with the website at that domain.
  • Assuming they say yes, the user is redirected to the URL you specified, along with a token that you can then use to request their data via the Google API.
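To make the first step concrete, here is a rough sketch of building such a link. It assumes the service in question is Google’s AuthSub flow; the endpoint, the Analytics scope value, and the parameter set are written from memory of that now-retired service and should be treated as assumptions, and the “/callback” path is purely hypothetical.

```javascript
// Build the link that sends a visitor to Google's permission page.
// nextUrl is where Google should redirect the visitor after they approve.
function buildAuthRequestUrl(nextUrl) {
  const params = new URLSearchParams({
    next: nextUrl, // redirect target; Google shows this domain to the user
    scope: "https://www.google.com/analytics/feeds/", // assumed Analytics scope
    secure: "0", // unsigned requests, i.e. no site registration needed
    session: "1", // ask for a token that can be exchanged for a session token
  });
  // Assumed AuthSub endpoint.
  return "https://www.google.com/accounts/AuthSubRequest?" + params.toString();
}

const authUrl = buildAuthRequestUrl("http://rjmetrics.com/callback");
console.log(authUrl);
```

URLSearchParams takes care of percent-encoding the redirect URL, which is an easy detail to get wrong when assembling the link by hand.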

Optionally, you can make the permission authorization page less scary-looking (fewer warnings, etc.) by “registering” your website with Google. This is a simple process that basically associates a given URL with a Google Account.

Simple, right? We followed Google’s instructions and built the Google authorization URL exactly to specifications. However, when it was followed (from any machine) we received the following error:

The page you have requested cannot be displayed. Another site was requesting access to your Google Account, but sent a malformed request. Please contact the site that you were trying to use when you received this message to inform them of the error. A detailed error message follows:

The site “http://rjmetrics.com” has not been registered.

So here’s the strange part: site registration is optional, so the fact that we weren’t registered shouldn’t have mattered. In fact, if we changed the “next” variable in our custom Google URL (that’s the variable specifying where to send the user after approval) to any other domain (even one that didn’t exist), the process worked fine and the user could “grant access” and be sent back to whatever URL we had chosen… as long as it wasn’t rjmetrics.com.

Just to recap, we had an error message saying our domain “has not been registered,” but evidence that you don’t need to register your domain with Google in order for the page that generated the error to work. And, we couldn’t replicate this error on any of the dozens of other non-RJMetrics domains we tried. What made our domain so special? One thought was that we have always used Google Apps for our e-mail, calendars, etc, so our domain is already well known to Google (and at some point in time our ownership of the domain had been verified in some way).

While the error message was quite intriguing and somewhat nonsensical, our best chance at an easy fix was quite obvious: register our domain with Google. When I got to the domain registration page, things looked extremely familiar. I had done this before: way back when we first set up Google Apps for our domain, I completed the steps on this exact same page. The only difference now was that, after registration was verified, we were given an OAuth Consumer Key and Token to use with that domain.

And that did it. As soon as we registered our site, the error message disappeared and the authorization process worked flawlessly.

OK, so the story is anticlimactic, but what’s really interesting is the error message we saw and the fact that our domain seemed exclusively exposed to this issue.

So, here’s what we think happened: At some point in time back before OAuth was being used by Google and possibly when the Google APIs were still young, we went through a “domain verification” process in order to use Google Apps with our domain. This created some kind of record on Google’s servers associated with the ownership of our domain name. However, when Google started using that same verification process to register domains for API usage (and slightly modified it to include OAuth), our legacy record stuck around, but with different or missing data associated with it.

Fast forward to today. We built the custom Google authentication URL that included our domain name as the “next” variable. Google found a record associated with our domain saying that it was registered in some way (thus not generating the page for un-registered URLs). However, our registration didn’t contain all of the information associated with API-related registrations. Since we didn’t qualify in either category, the result was the bizarre “not registered” warning (now especially bizarre since we actually were registered in some way).

When we went through the registration page this time, whatever old record had existed was overwritten and everything was repaired.

Anyway, if you get this error the quick fix is to simply register your domain with Google. Yes, we could have said that in a few hundred fewer words (heck, we could have tweeted it). What we enjoyed, however, was speculating about the origins of this strange behavior. We hope you enjoyed it too!

RJMetrics vs. Google Analytics

“How is RJMetrics different from Google Analytics?” It’s a great question, and it’s one we hear frequently. Both tools can be extremely valuable, but the truth is that RJMetrics and Google Analytics live and work in separate worlds. We analyze completely different data for very different purposes.

It’s easy to see why some view us as similar at first glance. Both Google Analytics and RJMetrics provide users with a rich and robust dashboard interface; both services provide users with a hosted portal that is accessible via the web 24/7; and both advocate the value of using data analysis to stay informed and influence decision-making.

However, the similarities largely stop there. RJMetrics and Google Analytics are actually so different that we view Google Analytics as a complement rather than as a competitor. (In fact, we use Google Analytics to monitor traffic on our own website.) These differences stem from a stark contrast in the type of data we use and the analysis it allows us to provide.

Data Differences

Google Analytics collects data using JavaScript that observes users as they traverse the pages of your website. Over time, it accumulates a data set about where your visitors come from, what pages they visit, and, depending on which features you implement, how they interact with your site.

However, any fast-growing business will quickly amass a large and meaningful reservoir of additional information that Google Analytics simply can’t access. This is the backend database, which not only powers the content and interactivity of your company’s website, but also serves as a repository for business data. Such data can include everything from user-specific information (registrations, purchases, logins, demographics, etc) to company operational data (product catalogs, content development, inventory, etc). Most importantly, it includes information about how these data points interact with each other throughout the life-cycle of your business.

RJMetrics therefore uses the only data set you can ever truly rely on for business intelligence: your own. By definition, your backend database is designed to capture the actions and activities that are vital to the operation of your business. No outside provider, including Google Analytics, could ever accumulate this degree of detail and value from the outside looking in.

In short, the data in Google Analytics is compiled from what happens on the surface of your website; RJMetrics penetrates this surface and uses the raw business data that your site itself compiles behind the scenes.

Core Analysis Differences

The most valuable component of the Google Analytics service is their core offering: web traffic metrics (we will discuss advanced features in a moment). This is basically any information about your visitors that can be collected using JavaScript, and it can lead to some interesting charts on things like:

  • Number of page views and unique visitors
  • Time spent on-site
  • Top referrers
  • Geographic data (approximated from visitor IPs)
  • Browser and operating system type

Depending on the data you store in your database, RJMetrics can also provide information like this, although it’s not our core competency. We are far more focused on the actual transactions and interactions that happen as a part of your business model, whether they are sales, registrations, utilization, or otherwise participatory in nature. This allows us to look at things like:

  • Overall business growth and acceleration by key business-defined metrics (number of users, revenue, utilization, logins, etc)
  • Activity characteristics (purchase size, activity duration, etc)
  • Loyalty characteristics (repeat activity and related probabilities)
  • Feature or product popularity (by age, category, vendor, etc)
  • Churn rates
  • User lifetime values
  • Product inventory status and sellout rates (sale velocity)
  • Numerous other metrics depending on your database and business model

Most importantly, however, all of these metrics can then be segmented into customer/user sub-groups, or “cohorts,” based on things like when they joined, who referred them, self-reported or inferred demographic characteristics, behavioral history, and other behaviors or tendencies specifically relevant to your business.

The amount of valuable data yielded by these permutations is staggering. From them, RJMetrics allows you to identify the trends and segments that matter, and monitor them just as easily as you can watch your page views grow with Google Analytics.

Advanced Google Features

It’s important to point out that Google Analytics has come out with some extremely cool new features recently, many of which aim to address the “surface-level data” shortcoming. For example, “Google Analytics Ecommerce Tracking” now allows e-commerce sites to send Google Analytics some basic information about every purchase that happens on their site. Google then ties that data to the “visitor” they were observing when the purchase was made and allows you to slice sales data by the visitor characteristics it collects (browser type, geographic info, etc).

This is an exciting and interesting addition to the Google Analytics data set, but the irony is that it works by creating a redundant sales database that is completely disconnected from your other valuable business data. There are a few reasons this isn’t ideal:

  • The data is collected on a going-forward basis, so any historical information about customer purchases or sales activity is omitted from the data set.
  • By separating basic sales data from your other business data, it becomes impossible to reliably examine metrics like repeat purchasing behavior or the correlation between customer characteristics and purchasing tendencies.
  • Google Analytics knows what product name or number was purchased, but nothing else about that product. How long has it been available? Was it on sale? What category or brand of merchandise does it belong to? How does that category or brand compare to others? All of these characteristics are integral to advanced analysis and decision making.
  • Making remote calls via JavaScript is not the optimal way to collect this data. It is subject to browser quirks and relies on on-the-fly communication with a remote server in order to work. This creates a much higher risk of missing transactions or double-counting them. Not to mention the inability to net out things like returns. Again, to drill down beyond the surface, you really need to capture the data from the server (not the pages), along with the relevant business context.

Ultimately, we view it like this: Google Analytics monitors web traffic from the outside looking in. RJMetrics provides business intelligence from deep beneath the surface. Both are great tools that can help a smart businessperson make decisions. However, if you’re interested in real business intelligence informed by your own business’s raw data, you’ll need to dig deeper. And you can only do it with a business intelligence tool like RJMetrics.

Visit our website to learn more about RJMetrics business intelligence solutions or request information.

Google Analytics Upgrade

Google Analytics announced a bunch of new features that allow you to view and use your traffic data in new ways. They made the announcement yesterday at the eMetrics Summit in DC.

We are particularly focused on the API. It is still in private beta, but we’re hopeful we can get access soon, as we’ve been jonesing for this for some time. We’ve seen workarounds that involve emailing XML summaries out of Google Analytics daily, but this should be a huge improvement.

They’ve also incorporated the Trendalyzer chart that they acquired from Gapminder and renamed it motion charts.

This is a great, free tool that we use here at RJMetrics, and it keeps getting better.