Here at RJMetrics, we help online businesses make smarter decisions using their data. Time and time again, we see customers gain valuable new insights from cohort analysis. We also recognize that just about everyone on the web is using Google Analytics.

So, wouldn’t it be awesome if you could conduct a cohort analysis in Google Analytics? We thought so.

This article outlines the best way we’ve found to analyze custom cohorts of all sizes in Google Analytics while using only a single custom variable slot.

The Original Hack

A quick search revealed some prior art, such as Dan Hill’s great article on Hacking a Cohort Analysis with Google Analytics. Dan’s method works well: use two of the five custom variables in Google Analytics to store the month and year of each user’s cohort. Then you can build custom filters to look only at cohorts for a specific year or month.

As Dan shows, tagging your users with these details is simple. Just push two extra lines of information to Google Analytics as part of the standard JavaScript snippet that tracks their pageview.
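
As a rough sketch of what those two lines might look like with the asynchronous ga.js tracker: the slot numbers, the variable names (cohort_month, cohort_year), the property ID and the values below are placeholders of ours and are not taken from Dan’s article.

```javascript
// Command queue that ga.js reads once it loads (standard async snippet pattern).
var _gaq = _gaq || [];
_gaq.push(['_setAccount', 'UA-XXXXX-X']); // placeholder property ID

// The two extra lines: one visitor-level (scope 1) custom variable each for
// the cohort month and the cohort year. Slots and names are assumptions.
_gaq.push(['_setCustomVar', 1, 'cohort_month', '08', 1]);
_gaq.push(['_setCustomVar', 2, 'cohort_year', '2012', 1]);

// Custom variables ride along with the next tracking call, so set them first.
_gaq.push(['_trackPageview']);
```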

This works well, but we’ve learned that sometimes a monthly or yearly cohort analysis just isn’t enough. In RJMetrics, we allow our customers to conduct cohort analyses at daily, weekly, monthly, quarterly, and yearly levels. Depending on the data set, any or all of these can prove extremely valuable. We set out to enhance Dan’s hack so that Google Analytics users could have the same level of detailed analysis.

The Enhancement

With only five custom variable slots available, storing enough information to place any given user in up to six cohorts (year/quarter/month/week/day/hour) seemed unrealistic. That is, of course, until we realized two critical facts:

  • Any single custom variable can store up to 128 characters
  • Google Analytics allows you to create filters on these fields using regular expressions

In other words, if we could represent all of the necessary cohort data in one long string that followed a predictable pattern, we could later use regular expressions to isolate specific cohorts based on the contents of a single custom variable.

Below, we outline a simple syntax for building this “cohort identifier string.” We decided on this syntax because it will fit in the 128-character limit of the custom variable and is still human-readable. We will be storing the following information in this order:

Y: Year (4 characters)
Q: Quarter (1 character, 1-4)
M: Month (2 characters, 01-12)
WY: Week Year (4 characters)
WM: Week Month (2 characters, 01-12)
WD: Week Day (2 characters, 01-31)
D: Day (2 characters, 01-31)
H: Hour (2 characters, 00-23)

Here is an example of the custom cohort variable string for a customer in the 3pm August 23rd, 2012 cohort:

Y:2012;Q:3;M:08;WY:2012;WM:08;WD:19;D:23;H:15

As we’ll show in a minute, building a regular expression to match any cohort you’d like from a string like this is extremely simple.

(A note on weekly cohorts: the day, month, and year used to represent a week are separate from the day, month, and year used to represent… well… days, months, and years. This is because we define a week based on the calendar date of its first day (traditionally a Sunday, but you could adjust your code to use any weekday you’d like). Since the Sunday of a given week could exist in a different month or year than other days of that week, we can’t rely on the month or year associated with the other cohorts to be the same for the week.)

We build the cohort string on the backend using PHP, although this would be simple enough to implement in any language you like.
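
The PHP itself is not reproduced here. As a minimal sketch of the same idea in JavaScript, the function below assembles the identifier from a date. The names pad2 and buildCohortString are hypothetical, the date is read in local time, and weeks are assumed to start on Sunday as described above.

```javascript
// Zero-pad a number to two digits, e.g. 8 -> "08".
function pad2(n) {
  return String(n).padStart(2, '0');
}

// Build the cohort identifier string for a JavaScript Date (local time).
function buildCohortString(date) {
  var month = date.getMonth() + 1;                // getMonth() is 0-based
  var quarter = Math.floor((month - 1) / 3) + 1;  // 1 through 4

  // First day of the week: step back getDay() days to reach Sunday.
  // The Date constructor rolls over month and year boundaries for us.
  var weekStart = new Date(
    date.getFullYear(), date.getMonth(), date.getDate() - date.getDay()
  );

  return [
    'Y:' + date.getFullYear(),
    'Q:' + quarter,
    'M:' + pad2(month),
    'WY:' + weekStart.getFullYear(),
    'WM:' + pad2(weekStart.getMonth() + 1),
    'WD:' + pad2(weekStart.getDate()),
    'D:' + pad2(date.getDate()),
    'H:' + pad2(date.getHours())
  ].join(';');
}

// 3pm on August 23rd, 2012 (months are 0-based, so 7 is August).
console.log(buildCohortString(new Date(2012, 7, 23, 15)));
// prints "Y:2012;Q:3;M:08;WY:2012;WM:08;WD:19;D:23;H:15"
```

A date such as January 1, 2013 (a Tuesday) shows why the week fields are stored separately: its week starts on Sunday, December 30, 2012, so WY and WM differ from Y and M.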

The Analysis

As before, we assign the custom variable in the Google Analytics JavaScript tag by adding one simple line. Note that the cohort string comes from your backend or templating system.
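
A sketch of how that line might sit in the asynchronous ga.js snippet follows. The slot number, the variable name cohort and the property ID are assumptions, and the hard-coded string stands in for whatever your backend renders into the page.

```javascript
var _gaq = _gaq || [];
_gaq.push(['_setAccount', 'UA-XXXXX-X']); // placeholder property ID

// Stand-in for the value your backend or templating system would output.
var cohortString = 'Y:2012;Q:3;M:08;WY:2012;WM:08;WD:19;D:23;H:15';

// The one extra line: a visitor-level (scope 1) custom variable in slot 1.
_gaq.push(['_setCustomVar', 1, 'cohort', cohortString, 1]);

// Track the pageview only after the custom variable has been set.
_gaq.push(['_trackPageview']);
```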

From here, doing the cohort analysis in Google Analytics is a piece of cake. Just click on “advanced segments,” select “custom segments,” and click “new custom segment.”

Create a title for the cohort you wish to create, for example “Week of 07/29/2012 Cohort.” To isolate these customers, select the custom cohort variable from the dropdown menu and use the RegExp match option.

Here is the general-purpose regular expression. To filter the customers, just add the desired cohort information within the parentheses, after the appropriate colon.

(Y:).*(Q:).*(M:).*(WY:).*(WM:).*(WD:).*(D:).*(H:)

For example, to view the 2012 cohort, you’d simply add “2012” after the “Y:”:

(Y:2012).*(Q:).*(M:).*(WY:).*(WM:).*(WD:).*(D:).*(H:)

To view the Q1 2012 cohort, you’d also add a “1” after the “Q:”:

(Y:2012).*(Q:1).*(M:).*(WY:).*(WM:).*(WD:).*(D:).*(H:)

To view the cohort that joined on February 14, 2012 you’d use the following:

(Y:2012).*(Q:).*(M:02).*(WY:).*(WM:).*(WD:).*(D:14).*(H:)

To find the cohort for the Week of 07/29/2012, we would use the following:

(Y:).*(Q:).*(M:).*(WY:2012).*(WM:07).*(WD:29).*(D:).*(H:)

It’s that easy!
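
If you would rather generate these patterns than edit them by hand, a small helper can fill in the template. This is only a sketch: cohortPattern and FIELDS are hypothetical names, and the check at the end uses JavaScript’s own RegExp as a stand-in for the matching that Google Analytics performs.

```javascript
// Field labels in the same order as the cohort identifier string.
var FIELDS = ['Y', 'Q', 'M', 'WY', 'WM', 'WD', 'D', 'H'];

// Return the pattern text for the given field values; omitted fields
// are left empty so they match any value.
function cohortPattern(values) {
  return FIELDS.map(function (field) {
    var has = Object.prototype.hasOwnProperty.call(values, field);
    return '(' + field + ':' + (has ? values[field] : '') + ')';
  }).join('.*');
}

var sample = 'Y:2012;Q:3;M:08;WY:2012;WM:08;WD:19;D:23;H:15';

var year2012 = cohortPattern({ Y: '2012' });
var feb14 = cohortPattern({ Y: '2012', M: '02', D: '14' });

console.log(year2012);
// prints "(Y:2012).*(Q:).*(M:).*(WY:).*(WM:).*(WD:).*(D:).*(H:)"
console.log(new RegExp(year2012).test(sample)); // prints true
console.log(new RegExp(feb14).test(sample));    // prints false
```

Keeping every label in the pattern, even the empty ones, is what stops “M:” from matching inside “WM:” or “D:” inside “WD:”: each label must be followed by the labels that come after it in the string, and each of those appears only once.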

Special Considerations

When choosing what date to use for bucketing a given visitor into a cohort, the choice is up to you. If you’re monitoring web traffic and want to see the rates at which people come back, you can simply use the timestamp of their first visit. If you’re looking to conduct cohort analysis on the actions of registered users, you could assign that user’s registration date. Or, if you’re not sure, you could use a second custom variable slot and do both.

The key, however, is to start now. As with all things Google Analytics, custom tracking will only work for users on a going-forward basis. If you’d like to run cohort analyses that go back throughout the history of your entire business, maybe you should give RJMetrics a spin.

  • Pavel

    Guys!
    Nice article about cohorts in GA, I’m going to implement that. Just one remark.
    According to GA documentation, you should call _trackPageview() after _setCustomVar().