Hacker News Data Analysis: 2014 Edition

Last week, I noticed that my Hacker News account had turned five years old. Wow. This got me wondering about the HN community in general and raised a concerning question: am I a typical user… or the creepy old guy in the room?

A few years back, I crunched some numbers from the HNSearch API to learn more about the Hacker News community. I figured this data set was overdue for a revisit. I pulled a fresh sweep of data from the API, loaded it into RJMetrics, and had my answers in just a few minutes.

(Obligatory plug: If you’re interested in a powerful analysis tool for your online business, try RJMetrics for free today.)

Here’s what I found:

  • Aaron Swartz and the NSA were the big topics in 2013, taking the place of 2012 leaders like SOPA and Hurricane Sandy
  • AngularJS gained serious momentum in 2013 as mentions of Backbone.js cut in half
  • With over 5 million upvotes cast in 2013, Hacker News activity grew by about 25% for a record year
  • The balance of comments, submissions, and upvotes has stayed remarkably stable as the community has grown
  • Users who engage early on stick around for years to come: more than half of the site’s activity in 2013 was from users who had joined more than a year before

For the full details behind these and other findings, read on.


The Best Method for Cohort Analysis in Google Analytics

Here at RJMetrics, we help online businesses to make smarter decisions using their data. Time and time again, we see customers gaining valuable new insights from cohort analysis. We also recognize that just about everyone on the web is using Google Analytics.

So, wouldn’t it be awesome if you could conduct a cohort analysis in Google Analytics? We thought so.

This article outlines the best way to enable analyzing custom cohorts of all sizes in Google Analytics by only using up a single custom variable slot.

The Original Hack

A quick search revealed some prior art, such as Dan Hill’s great article on Hacking a Cohort Analysis with Google Analytics. Dan’s method works great: use two of the five custom variables in Google Analytics to store the month and year of that user’s cohort. Then you can build custom filters to only look at cohorts for a specific year or month.

As Dan shows, tagging your users with these details is simple. Just push two extra lines of information to Google Analytics as part of the standard JavaScript call tracking their pageview.

//

This works well, but we’ve learned that sometimes a monthly or yearly cohort analysis just isn’t enough. In RJMetrics, we allow our customers to conduct cohort analyses at daily, weekly, monthly, quarterly, and yearly levels. Depending on the data set, any or all of these can prove extremely valuable. We set out to enhance Dan’s hack so that Google Analytics users could have the same level of detailed analysis.

The Enhancement

With only five custom variable slots available, storing enough information to place any given user in up to six cohorts (year/quarter/month/week/day/hour) seemed unrealistic. That is, of course, until we realized two critical facts:

  • Any single custom variable can store up to 128 characters
  • Google Analytics allows you to create filters on these fields using regular expressions

In other words, if we could represent all of the necessary cohort data in one long string that followed a predictable pattern, we could later use regular expressions to isolate specific cohorts based on the contents of a single custom variable.

Below, we outline a simple syntax for building this “cohort identifier string.” We decided on this syntax because it will fit in the 128-character limit of the custom variable and is still human-readable. We will be storing the following information in this order:

Y: Year (4 characters)
Q: Quarter (1 character, 1 – 4)
M: Month (2 characters, 01-12)
WY: Week Year (4 characters)
WM: Week Month (2 characters, 01-12)
WD: Week Day (2 characters, 01-31)
D: Day (2 characters, 01-31)
H: Hour (2 characters, 00-23)

Here is an example of the custom cohort variable string for a customer in the 3pm August 23rd, 2012 cohort:

Y:2012;Q:3;M:08;WY:2012;WM:08;WD:19;D:23;H:15

As we’ll show in a minute, building a regular expression to match any cohort you’d like from a string like this is extremely simple.

(A note on weekly cohorts: the day, month, and year used to represent a week are separate from the day, month, and year used to represent… well… days, months, and years. This is because we define a week based on the calendar date of its first day (traditionally a Sunday, but you could adjust your code to use any weekday you’d like). Since the Sunday of a given week could exist in a different month or year than other days of that week, we can’t rely on the month or year associated with the other cohorts to be the same for the week.)

We build the cohort string on the backend using PHP, although this would be simple enough to implement in any language of your liking.

//
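As a rough illustration of the logic (written in Python here rather than PHP), a minimal sketch might look like the following. The function name `cohort_string` is hypothetical, and Sunday is assumed to be the first day of the week, as described in the note on weekly cohorts.

```python
from datetime import datetime, timedelta


def cohort_string(ts: datetime) -> str:
    """Build the cohort identifier string for one timestamp (hypothetical helper)."""
    quarter = (ts.month - 1) // 3 + 1

    # A week is keyed by the calendar date of its Sunday.
    # datetime.weekday() is Monday=0 ... Sunday=6, so shift by one.
    days_since_sunday = (ts.weekday() + 1) % 7
    week_start = ts - timedelta(days=days_since_sunday)

    parts = [
        ("Y", f"{ts.year:04d}"),
        ("Q", str(quarter)),
        ("M", f"{ts.month:02d}"),
        ("WY", f"{week_start.year:04d}"),
        ("WM", f"{week_start.month:02d}"),
        ("WD", f"{week_start.day:02d}"),
        ("D", f"{ts.day:02d}"),
        ("H", f"{ts.hour:02d}"),
    ]
    return ";".join(f"{key}:{value}" for key, value in parts)


# 3pm on August 23rd, 2012 (a Thursday; that week's Sunday is August 19th)
print(cohort_string(datetime(2012, 8, 23, 15, 0)))
# Y:2012;Q:3;M:08;WY:2012;WM:08;WD:19;D:23;H:15
```

A visitor first seen on January 1, 2013 would get `Y:2013` but `WY:2012;WM:12;WD:30`, which is why the week fields carry their own year and month.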

The Analysis

As before, we assign the custom variable in the Google Analytics JavaScript tag by adding one simple line (note that the cohort string is coming from your backend or templating system):

//

From here, doing the cohort analysis in Google Analytics is a piece of cake. Just click on “advanced segments”, select “custom segments,” and click “new custom segment.”

Create a title for the cohort you wish to create, for example “Week of 07/29/2012 Cohort.” To select these customers, choose the custom cohort variable from the dropdown menu and use the RegExp match option.

 

Here is the general purpose regular expression. In order to filter the customers, just add the desired cohort information within the parentheses after the appropriate colon.

(Y:).*(Q:).*(M:).*(WY:).*(WM:).*(WD:).*(D:).*(H:)

For example, to view the 2012 cohort, you’d simply add a “2012” after the “Y:”:

(Y:2012).*(Q:).*(M:).*(WY:).*(WM:).*(WD:).*(D:).*(H:)

To view the Q1 2012 cohort, you’d also add a “1” after the “Q:”:

(Y:2012).*(Q:1).*(M:).*(WY:).*(WM:).*(WD:).*(D:).*(H:)

To view the cohort that joined on February 14, 2012 you’d use the following:

(Y:2012).*(Q:).*(M:02).*(WY:).*(WM:).*(WD:).*(D:14).*(H:)

To find the cohort for the Week of 07/29/2012, we would use the following:

(Y:).*(Q:).*(M:).*(WY:2012).*(WM:07).*(WD:29).*(D:).*(H:)

It’s that easy!
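If you want to sanity-check an expression before pasting it into Google Analytics, a small sketch like the one below can generate and test it locally. The helper `cohort_pattern` and the sample string are hypothetical, and Python's `re` module is used only for illustration; Google Analytics applies its own regular expression matching, so treat this as an approximation of its behavior.

```python
import re

# Field order of the cohort identifier string described above.
FIELDS = ["Y", "Q", "M", "WY", "WM", "WD", "D", "H"]


def cohort_pattern(**wanted: str) -> str:
    """Fill the general-purpose expression with the requested field values."""
    unknown = set(wanted) - set(FIELDS)
    if unknown:
        raise ValueError(f"unknown cohort fields: {sorted(unknown)}")
    # Fields left out stay as bare "(NAME:)" groups and match any value.
    return ".*".join(f"({name}:{wanted.get(name, '')})" for name in FIELDS)


# Hypothetical visitor first seen at 9am on August 1, 2012 (week of 07/29/2012).
sample = "Y:2012;Q:3;M:08;WY:2012;WM:07;WD:29;D:01;H:09"

week = cohort_pattern(WY="2012", WM="07", WD="29")
print(week)
# (Y:).*(Q:).*(M:).*(WY:2012).*(WM:07).*(WD:29).*(D:).*(H:)

print(bool(re.search(week, sample)))                             # True
print(bool(re.search(cohort_pattern(Y="2012", Q="1"), sample)))  # False
```

Because the fields always appear in the same order, a value such as `M:02` cannot accidentally match inside `WM:02`: the expression still requires `WY:` to follow it.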

Special Considerations

When choosing what date to use for bucketing a given visitor into a cohort, the choice is up to you. If you’re monitoring web traffic and want to see the rates at which people come back, you can simply use the timestamp of their first visit. If you’re looking to conduct cohort analysis on the actions of registered users, you could assign that user’s registration date. Or, if you’re not sure, you could use a second custom variable slot and do both.

The key, however, is to start now. As with all things Google Analytics, custom tracking will only work for users on a going-forward basis. If you’d like to run cohort analyses that go back throughout the history of your entire business, maybe you should give RJMetrics a spin.

New Google Plus Data Shows Weak User Engagement

Google CEO Larry Page recently announced that Google Plus crossed over the 100 million user mark and continues to see strong user growth.

Despite these strong numbers, however, the service continues to be pummeled in the press. Many outlets have claimed that engagement is poor and that growth is only fueled by Google forcing membership upon users of its other products.

Rather than rely on third-party reports, we decided to pull publicly available data on a random population into an RJMetrics online dashboard and see for ourselves.

Here are some of our most interesting findings:

  • The average post has less than one +1, less than one reply, and less than one re-share.
  • 30% of users who make a public post never make a second one. Even after making five public posts, there is a 15% chance that a user will not post publicly again.
  • Among users who make publicly-viewable posts, there is an average of 12 days between each post
  • A cohort analysis reveals that, after a member makes a public post, the average number of public posts they make in each subsequent month declines steadily. This trend is not improving in newer cohorts.

How We Did It

We began by selecting a population of 40,000 random Google Plus users. For each user, we downloaded their entire public timelines (which consist of all publicly-visible activities for that user). Only one third of the users in our population had any public activity, so this sub-set of the population is the main focus of many of our statistics.

Once we had the data, it was a snap to upload it to RJMetrics and pull the insights seen here with just a few clicks.

Since we are looking at public data exclusively, we want to point out that this data is not necessarily reflective of the entire population of users. These are simply insights into the public-facing actions of Google Plus users based on a population that is known to post publicly.

Repeat Posters

Once a user has made one public post, the chances that they will make a second post are quite strong: around 70%. After that, however, Google Plus does not perform as well as other social services that we have analyzed. In charts like these, we typically expect to see the probability of repeat posts shoot up to well north of 90% by the time the user has made several posts. This is basically the “once you’re using it you’re hooked” principle.

With Google Plus, however, this number never crosses the 90% mark. Even after having made five such posts, the chance of making a sixth is only 85%. This means that 15% of people who have made five posts never came back to make a sixth.
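For readers who want to reproduce this kind of repeat-post figure on their own data, here is a minimal sketch of one way to compute it: the share of users with at least n posts who went on to make post n+1. The function name is hypothetical and the counts are invented toy values, not our Google Plus sample.

```python
def repeat_probabilities(posts_per_user, max_n=5):
    """Return {n: P(user makes post n+1 | user made post n)} for n = 1..max_n."""
    probabilities = {}
    for n in range(1, max_n + 1):
        reached_n = sum(1 for count in posts_per_user if count >= n)
        reached_next = sum(1 for count in posts_per_user if count >= n + 1)
        if reached_n == 0:
            break  # nobody got this far, so the ratio is undefined
        probabilities[n] = reached_next / reached_n
    return probabilities


# Invented toy data: number of public posts for ten made-up users.
toy_counts = [1, 1, 1, 2, 3, 3, 5, 6, 8, 12]

for n, p in repeat_probabilities(toy_counts).items():
    print(f"after post {n}: {p:.0%} go on to post again")
```

One caveat with this simple ratio is that recently joined users have had less time to post again, so it will tend to understate the long-run probabilities.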

Cohort Analysis

The cohort analysis below shows the rate at which new publicly-viewable posts are created by users who made their first post in different months throughout time.

This is a cumulative chart, so we’re basically showing the “average number of total posts made” as it grows over time for users in each cohort.

The decay rate here is very concerning. Users are less and less likely to make additional posts even a few months after initially joining. While it may not be an apples-to-apples comparison, it’s interesting to contrast this with the same chart from our Pinterest Data Analysis, which shows no decay whatsoever.

Time Between Posts

We were surprised by the length of time between public posts among users. On average, a user waits 15 days between making their first public post and making their second. This number declines with each subsequent post, but not drastically. There is an average of 10 days between a user’s fifth and sixth public posts.

The overall average time between any two public posts by the same user is 12 days.

Remember that, since we are only looking at public posts, it is very possible that users are making non-public posts in between the ones that we were able to see. Despite this, however, we were still quite surprised by the large amount of time between public posts.

+1s, Replies, and Sharing

Of all the categories, we feel that this is the least likely to be biased by the fact that we only studied public posts. These public posts will still be visible to each member’s private networks, and actually could attract +1s, shares, and replies from external users as well. If anything, we would expect our numbers here to be higher than in the general population.

Despite that, our population of nearly 70,000 posts yielded the following properties:

  • An average of 0.77 “+1s” per post
  • An average of 0.54 replies per post
  • An average of 0.17 re-shares per post

Conclusion

From what we can see from the outside looking in, Google Plus has a long way to go before it becomes a real threat to the social networking landscape. While user growth is strong, it is unclear how much of that is driven by tie-ins with other Google products.

At the end of the day, Google Plus simply does not show the same level of ravenous user adoption and engagement that we’ve seen in other social networks (see our reports on Pinterest Data and Twitter Data for examples).

Airbnb Data Analysis: 6 Million Users by Year-End, Only 20% Active

Airbnb is one of the hottest sites on the internet. The Y Combinator graduate has raised $120 Million of funding to change the way people find places to stay around the globe.

As fans of Airbnb with a passion for startup data, we decided to try and learn more about the site’s user base by looking at the publicly-available profiles of its members. We sampled just over 60,000 users and were able to draw some interesting insights using an RJMetrics business intelligence dashboard.

Some highlights include:

  • Airbnb has over 2.1 million registered users and is growing about 250% year-over-year. At this rate, they’ll have 3 million users by the end of June and 4 million by the end of August.
  • Almost 85% of Airbnb’s userbase has never received a review as a host or a guest. Our sample suggests that there may be as few as 350,000 reviewed users among the userbase of over 2 million.
  • Usage is addictive — with each additional stay booked through Airbnb, users become increasingly likely to book again.

User Growth

Since Airbnb uses auto-incrementing IDs for its users and does not appear to have skipped any range of ID values, it is quite easy to track user growth over time.

Airbnb has very seasonal growth patterns, with most new users signing up in the summer (peaking in August) and significantly fewer signing up in the winter months (reaching a low point in December). These user analytics were easily extracted using RJMetrics.

The current user count is approximately 2.1 million.

For the past several months, the year-over-year growth rate has been steady at around 250%. Extrapolating this out for the rest of the year puts the site’s user count at over 3 million by the end of June and over 4 million by the end of August. At its current growth rate, the site will approach 6 million registered users by the end of 2012.

Usage

Since we didn’t have direct access to data on actual stays, we used reviews as a proxy for activity. Reviews are the lifeblood of the Airbnb community, so we think it’s fair to assume that the number of reviews is a good proxy for the number of stays.

Most sites we study show signs of the “80/20 rule,” which suggests that 80% of the activity comes from 20% of the users. In Airbnb’s case, it’s more like the “100/20” rule: only 16% of the user base has been reviewed as a host or a guest.

Here are some other usage statistics:

Only about 14% of users (or about 300,000 users) have been reviewed as guests.

Only about 2.3% of users (or about 50,000 users) have been reviewed as hosts.

A mere 0.5% of the userbase has been reviewed as both guest and host.

5% of users (or about 100,000 users) have active listings, but only 2% (or about 40,000 users) have received reviews from guests. This suggests that more than half of the people listing properties have yet to host a guest.
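The user counts above are simply the sampled percentages scaled up to the approximate total of 2.1 million registered users. A short sketch of that arithmetic follows; the variable names are ours, and the inputs are the rounded figures quoted in this post, so the outputs are rough.

```python
TOTAL_USERS = 2_100_000  # approximate registered user count quoted above

shares = {
    "reviewed as guest": 0.14,
    "reviewed as host": 0.023,
    "reviewed as both": 0.005,
    "active listing": 0.05,
    "listing reviewed by a guest": 0.02,
}

# Scale each sampled share up to the full user base.
for label, share in shares.items():
    print(f"{label}: about {share * TOTAL_USERS:,.0f} users")

# Guests + hosts - both = users reviewed in at least one role.
reviewed_any = (
    shares["reviewed as guest"]
    + shares["reviewed as host"]
    - shares["reviewed as both"]
)
print(f"reviewed in any role: {reviewed_any:.1%}")

# Share of users with listings that have no guest review yet.
unhosted = 1 - shares["listing reviewed by a guest"] / shares["active listing"]
print(f"listers with no guest review yet: {unhosted:.0%}")
```

The guest and host shares, less the overlap, come to roughly 16% of the user base, and about 60% of users with listings have no guest review yet.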

Repeat Activity

By relying on the same techniques we use to track repeat purchase probability in RJMetrics, we were able to profile the average user’s likelihood of using Airbnb with each additional stay.

As you can see, while only about 14% of the userbase ever books a stay (as indicated by a first review from a host), 22% of those users who book once go on to book a second stay via Airbnb. By the time a user has booked five stays, the likelihood that they will book another stay on Airbnb is over 50%.

Note that these percentages are based on the behavior of the existing user population, the majority of which has been registered for less than a year. Since so many users have a limited history on the site, it’s quite likely that these numbers will increase over time.

Conclusion

Airbnb continues to explode in popularity and experience tremendous user growth as a result. As with most consumer sites, however, the population of active users is much smaller than the total registered user count.

Airbnb’s key to continued success will be to both grow its user base and convert more of its registered users into paying customers. As we’ve seen, with each additional booking users become more likely to book again.

Click here to use RJMetrics to draw actionable insights from your company’s data.

Data in the Year 2050

In the past decades, business intelligence focused on transforming large unstructured datasets into usable metrics for decision making. Out of this data revolution came Just in Time (JIT) manufacturing and supply chains. Although past data still cannot predict the future perfectly, statistical forecasts and regression models have become widely trusted. With the vast amount of data generated online, insights on people’s intentions, opinions and wishes are also increasingly precise.

So what does the future hold for data analytics in the year 2050? Here are my speculations:

Amusement Stores

Bored of shopping online, you will want to head out to a premium brick-and-mortar store with friends, where an entrance fee is duly charged. Here you’ll find a selection of physical products on display, not holograms, which you can try before buying. But this store is no antique: it knew what you were looking for before you stepped in. Having access to your historical search queries and browsing preferences, both online and in real life, the store changes pricing of products dynamically over time depending on people’s interest. 3D cameras around the store monitor your body language, while microphones record your comments. Being one of the only places left for marketers to evaluate customers in-person, all your actions are meticulously analyzed. Whether you fit into the target market of a product is immediately known to the manufacturer. If the clothes you’re trying on do not fit, a robot will manufacture a tailored version on the spot based on your body size, as recorded by the store cameras. S, M and L tags are things that your grandparents reminisce about. The same goes for any physical object you use. 3D printing, combined with data on your body shape and historical preferences, ensures that every item is a perfect fit. Of course, you have to pay first and probably won’t be able to sell any used items, but that’s the point.

Everything is related

When you’re done at the store, head back outdoors with your smartshades (lenses that dynamically adapt to your vision and the surrounding light intensity, but that also have a Head-Up Display layered for its nano-PC). They’ll help. See your car? You can also see its latest market value, days till an oil change is needed directly from the engine sensor, and a reminder to pick up your kids from piano lessons at 4:30pm. Smartshades help you analyze and live efficiently. Databases are now globally connected and semantic search is king. On the road home, your smartshades will warn you to be careful around drivers with a bad accident history. If you see something interesting, snap pictures and share them with three blinks of an eye. Of course, chatting on your smartshades is not legal as you’re driving, but you can definitely make a stop and say hi to friends you saw walking nearby, on your real-time satellite map. Driving by a grocery store? Pay attention to the top right corner of your shades: it’s telling you what’s about to expire in the fridge (but why bother when you’ve already got a JIT grocery subscription), along with dinner suggestions based on available ingredients. As you arrive home, your shades will pull up your favorite hologram shows. If you’re wondering what your loved one is doing, simply switch views to source live video from his or her smartshades. You are now connected.

Forget slice and dice, start chopping and spreading

For people analyzing data, a world of charts and graphs is found through your pair of work smartshades. Here, you can look at metrics from around the globe by spinning a 3D virtual Earth, captured from satellites in orbit. You can manipulate charts, documents, and overlays by hand motion, interpreted by the smartshades’ camera and nano-PC. Zoom in and you will find local data sets on your products. Then choose to view the latest stats from sales, social networks or internal management, always paired with live aerial video for your location of interest in the background overlay. Zoom further and you will see real-time 3D videos from buildings, stores and street cameras for an idea of what customers are doing with your products. Feel free to navigate the environment like a bird. You can even highlight some objects worthy of attention in yellow while the rest of the world appears in gray. It’s a convenient feature. Now tap on a client or select a group of clients to review their feelings and psychographics. You can definitely tell whether they are happy, disappointed or confused, but they might also be annoyed that you are peeking at them.

Dream Away!

Are we at RJMetrics building this future? Only the stuff you like. Send us your ideas!

New Pinterest Data: What’s Everyone Pinning About?

Last month, we released an in-depth report on Pinterest user behavior. The general theme among our many findings was that Pinterest users are deeply engaged and remain highly active over time.

This led many people to ask a logical follow-up question: what exactly is everyone pinning about? With about a million pins still sitting in an RJMetrics hosted data warehouse from our previous report, we decided to answer that question.

Our full report is below, but here are some of our key findings:

  • The all-time most popular pinboard categories are Home (17.2%), Arts and Crafts (12.4%), Style/Fashion (11.7%), and Food (10.5%).
  • Good news for e-commerce players looking to cash in on Pinterest: Products I Love is the third most popular pinboard name, and pins that exist on pinboards about Products are the most likely to be liked by other users.
  • Food is the fastest-growing pinboard category. Food is also the most likely category to be repinned, on average generating over 50% more re-pins than the next most repinned category, Style and Fashion.

How We Did It

We sampled just under one million pins from pinterest.com by browsing Pin ID Numbers and usernames from the general population. These pins were from about 9,200 unique users and were stored across pinboards with about 15,000 unique names.

We mapped pinboard names to content categories by identifying the most commonly used words and phrases and mapping those to specific content categories. This allowed us to categorize the vast majority of the pinboards we identified into a handful of distinct categories.
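A minimal sketch of this kind of keyword-to-category mapping is shown below. The keyword lists are invented for illustration and are far smaller than anything needed to cover real board names; they are not the actual mapping we used.

```python
# Hypothetical keyword lists; a real mapping would be much larger.
CATEGORY_KEYWORDS = {
    "Home": ["home", "decor"],
    "Arts and Crafts": ["craft", "diy"],
    "Style/Fashion": ["style", "fashion"],
    "Food": ["food", "recipe"],
}


def categorize(board_name):
    """Return the first category whose keyword appears in the board name, else None."""
    name = board_name.lower()
    for category, keywords in CATEGORY_KEYWORDS.items():
        if any(keyword in name for keyword in keywords):
            return category
    return None  # board stays uncategorized


for board in ["For the Home", "My Style", "Recipes", "Craft Ideas", "Christmas"]:
    print(f"{board!r} -> {categorize(board)}")
```

Simple substring matching like this is easy to audit, but it can misfire on words that merely contain a keyword, so it is worth spot-checking the results by hand.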

As always, RJMetrics was our secret sauce. We conducted the analysis for this article with just a few clicks from our RJMetrics online dashboards.

Most Popular Board Names

Most pinboard names are used by many different users throughout Pinterest. In fact, some board names represent as much as 3% of the total pinboard population. The 10 most common board names, along with the percent of boards they represent, are listed below:

  • For the Home (3.15%)
  • My Style (1.97%)
  • Products I Love (1.86%)
  • Books Worth Reading (1.68%)
  • Food (1.23%)
  • Favorite Places & Spaces (1.00%)
  • Recipes (0.75%)
  • Craft Ideas (0.74%)
  • Christmas (0.72%)
  • Crafts (0.65%)

Those who are excited about Pinterest’s use as a product recommendation engine will be happy to see Products I Love as the third most popular board name, representing about 1.9% of pinboards.

Most Popular Board Categories

In addition to the popular names listed above, there are thousands upon thousands of additional unique board names in our sample. By isolating key words and phrases, we were able to bucket about 85% of the pinboards from our sample into categories.

The top 10 most popular categories are listed below:

  • Home (17.2%)
  • Arts and Crafts (12.4%)
  • Style/Fashion (11.7%)
  • Food (10.5%)
  • Inspiration/Education (9.0%)
  • Holidays/Seasonal (3.9%)
  • Humor (2.1%)
  • Products (2.1%)
  • Travel (1.9%)
  • Kids (1.8%)

Note that over 60% of pinboards fall into the top 5 categories.

Emerging Trends: Food Trumps Fashion

If we look at these top categories over time, by and large it appears as though their relative percentages are holding steady.

As you might expect, the only line showing some degree of seasonality is the one related to holidays and seasonal content.

As a next step, however, we isolated the two categories that had shown the biggest changes in market share over the past 12 months and uncovered something interesting:

In the early days of Pinterest, Style and Fashion represented twice as many pinboards as Food. However, in recent months, Food has gained more ground than any other category, and has actually become more popular than Style and Fashion among new pinboards created.

This could be a reflection of Pinterest’s increasing popularity in the mainstream, as Food is a far more populist topic than fashion. As one of Pinterest’s newer (and less trendy) members, I confess that I’d be more likely to pin about cake than couture.

Most Viral Categories

We looked at the average number of repins and likes for the pins across the different content categories we identified.

Check out the average repins by category below:

Hands down, Food was the most likely category to be repinned. This could explain its increasing share of the category landscape. Style and Fashion is a distant second, and Holidays/Seasonal was the least likely category to be repinned.

Interestingly, the most liked categories were NOT the same as the most repinned. Check out the average number of likes per pin by category:

Here we see that Products is actually the most likely category to be liked. This is surprising, particularly given that Products sat in the middle of the pack when it came to repins.

Humor is a similar story, ranking second most liked but not standing out based on repin frequency.

Conclusion

With Pinterest continuing to explode in size, it will be exciting to see how broader adoption shapes the voice of its community.

Pins about the home, arts and crafts and inspiration continue to dominate the content landscape, but the growing popularity of more broadly-accessible topics like Food and Product Recommendations could indicate that Pinterest is heading toward a more mainstream and commercialized future.

Charity Metrics

Since my first internship in college, I have made an annual donation of $5,000 or 10% of my post tax income, whichever is greater, to a charity called the Smile Train. The Smile Train performs corrective surgery on children born with cleft lips and cleft palates in developing countries. The surgery costs about $250 per person, and it makes a huge difference in the life of the people born with clefts. Uncorrected clefts result in lifelong problems eating, speaking, and breathing. Smile Train had some drama this year when they announced (and later reverted) plans to merge with another cleft charity. I don’t know if this impacted Smile Train’s effectiveness, but the shakeup motivated me to re-evaluate my choice of charity, something I had been planning to do for a while.

I wanted to base my decision on charity metrics. Using my fixed charitable budget, I want to get the highest possible return in terms of lives saved or drastically improved. I don’t give weight to the gender, ethnicity, or location of the recipients. I’m also not interested in spreading it around. I want to pick the best charity and give 100% of my donation to it. For an explanation of the rationale behind donating to a single charity, read this article by the economist Stephen Landsburg.

One metric that I knew not to use, or at least not exclusively, is ‘efficiency’. This is the percentage of budget that goes to program expenses, and it’s used by a number of charity rankings like Forbes’ annual charity list. While the efficiency ratio is useful for ruling out charities where much of the budget goes into management’s pockets, it does not account for the big differences in return-on-donation between different charities.

Like any self-respecting nerd, I tried to come up with a formula and make a spreadsheet. I wanted to compare charities on an apples-to-apples basis in order to figure out where my dollars would have the biggest impact. My initial attempt was based on the following variables:

  • U = cost per unit (For the Smile Train, this is a surgery on one person)
  • L = number of required lifetime units (An operation might be done once, medication may require many doses over a lifetime)
  • E = efficiency (Percentage of the budget that goes to the actual cause as opposed to overhead)
  • S = Odds of improving and/or dramatically improving a life (Vaccines are given to some people that would not have caught the disease anyway)

Cost per person helped = S * U * L / E

For the Smile Train, this worked out to be 1 * $250 * 1 / .822 = $304.
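The same calculation as a short sketch, for anyone who would rather script it than build a spreadsheet. The function and parameter names are hypothetical. It implements the formula exactly as written above; when S is below 1 one might instead expect to divide by S, and that alternative is not shown here.

```python
def cost_per_person_helped(unit_cost, lifetime_units, efficiency, odds_of_improving):
    """Cost per person helped = S * U * L / E, as defined in the text."""
    if not 0 < efficiency <= 1:
        raise ValueError("efficiency must be a fraction between 0 and 1")
    return odds_of_improving * unit_cost * lifetime_units / efficiency


# Smile Train inputs from the text: one $250 surgery, 82.2% efficiency, S = 1.
smile_train = cost_per_person_helped(
    unit_cost=250, lifetime_units=1, efficiency=0.822, odds_of_improving=1
)
print(f"${smile_train:,.0f}")  # $304
```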

I ran into two problems with this formula. First, I could not find a scalable way to get inputs for large numbers of charities. The second issue is that a crucial variable was missing from my formula. What’s the difference in impact between preventing blindness versus increasing school attendance versus fixing a cleft palate? My formula treated all afflictions as equally detrimental. While that didn’t seem right to me, I wasn’t sure how to capture the differences.

While I was researching the inputs to my formula, I found two fantastic resources for optimizing charitable giving, both of which had done much of the legwork already.

Giving What We Can (GWWC) was started by a philosophy professor at the University of Oxford named Toby Ord. GWWC makes charitable recommendations by trying to optimize a metric called disability adjusted life years, or DALY. This is a much better version of what I was trying to get at with my spreadsheet. DALY is an increasingly popular metric to compare the burdens of disease and various treatments. It’s used by the World Health Organization, World Bank, and others. GWWC also provides a wealth calculator to give you a sense of where you stand in terms of wealth compared to most of the rest of the world. If you’re reading this, I bet you are richer than you think.

GWWC has determined the best charitable bangs for your buck are from curable and preventable diseases in the third world. Their three recommended charities are:

The other resource I found is GiveWell.org. GiveWell was started by two former financial analysts named Holden Karnofsky and Elie Hassenfeld. GiveWell also uses DALY, but takes a somewhat more nuanced view to the charity optimization problem.

GiveWell uses DALY to filter down to the set of best philanthropic investments, but considers the error bars on DALY estimates to be too high to make a precise ordering. Once the universe of potential charities has been winnowed down to the highest-return organizations, GiveWell does proprietary research in an attempt to forecast changes in the effectiveness of incremental donations. They interview management teams and find out about the projects they will fund if their budget increases. GiveWell reevaluates its top charities each year, requests updated information and interviews from the respective organizations, and updates its recommended charities list.

While there are differences in the methodologies of these two organizations, they come to similar conclusions. GiveWell’s two recommended charities both show up in GWWC’s list. In fact, GWWC recently updated their site based on research that GiveWell conducted. GiveWell’s top two recommendations are:

  1. Against Malaria Foundation
  2. Schistosomiasis Control Initiative

I decided to make my annual donation to the Against Malaria Foundation. GiveWell’s methodology of attempting to forecast the biggest DALY payoff for incremental dollars invested is, in my opinion, the optimal way to choose a charity. Shortly before posting this article, I funded my donation to AMF. I’m extremely happy to report that I will be responsible for the distribution of over 1,200 mosquito nets. I encourage you to apply metrics to your charitable donations as you would for a financial investment. Or, take my word for it and donate to the AMF now.

Changes coming to your RJMetrics dashboards

Following up on the new settings page, we are hard at work on improvements to the dashboards and charts in RJMetrics. We wanted to outline a few of the changes ahead of time so that you know what to expect. We are focused on making RJMetrics the best way for online businesses to get actionable insight from their data, and the new features reflect that. We are also going to retire two of our lesser-used features so that we can put more focus on the areas that matter most.

Improvements

  • Easier ad hoc analysis – see the impact of your changes right away, no need to save or press preview
  • Single page chart editor – view all of your chart configuration settings at once
  • Eliminating the need for composite charts – anything that you need a composite chart for today will be doable with the new version of the standard chart editor
  • Drag and drop chart creation and editing – start viewing your data without configuration and get moving fast
  • Micro-editor for quick changes – no need to open the full chart editor for changes to the date range
  • Improved visualizations – easier to read and powered by JavaScript rather than Flash

Retired features

We will continue to support both of the features below until September 15. After that, charts that use either will be removed from the dashboards. As a reminder, you can export the visualizations or data behind your charts if you want to hold on to this data.

Twitter – Social media tracking is valuable, but it was never our focus. We have gotten feedback from clients that Twitter alone is not very useful, and adding other social media services is not on our roadmap. We recommend checking out Argyle Social, Awe.sm, Radian6, and uberVU as great options for tracking and analyzing your influence through social media. For managing your various social media accounts, you can’t beat our client HootSuite.

Rank over time – There are actually two separate ranking features in RJMetrics, and we are removing the one that is very rarely used. You will still be able to show the top 10 categories or the bottom 5% of users as described in this help center article. However, we are removing the ability to choose a specific element and plot changes to its rank over time.

We hope you are as excited as we are about the upcoming improvements to RJMetrics.

Six Metrics Every Business Should Track

At RJMetrics, I’m lucky to work with smart people at successful companies to help them analyze mountains of complex data. What I find remarkable is how many of the same metrics are consistently relevant to companies across all sizes and industries.

Today, I will explore six such metrics that are related to customer retention and loyalty. If you’re not already tracking these metrics for your business, I suggest you start. Metrics like these should be closely watched and can help inform major decisions around marketing, customer retention, product development, and more.

Preparation: Define Users and Actions

Before we start, answer these two questions: who are your users and which of their actions matter?

The first part should be easy. Your purchasers, members, subscribers, or visitors are your company’s lifeblood. Depending on your industry, the second part could be a bit subtler. In e-commerce, the obvious “action” is a purchase. In social media, that action might be a login or user interaction. For publishers, it may be a visit or page view. As a rule of thumb, this action should be an undeniable indicator of value; users who do it more should be more valuable to your company.

With these two definitions in place, we are ready to do some analysis.

Metric One: Engaged Users

Usually, getting more users means increasing the value of your business. However, simply looking at “total users since the dawn of time” is never enough. Quality outweighs quantity when it comes to building long-term value.

As we saw in our analysis of Twitter’s Data, “total users” can be a tremendous overstatement of another metric that actually means much more: “engaged users.” This metric examines the number of distinct users who have committed an action in any given time period. This is your real customer base, and it’s directly tied to the value of your business. Appropriately, it’s also the population that savvy investors and acquirers will consider when determining a valuation.

It is also worthwhile to examine how this number has changed over time and in proportion to your total user base. What percent of your total user base (any user who has ever committed an action) came back to act again last month? Yesterday? How has that proportion changed over time? The direction of that chart can indicate how the quality of your average customer is evolving.

This chart shows active customers as a percent of the total customer base. The trend starts at 100% in the company’s first month, then rapidly stabilizes (with some seasonality).
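To make the calculation concrete, here is a minimal Python sketch of engaged users as a share of the total user base. The action log, the user names, and the `engaged_share_by_month` helper are all made up for illustration, and months with no actions at all are simply skipped.

```python
from collections import defaultdict
from datetime import date

# Hypothetical action log: (user_id, date of action).
actions = [
    ("ann", date(2014, 1, 5)),
    ("bob", date(2014, 1, 20)),
    ("ann", date(2014, 2, 3)),
    ("cat", date(2014, 2, 14)),
    ("ann", date(2014, 3, 9)),
    ("dan", date(2014, 3, 30)),
]

def engaged_share_by_month(actions):
    """Return {(year, month): (engaged_users, total_users_to_date, share)}."""
    users_by_month = defaultdict(set)
    for user, day in actions:
        users_by_month[(day.year, day.month)].add(user)

    seen = set()  # every user who has ever acted, up to and including this month
    result = {}
    for month in sorted(users_by_month):
        engaged = users_by_month[month]
        seen |= engaged
        result[month] = (len(engaged), len(seen), len(engaged) / len(seen))
    return result

shares = engaged_share_by_month(actions)
for month, (engaged, total, share) in shares.items():
    print(month, engaged, total, f"{share:.0%}")
```

With this sample data the share starts at 100% in the first month and falls as the total user base grows faster than the engaged population.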

Metric Two: Repeat vs. First-Time Actions

To build on our study of engaged users, we need to distinguish between user acquisition and user retention.

Imagine a social network in which users sign up, commit one action, and then never return. If this network were able to double the number of new signups it received each month, its “engaged users” chart would actually look quite impressive. However, a quick look at a chart of “first time actions” vs. “repeat actions” would allow us to quickly see through the façade.

If each action generates value for your company, this metric allows us to view the relative value creation from new and existing users. If this ratio is biased toward new users, you might soon hit a wall. If it’s biased toward existing customers, you may have already hit one.
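One way to split actions into first-time and repeat is sketched below in Python. The sample log and the `first_vs_repeat` helper are hypothetical; the only rule applied is that a user's earliest action counts as "first time" and everything after it counts as "repeat."

```python
from collections import Counter
from datetime import date

# Hypothetical action log: (user_id, date of action).
actions = [
    ("ann", date(2014, 1, 5)),
    ("bob", date(2014, 1, 20)),
    ("ann", date(2014, 2, 3)),
    ("cat", date(2014, 2, 14)),
    ("ann", date(2014, 3, 9)),
    ("dan", date(2014, 3, 30)),
]

def first_vs_repeat(actions):
    """Count first-time and repeat actions per (year, month)."""
    seen = set()  # users who have already acted at least once
    first, repeat = Counter(), Counter()
    # Walk the log in date order so each user's earliest action is seen first.
    for user, day in sorted(actions, key=lambda a: a[1]):
        month = (day.year, day.month)
        if user in seen:
            repeat[month] += 1
        else:
            first[month] += 1
            seen.add(user)
    return first, repeat

first, repeat = first_vs_repeat(actions)
for month in sorted(set(first) | set(repeat)):
    print(month, "first:", first[month], "repeat:", repeat[month])
```

Plotting the two counters side by side gives the "first time" vs. "repeat" chart described above.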

Metric Three: Time Between Actions

Once we’ve identified a universe of “repeat users,” we can gain more insights into their behavior by studying how much time passes between the average user’s actions.

An e-commerce site might see an average value of 75 days. A gaming company might see an average value of 15 minutes. It can be valuable to see how this value changes over the life of your business and over the life of a given customer.

For example, looking at the “average time between first and second action” for users registered in each month of your company’s life helps you determine if you’re getting better at retaining new users. Similarly, comparing the “average time between first and second action” to the “average time between second and third action” (and so forth) can help you determine when and how to remarket to existing users.

Here, the “next purchase number” is shown on the x-axis. For many businesses, the average time between purchases drops with each subsequent purchase.

As with all of these metrics, examine how these numbers differ by customer segments (based on anything from demographic information to behavioral tendencies to acquisition channel). The results might cause you to act differently when working to attract and retain customers.
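The "time between actions" comparison from this section can be sketched in a few lines of Python. The purchase log and the `average_gap_by_action_number` helper are invented for the example, and gaps are measured in whole days.

```python
from collections import defaultdict
from datetime import date

# Hypothetical purchase log: (user_id, purchase date).
purchases = [
    ("ann", date(2014, 1, 1)),
    ("ann", date(2014, 3, 2)),   # 60 days after her first purchase
    ("ann", date(2014, 4, 1)),   # 30 days after her second purchase
    ("bob", date(2014, 1, 10)),
    ("bob", date(2014, 4, 10)),  # 90 days after his first purchase
    ("cat", date(2014, 2, 1)),   # never came back
]

def average_gap_by_action_number(actions):
    """Return {n: average days between a user's (n-1)th and nth action}, n >= 2."""
    dates_by_user = defaultdict(list)
    for user, day in actions:
        dates_by_user[user].append(day)

    gaps = defaultdict(list)
    for days in dates_by_user.values():
        days.sort()
        # Pair each action with the one that follows it; n is the later action's number.
        for n, (prev, curr) in enumerate(zip(days, days[1:]), start=2):
            gaps[n].append((curr - prev).days)
    return {n: sum(g) / len(g) for n, g in sorted(gaps.items())}

avg_gaps = average_gap_by_action_number(purchases)
print(avg_gaps)  # {2: 75.0, 3: 30.0}
```

Users with a single action (like "cat" here) contribute nothing to the averages, so this should be read alongside the repeat metrics rather than on its own.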

Metric Four: Repeat Action Probability

This metric is a study of the “action funnel.” For each user who acted once, how many acted a second time? A third?

I like to look at this in two ways. The first is as a count of actions by action number, illustrating the steepness of the funnel. The second is as a “probability” based on historical data, illustrating how each action impacts the likelihood of the next. (Of course, you should beware of interpreting this as an actual probability if you don’t have a lot of historical data.)

Here, “purchase number” is shown on the x-axis. For many businesses, each incremental action makes a subsequent action more likely.

Many of our customers are surprised when they see their data displayed this way. A steep drop-off from action-to-action is quite common, as is a very large increase in the repeat probability from action to action. The take-home message: loyalty snowballs quickly, but most users never start rolling at all.
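Both views of the action funnel can be derived from a simple list of per-user action counts. The following Python sketch uses made-up counts and two hypothetical helpers, `action_funnel` and `repeat_probability`. As noted above, the ratios are only historical shares, not true probabilities.

```python
# Hypothetical data: total number of purchases made by each of ten users.
purchase_counts = [1, 1, 1, 1, 1, 1, 2, 2, 4, 4]

def action_funnel(counts):
    """Return {n: number of users who made at least n actions}."""
    return {
        n: sum(1 for c in counts if c >= n)
        for n in range(1, max(counts) + 1)
    }

def repeat_probability(funnel):
    """Return {n: share of users with an nth action who also made an (n+1)th}."""
    return {n: funnel[n + 1] / funnel[n] for n in funnel if n + 1 in funnel}

funnel = action_funnel(purchase_counts)
probs = repeat_probability(funnel)
print(funnel)  # {1: 10, 2: 4, 3: 2, 4: 2}
print(probs)   # {1: 0.4, 2: 0.5, 3: 1.0}
```

Even in this tiny sample the two patterns show up: the funnel drops steeply after the first action, while the repeat share climbs with each additional action.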

Metric Five: Customer Lifetime Value

You’ve probably heard this term before, and rightfully so. Customer lifetime value is specific to each customer and it allows you to identify just how valuable different customer segments really are.

If your “action” is something binary like a login, the value this metric tracks may be a count of those actions. However, if it’s an action tied directly to a value like revenue or gross margin, such as a purchase, it is likely the sum of those actions’ values.

To many, customer lifetime value is more than the amount of value generated by a customer so far. It can be expanded to include a projection of the value a customer is expected to generate in the future. That calculation can be complex, however, so I’ll leave it for another post. For now, you can use “value generated so far” as a good proxy when comparing users with similar first action dates.

To examine this metric, calculate it for every customer and then segment those customers as you see fit. This can be a great jumping-off point for identifying customer segments who are performing well (so you can acquire more like them) and those who are underperforming (so you can find out why and reverse the trend).
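Here is a small Python sketch of the "value generated so far" proxy, segmented by acquisition channel. The orders, the channel names, and both helper functions are hypothetical, and each customer is assumed to belong to exactly one channel.

```python
from collections import defaultdict

# Hypothetical orders: (customer_id, acquisition channel, order revenue).
orders = [
    ("ann", "search", 40.0),
    ("ann", "search", 60.0),
    ("bob", "email", 25.0),
    ("cat", "search", 50.0),
    ("bob", "email", 35.0),
    ("dan", "email", 20.0),
]

def lifetime_value_so_far(orders):
    """Sum revenue per customer (the "value generated so far" proxy)."""
    totals = defaultdict(float)
    for customer, _channel, revenue in orders:
        totals[customer] += revenue
    return dict(totals)

def average_value_by_segment(orders):
    """Average per-customer lifetime value within each acquisition channel."""
    # Assumption: a customer's channel is the same on every one of their orders.
    channel_of = {customer: channel for customer, channel, _ in orders}
    by_channel = defaultdict(list)
    for customer, value in lifetime_value_so_far(orders).items():
        by_channel[channel_of[customer]].append(value)
    return {channel: sum(v) / len(v) for channel, v in by_channel.items()}

values = lifetime_value_so_far(orders)
segments = average_value_by_segment(orders)
print(values)    # {'ann': 100.0, 'bob': 60.0, 'cat': 50.0, 'dan': 20.0}
print(segments)  # {'search': 75.0, 'email': 40.0}
```

Swapping the channel for any other attribute (region, signup month, first product purchased) gives a different segmentation from the same per-customer totals.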

Metric Six: Cohort Analysis

If I were stuck on a desert island and could only take one chart, it would be a cohort analysis.

The cohort analysis groups users into “cohorts” based on the time period in which they committed their first action (and/or other available attributes). Then, it charts the value of each cohort’s actions in each subsequent month of their lifetime as users.

Several cohorts are typically shown on the same chart, allowing for a layered view of how these cohorts perform in general as well as relative to one another over the lifetime of your business.

Since most businesses see a drop-off in actions after the first period and there may be huge variation in the number or value of actions from cohort to cohort, the most consumable form of a cohort analysis chart shows each data point as a percent of the first period’s value. These charts typically exclude the first month, since by definition that value is always 100% for each cohort.

A Cohort Analysis can incorporate elements of all the other metrics discussed in this post.

For a more detailed explanation, check out our Cohort Analysis website. I include it here today because it’s the one chart that incorporates the valuable information explored in each of the other five metrics.
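As a rough illustration of the grouping described in this section, the Python sketch below builds a small cohort table. The purchase data and the `cohort_table` helper are hypothetical. Cohorts are defined by the month of a user's first purchase, and each cell is that cohort's revenue in a given month of its lifetime, expressed as a percent of its first month.

```python
from collections import defaultdict
from datetime import date

# Hypothetical purchases: (user_id, purchase date, revenue).
purchases = [
    ("ann", date(2014, 1, 5), 100.0),
    ("bob", date(2014, 1, 20), 100.0),
    ("ann", date(2014, 2, 10), 50.0),
    ("bob", date(2014, 3, 2), 30.0),
    ("cat", date(2014, 2, 14), 80.0),
    ("cat", date(2014, 3, 15), 20.0),
]

def month_index(day):
    """Number calendar months consecutively so they can be subtracted."""
    return day.year * 12 + day.month

def cohort_table(purchases):
    """Return {cohort (year, month): {months since first purchase: % of month 0}}."""
    # Each user's cohort is set by the date of their earliest purchase.
    first_purchase = {}
    for user, day, _ in sorted(purchases, key=lambda p: p[1]):
        first_purchase.setdefault(user, day)

    value = defaultdict(lambda: defaultdict(float))
    for user, day, revenue in purchases:
        start = first_purchase[user]
        period = month_index(day) - month_index(start)
        value[(start.year, start.month)][period] += revenue

    # Assumes month-0 revenue is non-zero for every cohort.
    return {
        cohort: {p: 100.0 * v / periods[0] for p, v in sorted(periods.items())}
        for cohort, periods in sorted(value.items())
    }

table = cohort_table(purchases)
for cohort, row in table.items():
    print(cohort, row)
```

Month 0 is always 100% here, which is why charts of this kind usually leave it out.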

Conclusion

These six metrics are at the core of some of the most powerful analyses conducted by the world’s largest and most successful businesses. Advances in technology have made them accessible to companies of all sizes, and products like RJMetrics allow businesses to monitor them with minimal effort.

While business models differ, a core objective is often the same: creating value. Tracking these metrics can empower any company to better understand their customers, generate greater value, and increase their chances of success.

RJMetrics Feature Spotlight: Analysis by Age

Today, we’ll be taking a look at another RJMetrics analytical tool: dynamic age calculations. RJMetrics can calculate the “age” of any date stored in your system, providing a helpful look at how much time has passed since a particular stored date. This can be useful in a number of situations, including:

  • Studying the age of your user base (when their birth date or birth year is collected).
  • Studying the amount of time since a particular event, such as a user’s first purchase or most recent subscription payment.
  • Examining negative ages (the time until a future event), such as a graduation date or expiration date.

For this example, let’s take a look at the fictitious company Play Now (an online gaming site). We will build a chart by selecting the trend “Average User Age” in step 1 of the chart builder, and then we’ll group the data by “referrer” in step 3. This results in the chart below:

As you can see, the average customer’s age varies significantly based on their referral source. Sites like aol.com are referring the youngest users, while the site “gamesite.com” is referring the oldest. This information could obviously be very helpful in combination with statistics like acquisition cost by channel and conversion rate by age. It could also be used by a marketing department to make sure the right messaging is used in each channel. Note that exporting the data behind this chart will result in a data set shown in seconds. This allows for the greatest possible granularity and can always be converted to other time units with some simple equations.
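The conversion from seconds is simple arithmetic. Here is a minimal Python sketch; the sample value and the function names are made up, and the 365.25-day year is an assumption chosen to average out leap years.

```python
SECONDS_PER_DAY = 24 * 60 * 60
DAYS_PER_YEAR = 365.25  # assumption: average year length, including leap years

def seconds_to_days(seconds):
    """Convert an exported age in seconds to days."""
    return seconds / SECONDS_PER_DAY

def seconds_to_years(seconds):
    """Convert an exported age in seconds to (approximate) years."""
    return seconds / (SECONDS_PER_DAY * DAYS_PER_YEAR)

# Hypothetical exported value: an average user age of 788,940,000 seconds.
age_seconds = 788_940_000
print(seconds_to_days(age_seconds))   # 9131.25
print(seconds_to_years(age_seconds))  # 25.0
```

The same functions work for negative ages, which simply come out as negative days or years until the future date.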

If you’re interested in learning more about RJMetrics, check out our website where you can learn more and try out a free demo.