Obligatory Plug (let’s get this out of the way): I’m the co-founder of a company called RJMetrics. We develop hosted software that helps online businesses make smarter decisions using their data. I used RJMetrics to do all of this analysis and only scratched the surface of what our tool can do. If you’d like to see what RJMetrics can do for your business, sign up for a free 30-day trial. OK, onto the good stuff…

A few days ago, I was lamenting to my co-founder Jake about a frustrating problem: my blog content had stopped making it to the front page of Hacker News. While my posts are admittedly formulaic (I usually get my hands on some never-before-seen data and analyze it in RJMetrics), they always seemed to work their way to the top.

But lately I’ve been coming up dry. My TechCrunch guest post on how start-ups approach patents? Nah. My piece on never-before-seen Pinterest data? Fail. How about new data on the behind-the-scenes world of VC deal sharing? Another bomb.

I had some self-serving theories: Hacker News had devolved, succumbed to voter rings, or maybe just become too mainstream. Jake, as he often does, offered up an alternative theory: my content sucks.

Jake proposed that the content landscape has become more competitive as HN has grown and that my content hasn’t improved fast enough to keep up.

As with most of our arguments, we decided to let the data decide. I used ThriftDB’s HNSearch API to pull down a complete history of Hacker News submissions, comments, and scores. I then plugged the data into an RJMetrics Dashboard and went to work answering some questions about the evolution of community, content, and competition on Hacker News.

Read on to see the data behind findings like these:

  • On Hacker News, the rate of new user registrations grew explosively in 2010, was flat in 2011, and is down in 2012.
  • The total number of active users continues to grow because a high percentage of historical users continue to participate on HN even years after their initial registrations.
  • Despite growth in the user population, the number of submissions made to Hacker News each week has held steady since 2011.
  • If you want upvotes, use profanity and talk about hot startups. Steer away from big companies and sensationalist headlines.

Registered Users

The population of registered Hacker News users has grown considerably over time, as shown in the chart below. However, new user registrations flattened out around 10,000 new users per quarter in 2011 and appear to actually be slightly lower so far in 2012.

I was surprised to see this decline in growth rate, which goes against the argument that Hacker News has gone too mainstream. To me, the recent flatness suggests a market saturation point. If HN’s userbase is bounded by the number of new “startup tech enthusiasts” arriving on the scene each year, its base may not be changing much after all.
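For readers who want to reproduce a chart like this on their own data, here is a minimal Python sketch of the quarterly registration count. It assumes you already have each user’s registration date keyed by username (the post doesn’t say how those dates were obtained), and the names `quarter_of` and `registrations_per_quarter` are illustrative, not part of RJMetrics or the HNSearch API.

```python
from collections import Counter
from datetime import date


def quarter_of(d: date) -> str:
    """Label a date with its calendar quarter, e.g. '2011-Q3'."""
    return f"{d.year}-Q{(d.month - 1) // 3 + 1}"


def registrations_per_quarter(signup_dates: dict[str, date]) -> dict[str, int]:
    """Count new registrations per calendar quarter.

    signup_dates is assumed to map each username to its registration date.
    """
    counts = Counter(quarter_of(d) for d in signup_dates.values())
    # "YYYY-Qn" labels sort chronologically as plain strings.
    return dict(sorted(counts.items()))


signups = {"alice": date(2011, 1, 5), "bob": date(2011, 2, 1), "carol": date(2011, 7, 9)}
print(registrations_per_quarter(signups))  # {'2011-Q1': 2, '2011-Q3': 1}
```

Quarters with no registrations simply don’t appear in the result, so a real chart would need to fill those gaps with zeros.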

Active Users

As we all know, registered users don’t tell the whole story. It’s the active users that make up a community. Below is the number of users who performed at least one action in each quarter (note that these are limited to submissions or comments because I don’t have access to user-specific voting data).

Currently, about 30,000 users submit articles or comments per quarter, up about 7x from the levels of late 2008. However, this number is not climbing very quickly. In the last quarter for example, despite about 7,700 new registered users, the number of users who submitted an article or comment increased by only about 600.

This suggests that many new registrants may be DOA or participate only as readers/voters, or that user activity drops off over time. A cohort analysis will shed some light on this for us.
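Counting active users is a distinct count per quarter rather than a sum of actions. A minimal sketch, assuming a flat list of `(username, date)` pairs with one entry per submission or comment (votes excluded, as in the post); the function names are hypothetical:

```python
from datetime import date


def quarter_of(d: date) -> str:
    """Label a date with its calendar quarter, e.g. '2012-Q1'."""
    return f"{d.year}-Q{(d.month - 1) // 3 + 1}"


def active_users_per_quarter(actions: list[tuple[str, date]]) -> dict[str, int]:
    """Count distinct users with at least one submission or comment per quarter."""
    users_by_quarter: dict[str, set[str]] = {}
    for user, day in actions:
        # A set ensures a user with many actions in a quarter is counted once.
        users_by_quarter.setdefault(quarter_of(day), set()).add(user)
    return {q: len(users) for q, users in sorted(users_by_quarter.items())}


actions = [
    ("alice", date(2012, 1, 3)),
    ("alice", date(2012, 2, 3)),
    ("bob", date(2012, 3, 1)),
    ("bob", date(2012, 4, 1)),
]
print(active_users_per_quarter(actions))  # {'2012-Q1': 2, '2012-Q2': 1}
```

Comparing this count with the registrations in the same quarter is what exposes the gap described above: a registrant who never submits or comments never shows up here.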

Hacker News Cohort Analysis

I pulled this out of RJMetrics in about 30 seconds. The chart below shows, for the “Q1” cohort of each full calendar year since 2008, the share of registrants who submit or comment in each quarter of their lifecycle.

There are a few noteworthy trends here:

  • Consistently, about 75% of the users who register will submit an article or comment in their first quarter as a user.
  • In the second quarter, the number consistently drops to the 30-40% range.
  • In later quarters, participation stabilizes around 20-30%, but there is a clear distinction between each year’s cohorts. By about 2 years out, the active user percentage of each year’s cohort is 3-5 percentage points lower than that of the previous year’s cohort.

This chart is remarkable for a number of reasons. While it’s clear that the engagement of the average new user is declining over time, I think the more unusual (and impressive) takeaway is that such a consistent percentage of registered users return each quarter, even years after their original registrations.

If anyone out there suspected that the “old guard” had given up on HN, this chart proves them wrong. The number of users from my 2008 vintage who are still using the site is holding quite steady, at about the same level as in 2009 and 2010.
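The cohort chart boils down to one calculation: take the users who registered in a given quarter, then measure what share of them submitted or commented in each later quarter. Here is a minimal Python sketch under assumed inputs (registration dates keyed by username and a list of `(username, date)` actions). It illustrates standard cohort retention; it is not RJMetrics’ implementation, and the function names are hypothetical.

```python
from collections import defaultdict
from datetime import date


def quarter_index(d: date) -> int:
    """Number quarters consecutively so subtracting two indices counts quarters."""
    return d.year * 4 + (d.month - 1) // 3


def cohort_retention(signups: dict[str, date],
                     actions: list[tuple[str, date]],
                     cohort_quarter: int) -> list[float]:
    """Share of one registration cohort that submits or comments in each
    quarter of its lifecycle (index 0 is the quarter they registered)."""
    cohort = {u for u, d in signups.items() if quarter_index(d) == cohort_quarter}
    if not cohort:
        return []
    active: dict[int, set[str]] = defaultdict(set)
    for user, day in actions:
        if user in cohort:
            age = quarter_index(day) - cohort_quarter
            if age >= 0:
                active[age].add(user)
    horizon = max(active, default=-1) + 1
    # Quarters with no activity report 0.0 instead of being skipped.
    return [len(active[k]) / len(cohort) for k in range(horizon)]


signups = {"a": date(2009, 1, 10), "b": date(2009, 2, 1),
           "c": date(2009, 3, 5), "d": date(2009, 3, 30)}
actions = [("a", date(2009, 1, 11)), ("b", date(2009, 2, 2)), ("c", date(2009, 3, 6)),
           ("a", date(2009, 4, 1)), ("a", date(2010, 1, 2))]
print(cohort_retention(signups, actions, quarter_index(date(2009, 1, 1))))
# [0.75, 0.25, 0.0, 0.0, 0.25]
```

Running the function once per Q1 cohort (2008, 2009, …) and plotting each list against lifecycle quarter gives one line per cohort, which is the shape of the chart described above.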

Highly Active Users

Doing all of these analyses by quarter tells us about users with a minimum level of activity, but it doesn’t tell us much about “very active” users. As you might imagine, with each activity a user conducts, she is more likely to conduct another.

To see whether “hyperactivity” has increased or decreased with each new cohort of users, we looked at the percentage of each quarter’s new registrants who performed at least 30 actions in their first 90 days after registration.

As you can see, since 2009 the percentage of new users who meet this threshold of being “highly active” has dropped from around 4-5% to around 2%. Combined with the cohort analysis, this paints a picture in which each new cohort’s average user is active but less deeply engaged than the last.
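The “highly active” metric can be expressed as a short filter: count each user’s actions in the window from registration up to 90 days later, flag users at or above 30, and report the flagged share per registration quarter. A minimal sketch under assumed inputs (registration dates keyed by username, `(username, date)` actions). Whether the post’s 90-day window includes day 90 isn’t stated, so the half-open window here is an assumption, and the function name is hypothetical.

```python
from collections import defaultdict
from datetime import date, timedelta


def quarter_of(d: date) -> str:
    """Label a date with its calendar quarter, e.g. '2010-Q1'."""
    return f"{d.year}-Q{(d.month - 1) // 3 + 1}"


def highly_active_share(signups: dict[str, date],
                        actions: list[tuple[str, date]],
                        threshold: int = 30,
                        window_days: int = 90) -> dict[str, float]:
    """Percentage of each quarter's registrants with >= threshold actions
    in [registration date, registration date + window_days)."""
    early_actions: dict[str, int] = defaultdict(int)
    for user, day in actions:
        start = signups.get(user)
        if start is not None and start <= day < start + timedelta(days=window_days):
            early_actions[user] += 1
    registrants: dict[str, int] = defaultdict(int)
    hyperactive: dict[str, int] = defaultdict(int)
    for user, start in signups.items():
        q = quarter_of(start)
        registrants[q] += 1
        if early_actions[user] >= threshold:
            hyperactive[q] += 1
    return {q: 100 * hyperactive[q] / registrants[q] for q in sorted(registrants)}


signups = {"power": date(2010, 1, 1), "casual": date(2010, 2, 1)}
actions = [("power", date(2010, 1, 1) + timedelta(days=i)) for i in range(30)]
actions += [("casual", date(2010, 2, 2))] * 5
print(highly_active_share(signups, actions))  # {'2010-Q1': 50.0}
```

The threshold and window are parameters so that other definitions of “highly active” can be tried against the same data.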

At this point, I was growing concerned that Jake might be right. These trends were worrying but far from damning. To get the real answers, I was going to have to turn my attention to the content that was beating me to the front page.

Number of Submissions and Upvotes

With only 30 slots available on the HN homepage, every additional submission makes it statistically less likely that any given post reaches the coveted top 30. Amazingly, however, the number of submissions to HN in the past two years has been… well… flat.

Despite growth in the user population, the number of stories competing for the top spots each day has held steady in recent history. I think this again speaks to market saturation: there are only so many stories that are relevant to this community. (And spammers have learned that flooding the community with off-topic links doesn’t yield page views.)

Interestingly, if you look at the number of upvotes cast each day, the trend is similar. For the past two years, the same number of stories have been competing for about the same number of votes each day.
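Both trend lines are simple group-bys on submission records. A minimal sketch, assuming each submission is a `(date, points)` pair and, as a further assumption, treating a story’s final point total as a stand-in for the upvotes it received, attributed to the day it was submitted (the post doesn’t describe how daily upvotes were derived). Function names are hypothetical.

```python
from collections import Counter
from datetime import date


def weekly_submissions(submissions: list[tuple[date, int]]) -> dict[tuple[int, int], int]:
    """Number of submissions per ISO (year, week)."""
    counts = Counter(tuple(day.isocalendar())[:2] for day, _ in submissions)
    return dict(sorted(counts.items()))


def daily_points(submissions: list[tuple[date, int]]) -> dict[date, int]:
    """Total points per submission date, used here as a proxy for daily upvotes."""
    totals: Counter = Counter()
    for day, points in submissions:
        totals[day] += points
    return dict(sorted(totals.items()))


subs = [(date(2012, 1, 2), 10), (date(2012, 1, 3), 5),
        (date(2012, 1, 3), 1), (date(2012, 1, 9), 7)]
print(weekly_submissions(subs))  # {(2012, 1): 3, (2012, 2): 1}
print(daily_points(subs))
```

ISO weeks keep every week under a single label; note that the first days of January can belong to the last ISO week of the previous year.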

I no longer buy Jake’s argument that the volume of competition has increased. The only question left is which has gotten worse: my content or the community’s taste?

Post Performance By Category

As a next step, I decided to categorize the posts in the database into buckets based on keywords that appeared in their titles. I measured “popularity” as the average number of points earned by submissions with these words in their titles.

I chose to categorize content by mentions of big companies (e.g., Amazon, Google), hot startups (e.g., Pinterest, Instagram), sensationalism (e.g., Best, Worst, First), programming languages (everything I could think of), and profanity (which was fun). Note that not all content on HN falls into one of these categories and that the overall average score for a post is about 11 points.

Apparently, if you want to write a popular article, you should avoid sensationalism and be sure to swear in your title.

Unfortunately for me, however, the tastes of the Hacker News community have largely held stable over the past four years. It’s hard to make an argument that there has been a cataclysmic shift toward embracing sensationalism or a deviation from the core focus on technical content.
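The keyword bucketing can be sketched in a few lines of Python. The keyword lists use only the examples named in the post plus a few placeholder languages; they are illustrative, not the lists actually used, and the profanity bucket is left out. Titles are matched on whole words, and a post that hits several buckets counts toward each one. That is one plausible reading of the method, not a confirmed one, and `category_averages` is a hypothetical name.

```python
import re
from collections import defaultdict

# Illustrative keyword buckets; only a few examples per bucket come from the post.
CATEGORIES = {
    "big companies": ["amazon", "google"],
    "hot startups": ["pinterest", "instagram"],
    "sensationalism": ["best", "worst", "first"],
    "programming languages": ["python", "ruby", "haskell"],  # placeholder subset
}


def category_averages(posts: list[tuple[str, int]],
                      categories: dict[str, list[str]] = CATEGORIES) -> dict[str, float]:
    """Average points per category for (title, points) pairs."""
    sums: dict[str, int] = defaultdict(int)
    counts: dict[str, int] = defaultdict(int)
    for title, points in posts:
        # Tokenize so "best" doesn't match "bestseller"; keep + and # for C++/C#.
        words = set(re.findall(r"[a-z0-9+#]+", title.lower()))
        for name, keywords in categories.items():
            if words.intersection(keywords):
                sums[name] += points
                counts[name] += 1
    return {name: sums[name] / counts[name] for name in counts}


posts = [("Why Google is the best", 20), ("Instagram's new API", 4),
         ("Learning Python", 12), ("Pinterest by the numbers", 6)]
print(category_averages(posts))
```

Comparing each bucket’s average with the overall per-post average (about 11 points, per the post) shows which topics over- or under-perform.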

Post Performance By Content’s Domain Name

I looked at domains that hosted at least 20 submissions in 2012 and ranked them by average number of points per submission. The top 20 are below.

I was also curious how the domain names of popular news sites performed. The results were a bit surprising.

As you can see, content from mainstream tech blogs like TechCrunch and PandoDaily performs about average, while content from blogs like Mashable and Business Insider performs extremely poorly. Also interesting is the enormous gap between the New York Times, whose content tops this list, and the Wall Street Journal, whose content performs among the worst. This speaks to the quality of the average piece of content from each of these news sources (at least in the eyes of the HN community).
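The domain ranking is a filter, a group-by, and a sort. A minimal sketch, assuming `(url, date, points)` submission records; stripping a leading `www.` is an assumption, since the post doesn’t say how subdomains were grouped, and `top_domains` is a hypothetical name.

```python
from collections import defaultdict
from datetime import date
from urllib.parse import urlparse


def top_domains(submissions: list[tuple[str, date, int]],
                year: int = 2012, min_count: int = 20,
                top_n: int = 20) -> list[tuple[str, float]]:
    """Rank domains with at least min_count submissions in year by average points."""
    points: dict[str, list[int]] = defaultdict(list)
    for url, day, pts in submissions:
        if day.year != year:
            continue
        # urlparse lowercases the hostname; the www. normalization is an assumption.
        host = (urlparse(url).hostname or "").removeprefix("www.")
        if host:
            points[host].append(pts)
    averages = [(host, sum(p) / len(p)) for host, p in points.items() if len(p) >= min_count]
    averages.sort(key=lambda item: item[1], reverse=True)
    return averages[:top_n]


subs = ([("http://www.nytimes.com/a", date(2012, 5, 1), 30)] * 20
        + [("http://www.wsj.com/b", date(2012, 6, 1), 3)] * 20
        + [("http://example.com/c", date(2012, 6, 1), 100)] * 19   # too few to qualify
        + [("http://www.nytimes.com/old", date(2011, 6, 1), 0)] * 5)  # wrong year
print(top_domains(subs))  # [('nytimes.com', 30.0), ('wsj.com', 3.0)]
```

The minimum-count filter keeps a domain with a single lucky post from topping the list; `str.removeprefix` requires Python 3.9 or later.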

What Have We Learned?

I think this exercise can be summed up in a few simple conclusions:

  • By the numbers, Hacker News hasn’t changed much in the past two years. New members compensated for attrition, the most passionate users haven’t left, and approximately the same number of submissions and votes happen every day.
  • Specific companies come and go, but the community cares about programming, startups, and controversy.
  • Once you’re hooked, you’re hooked. Users still participating in the community 6 months after joining will most likely still be participating years later.

So, why has my content had such a rough time making it to the front page? It’s because my content hasn’t been tailored to my audience. (In other words, I think I owe Jake five bucks.)

What Can I Do Better?

Remember those three posts of mine that bombed? One about Pinterest data, one about software patent cohorts, and one about VC collusion? Here are some facts:

  • HN doesn’t care about Pinterest. The average submission with “Pinterest” in the title has received just 6 points, one of the lowest average scores of any company name I investigated.
  • As a buzzword, cohort analysis peaked last year. It had an average score of 13 in 2011 but only averages 5 points in 2012.
  • While things like acquisitions rack up the points, “venture” in general just doesn’t grab attention like it used to, earning an average score of only 7 points.
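Year-over-year comparisons like these come from grouping keyword matches by submission year. A minimal sketch, assuming `(title, date, points)` records; the phrase is matched case-insensitively on word boundaries, the function name is hypothetical, and the toy data below is made up rather than drawn from the HN dataset.

```python
import re
from collections import defaultdict
from datetime import date


def keyword_average_by_year(posts: list[tuple[str, date, int]],
                            phrase: str) -> dict[int, float]:
    """Average points per year for posts whose title contains phrase."""
    pattern = re.compile(r"\b" + re.escape(phrase) + r"\b", re.IGNORECASE)
    totals: dict[int, list[int]] = defaultdict(lambda: [0, 0])  # year -> [points, count]
    for title, day, points in posts:
        if pattern.search(title):
            totals[day.year][0] += points
            totals[day.year][1] += 1
    return {year: pts / n for year, (pts, n) in sorted(totals.items())}


posts = [("A cohort analysis of X", date(2011, 3, 1), 10),
         ("Cohort Analysis for startups", date(2011, 9, 1), 14),
         ("Cohort analysis, again", date(2012, 2, 1), 4),
         ("Venture funding trends", date(2012, 2, 1), 9),
         ("Adventure time", date(2012, 3, 1), 50)]
print(keyword_average_by_year(posts, "cohort analysis"))  # {2011: 12.0, 2012: 4.0}
```

Word boundaries matter here: without them, “venture” would also match “Adventure” and inflate that keyword’s average.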

As it turns out, following my formula for writing posts only gets me so far. I’ve been picking the wrong subject matter to hit it big on Hacker News. I need a little sizzle with my steak, but the keys to success aren’t any different than they ever were.

Want to perform this kind of analysis (and much, much more) on your company’s data? Click Here to try RJMetrics free for 30 days.

Love exploring the world with data?

So do we. Our data journalism has been cited by The New York Times, The Wall Street Journal, and Fast Company.


  • http://www.amerika.org/ Brett Stevens

    The problem with all social media is that over time, it becomes dependent on a core group of users and thus becomes a hive-mind.

    This group then decides what it wants reality to be, and violently excludes all other suggestions. By wanting to emphasize its unity, it becomes a group-thinking self-referential function.

    • Anonymous

      Agreed! This totally describes HN, much more so than Slashdot.

  • http://www.eytanlevit.com Eytan Levit

    This is one awesome blog post, kudos. Will try your app once I get near my PC (currently on the go).

  • http://projectit.com Sailfish

    A lot of good research and reasoned analysis.

    Nice work.

  • http://liveditor.com Edwin Yip | dev of LIVEditor

    At least one fourth of Hacker News excites me. I don’t know why; maybe I’m a geek ;)

  • Gabriel

    It doesn’t help to get upvotes when I can’t see any graphs on Android…

  • http://www.lemiffe.com lemiffe

    Well… It seems like you have made the front page again ;)

  • http://sundarsubramanian.com sundar

    The repeat action graph is a very interesting one. I was very surprised to see 60% of users doing at least 1 repeat action.

  • http://songz.me Song Zheng

    Hey I think you should use a different commenting system like Disqus or something of the sort!

  • http://twitter.com/zjelveh zubin

    Difference between NYT and WSJ probably has more to do w/ WSJ’s stricter pay wall.

  • touristtam

    Nice post. Been reading HN anonymously for 2 years now. I don’t see any metrics on “my” kind of user. I guess you had no data, though.

    Tam

  • Zane

    Excellent data driven post. Good stuff

  • Zac

    You’re failing to mention that the Register button was taken off the main page, and removed from the basic login page.

    If I recall correctly, the only way to register now is to either attempt to submit a story, or attempt to make a comment.

  • Andrus

    Great post, and love how you’ve tied the topic very neatly to what your company does. Content marketing (for lack of a better term) at its best.

  • http://jcla1.com Joseph Adams

    Great post, really enjoyed reading it.
    You mention that you used the ThriftDB API to get a full history, would be interested to see how you did it, since I failed trying…