Nomorerack.com Makes Smarter Decisions with RJMetrics

The post below was submitted to us by nomorerack, a fast-growing online shopping destination with an avid team of RJMetrics users. To see what RJMetrics can do for you, get started with our 30-day free trial today.

At nomorerack.com, our goal is to be the go-to online shopping destination for those who want quality brand name apparel and accessories for up to 90% off retail. A key to achieving that goal is having a deep understanding of our customers’ behavior.

In this post, we outline our methods for maintaining a consistent, deep understanding of our customer base that evolves with our data.

Quest for Customer Insights

Our long-term success is strongly dependent on client satisfaction. We’re focused on making sure that our customers keep coming back, refer their friends and help our community grow.

To better understand our customer base, we wanted to use important metrics like revenue per user (RPU), time between purchases, and cohort analysis. It was critical to us that we be able to access these metrics on-the-fly as our data changed and segment them by things like acquisition source. Understanding the returns we see from different channels is critical because it tells us which avenues are most effective and where we should be directing our resources.

To address these needs, we went looking for an analytical tool that allows non-technical team members to pull frequently updated reports and run queries via a simple user interface. It was also important that we get up and running as quickly as possible. We looked to the cloud.

Cloud Business Intelligence

A quick search led us to RJMetrics, which provides hosted data analytics software. We reached out and signed up for a free 30-day trial, during which we set out to measure key metrics like RPU, lifetime revenue (LTV), and repeat purchase patterns.

Vishal Agarwal, our Director of Business Development, signed up for RJMetrics on a Friday and was running these critical reports by Tuesday of the following week. By Thursday, our whole team was trained on RJMetrics’ system. Within a week of signing up, we were already saving many hours that were previously spent on report generation and data exploration.

Another unexpected plus came as a result of RJMetrics’ experience in working with e-commerce companies like ours. RJMetrics has developed a suite of best-practices metrics that are readily available out-of-the-box. Through cohort analysis, we are able to group customers by their registration dates and analyze their subsequent purchases over time on a single chart. This exercise was brand new to our team and would have taken us hours to build in Excel.

We knew this subjectively, but the cohort chart confirmed that we had an amazingly loyal customer base. Customers acquired in November 2010 have continued to spend the same amount with us month over month, right up to the present. This was extremely encouraging evidence that our customers love our products and are far more valuable than just the amount of their first purchases.

While we were very focused on acquiring new subscribers, we were surprised to learn that 70% of our revenue consistently came from existing customers.

RJMetrics also helped us optimize marketing dollars. Their “repeat purchase probability” and “average time between purchases” metrics helped us plan email triggers and target specific audiences within our user base. We also learned that only 5% of any given day’s sales came from users who registered that same day, which encouraged us to place increased focus on converting new users.

To share these metrics internally, we leveraged the “syndicated dashboards” feature in RJMetrics. This feature allows us to share common dashboards such as “sales,” “supplier” and “marketing” with different teams internally. This way, management can clearly communicate with key teams through one set of metrics. No more exchanging multiple emails with messy Excel spreadsheets and end-of-day reports.

New Insights Every Day

Once we started digging into our data using RJMetrics, we realized that its scope is much wider than just calculating cohorts or LTVs. RJMetrics became a one-stop shop for all of our data needs – from basic revenue reporting to the more complex analysis of ancillary data sets.

The beauty of RJMetrics is that it can incorporate any data that lives in our backend database. Every time we start tracking a new data field, RJMetrics can incorporate it into our hosted data warehouse, and we are able to start charting it in a matter of hours. For example, we just started analyzing customer surveys: not only can we analyze customer satisfaction and the likelihood of repeat purchases, but we can also link this data to the respective products and vendors. This allows us to measure company performance across suppliers, products, and deal campaigns. In other words, our customers are now actively defining what we sell.

Conclusion

We chose to rely on a third-party service to enable the analysis of our backend data and we are thrilled with the results. Rather than re-invent the wheel, we left it to the experts at RJMetrics and have been able to reap the benefits extremely quickly.

Airbnb Data Analysis: 6 Million Users by Year-End, Only 20% Active

Airbnb is one of the hottest sites on the internet. The Y Combinator graduate has raised $120 million in funding to change the way people find places to stay around the globe.

As fans of Airbnb with a passion for startup data, we decided to try and learn more about the site’s user base by looking at the publicly-available profiles of its members. We sampled just over 60,000 users and were able to draw some interesting insights using an RJMetrics business intelligence dashboard.

Some highlights include:

  • Airbnb has over 2.1 million registered users and is growing about 250% year-over-year. At this rate, they’ll have 3 million users by the end of June and 4 million by the end of August.
  • Almost 85% of Airbnb’s userbase has never received a review as a host or a guest. Our sample suggests that there may be as few as 350,000 reviewed users among the userbase of over 2 million.
  • Usage is addictive — with each additional stay booked through Airbnb, users become increasingly likely to book again.

User Growth

Since Airbnb uses auto-incrementing IDs for its users and does not appear to have skipped any range of ID values, it is quite easy to track user growth over time.

Airbnb has very seasonal growth patterns, with most new users signing up in the summer (peaking in August) and significantly fewer signing up in the winter months (reaching a low point in December). These user analytics were easily extracted using RJMetrics.

The current user count is approximately 2.1 million.

For the past several months, the year-over-year growth rate has been steady at around 250%. Extrapolating this out for the rest of the year puts the site’s user count at over 3 million by the end of June and over 4 million by the end of August. At its current growth rate, the site will approach 6 million registered users by the end of 2012.
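
The extrapolation is simple compounding. Here’s a quick sketch of the math, assuming that “250% year-over-year growth” means the user count multiplies by 3.5x annually and that our sample reflects roughly the end of February:

```python
# Back-of-the-envelope projection from the figures above. Assumes a
# 3.5x annual multiplier (250% YoY growth), compounding smoothly
# month to month from ~2.1M users at roughly the end of February 2012.
current_users = 2.1e6
monthly_factor = 3.5 ** (1 / 12)   # ~1.11x per month

for months_out, label in [(4, "end of June"), (6, "end of August"),
                          (10, "end of 2012")]:
    projected = current_users * monthly_factor ** months_out
    print(f"{label}: ~{projected / 1e6:.1f}M users")
# end of June: ~3.2M, end of August: ~3.9M, end of 2012: ~6.0M
```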

Usage

Since we didn’t have direct access to data on actual stays, we used reviews as a proxy for activity. Reviews are the lifeblood of the Airbnb community, so we think it’s fair to assume that the number of reviews is a good proxy for the number of stays.

Most sites we study show signs of the “80/20 rule,” which suggests that 80% of the activity comes from 20% of the users. In Airbnb’s case, it’s more like the “100/20” rule — only 16% of the user base has been reviewed as a host or a guest.

Here are some other usage statistics:

Only about 14% of users (or about 300,000 users) have been reviewed as guests.

Only about 2.3% of users (or about 50,000 users) have been reviewed as hosts.

A mere 0.5% of the userbase has been reviewed as both guest and host.

5% of users (or about 100,000 users) have active listings, but only 2% (or about 40,000 users) have received reviews from guests. This suggests that more than half of the people listing properties have yet to host a guest.

Repeat Activity

By relying on the same techniques we use to track repeat purchase probability in RJMetrics, we were able to profile the average user’s likelihood of using Airbnb with each additional stay.

As you can see, while only about 14% of the userbase ever books a stay (as indicated by a first review from a host), 22% of those users who book once go on to book a second stay via Airbnb. By the time a user has booked five stays, the likelihood that they will book another stay on Airbnb is over 50%.
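
The computation behind a curve like this is straightforward. Here’s a minimal sketch (not the actual RJMetrics implementation) that derives the same kind of probabilities from per-user stay counts:

```python
from collections import Counter

def rebooking_probabilities(stays_per_user, max_k=5):
    """For each k, estimate P(user books stay k+1 | user booked stay k)
    as (# users with > k stays) / (# users with >= k stays)."""
    counts = Counter(stays_per_user)
    def users_with_at_least(k):
        return sum(n for stays, n in counts.items() if stays >= k)
    return {k: users_with_at_least(k + 1) / users_with_at_least(k)
            for k in range(1, max_k + 1)}

# Toy data: stays (proxied by guest reviews) per sampled user.
sample = [0, 0, 0, 0, 0, 0, 1, 1, 1, 2, 2, 3, 5, 6]
print(rebooking_probabilities(sample, max_k=3))
```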

Note that these percentages are based on the behavior of the existing user population, the majority of which has been registered for less than a year. Since so many users have a limited history on the site, it’s quite likely that these numbers will increase over time.

Conclusion

Airbnb continues to explode in popularity and experience tremendous user growth as a result. As with most consumer sites, however, the population of active users is much smaller than the total registered user count.

Airbnb’s key to continued success will be to both grow its user base and convert more of its registered users into paying customers. As we’ve seen, with each additional booking users become more likely to book again.

Click here to use RJMetrics to draw actionable insights from your company’s data.

Why You Shouldn’t Start The Next Instagram

So, as you’ve heard, Facebook has acquired Instagram for $1 Billion worth of cash, stock and PBR. That’s more than the New York Times is worth.

Since the announcement, the general conversation is this:

  • Instagram isn’t worth $1 Billion since it has no revenue.
  • Instagram is worth $1 Billion because it’s so much better than Facebook at photo sharing. See Robert Scoble’s arguments.
  • I should start a company that Facebook would want to buy.

Let’s focus on that last one. I’ve ranted about this before, but to put it bluntly, we’ve got bigger fish to fry. I mean that as a society.

I think it’s relatively safe to say that the photo-sharing problem has been solved. There are 1,001 real problems on this planet that have no solution.

So, I’m proposing something different. Before starting a business or joining a startup, ask yourself if that startup is solving a problem. And I don’t mean a “this photo could use more filters” problem. I mean a real, “this will actually make people’s lives better” problem.

Start the next Dollar Shave Club, not the next Instagram.

I’ll be at Startup Weekend Philadelphia on April 20th. I’ll help you start something real.

New Pinterest Data: What’s Everyone Pinning About?

Last month, we released an in-depth report on Pinterest user behavior. The general theme among our many findings was that Pinterest users are deeply engaged and remain highly active over time.

This led many people to ask a logical follow-up question: what exactly is everyone pinning about? With about a million pins still sitting in an RJMetrics hosted data warehouse from our previous report, we decided to answer that question.

Our full report is below, but here are some of our key findings:

  • The all-time most popular pinboard categories are Home (17.2%), Arts and Crafts (12.4%), Style/Fashion (11.7%), and Food (10.5%).
  • Good news for e-commerce players looking to cash in on Pinterest: Products I Love is the third most popular pinboard name, and pins that exist on pinboards about Products are the most likely to be liked by other users.
  • Food is the fastest-growing pinboard category. Food is also the most likely category to be repinned, on average generating over 50% more re-pins than the next most repinned category, Style and Fashion.

How We Did It

We sampled just under one million pins from pinterest.com by browsing pin ID numbers and usernames from the general population. These pins were from about 9,200 unique users and were stored across pinboards with about 15,000 unique names.

We mapped pinboard names to content categories by identifying the most commonly used words and phrases and mapping those to specific content categories. This allowed us to categorize the vast majority of the pinboards we identified into a handful of distinct categories.
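
As an illustration, that mapping step can be as simple as a keyword lookup. A minimal sketch, with invented keyword lists:

```python
# Minimal keyword-based categorizer for pinboard names.
# The keyword lists below are illustrative, not our actual mapping.
CATEGORY_KEYWORDS = {
    "Home": ["home", "decor", "house"],
    "Food": ["food", "recipe", "baking"],
    "Style/Fashion": ["style", "fashion", "wear"],
    "Arts and Crafts": ["craft", "diy", "art"],
}

def categorize(board_name):
    name = board_name.lower()
    for category, keywords in CATEGORY_KEYWORDS.items():
        if any(kw in name for kw in keywords):
            return category
    return None  # left uncategorized (~15% of boards in our sample)

print(categorize("For the Home"))  # Home
print(categorize("Craft Ideas"))   # Arts and Crafts
```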

As always, RJMetrics was our secret sauce. We conducted the analysis for this article with just a few clicks from our RJMetrics online dashboards.

Most Popular Board Names

Most pinboard names are used by many different users throughout Pinterest. In fact, some board names represent as much as 3% of the total pinboard population. The 10 most common board names, along with the percent of boards they represent, are listed below:

  • For the Home (3.15%)
  • My Style (1.97%)
  • Products I Love (1.86%)
  • Books Worth Reading (1.68%)
  • Food (1.23%)
  • Favorite Places & Spaces (1.00%)
  • Recipes (0.75%)
  • Craft Ideas (0.74%)
  • Christmas (0.72%)
  • Crafts (0.65%)

Those who are excited about Pinterest’s use as a product recommendation engine will be happy to see Products I Love as the third most popular board name, representing about 1.9% of pinboards.

Most Popular Board Categories

In addition to the popular names listed above, there are thousands upon thousands of additional unique board names in our sample. By isolating key words and phrases, we were able to bucket about 85% of the pinboards from our sample into categories.

The top 10 most popular categories are listed below:

  • Home (17.2%)
  • Arts and Crafts (12.4%)
  • Style/Fashion (11.7%)
  • Food (10.5%)
  • Inspiration/Education (9.0%)
  • Holidays/Seasonal (3.9%)
  • Humor (2.1%)
  • Products (2.1%)
  • Travel (1.9%)
  • Kids (1.8%)

Note that over 60% of pinboards fall into the top 5 categories.

Emerging Trends: Food Trumps Fashion

If we look at these top categories over time, by and large it appears as though their relative percentages are holding steady.

As you might expect, the only line showing some degree of seasonality is the one related to holidays and seasonal content.

As a next step, however, we isolated the two categories that had shown the biggest changes in market share over the past 12 months and uncovered something interesting:

In the early days of Pinterest, Style and Fashion represented twice as many pinboards as Food. However, in recent months, Food has gained more ground than any other category, and has actually become more popular than Style and Fashion among new pinboards created.

This could be a reflection of Pinterest’s increasing popularity in the mainstream, as Food is a far more populist topic than fashion. As one of Pinterest’s newer (and less trendy) members, I confess that I’d be more likely to pin about cake than couture.

Most Viral Categories

We looked at the average number of repins and likes for the pins across the different content categories we identified.

Check out the average repins by category below:

Hands down, Food was the most likely category to be repinned. This could explain its increasing share of the category landscape. Style and Fashion is a distant second, and Holidays/Seasonal was the least likely category to be repinned.

Interestingly, the most liked categories were NOT the same as the most repinned. Check out the average number of likes per pin by category:

Here we see that Products is actually the most likely category to be liked. This is surprising, particularly given that Products sat in the middle of the pack when it came to repins.

Humor is a similar story, ranking second most liked but not standing out based on repin frequency.

Conclusion

With Pinterest continuing to explode in size, it will be exciting to see how broader adoption shapes the voice of its community.

Pins about the home, arts and crafts and inspiration continue to dominate the content landscape, but the growing popularity of more broadly-accessible topics like Food and Product Recommendations could indicate that Pinterest is heading toward a more mainstream and commercialized future.

Pinterest Data Analysis: An Inside Look

Pinterest is the hottest young site on the internet. In the past six months, the social sharing tool has gone from effectively non-existent to one of the top 100 sites on the web (and is on track to break into Alexa’s Top 50).

Pinterest’s traffic charts aren’t hockey sticks; they’re rocket ships. In our experience, when traffic is growing that sharply there is often something even more amazing going on under the hood. We wanted to see if the usage and engagement numbers for Pinterest were as remarkable as its traffic and gain insights into exactly what was driving growth. Unfortunately, the company has kept very quiet when it comes to its data.

Tired of waiting, we took things into our own hands using some clever scripting and our secret sauce for analytics: RJMetrics. A full report is below, but here are a few highlights from our findings:

  • Pinterest is retaining and engaging users as much as 2-3x as efficiently as Twitter was at a similar time in its history.
  • Pins link to a tremendously large universe of sites. Etsy is the most popular source of pin content, but it only represents about 3% of pins.
  • Over 80% of pins are re-pins, demonstrating the tremendous virality at work in the Pinterest community. To contrast, a study done at a similar time in Twitter’s history showed that only about 1.4% of tweets were retweets.
  • The quality of the average new user (as defined by their level of engagement and likelihood to remain active) is high but declining. Users who have joined in recent months are 2-3x less active during their first month than the users that came before them.

How We Did It

We wrote some simple scripts to identify random users who joined at varying times in the company’s history and download their complete history of pins to conduct cohort analysis. We also pulled several hundred thousand additional pins from the general user population. All told, we ended up with a database of nearly one million pins.

Thanks to our old friend the central limit theorem, we’re confident that our sizable random samples are representative of the greater population they were pulled from. We should caveat, however, that there is always a risk of sampling bias. Since Pinterest doesn’t use auto-incrementing IDs, we had to get creative about identifying random users and pins. We identified user names based on common dictionary words and then expanded to general-population pins by guessing at ID numbers in numeric proximity to the pins of those core users.
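
In rough terms, the ID-guessing step looks like the sketch below. This is an illustrative reconstruction rather than our actual scraper, and the parameters are made up:

```python
import random

def candidate_pin_ids(seed_pin_ids, spread=5000, per_seed=1000):
    """Guess pin IDs in numeric proximity to pins from known users.
    IDs near a real pin's ID are likely to belong to real pins created
    around the same time, giving a rough general-population sample."""
    candidates = set()
    for seed in seed_pin_ids:
        for _ in range(per_seed):
            candidates.add(seed + random.randint(-spread, spread))
    return candidates
```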

We loaded this raw data into RJMetrics and were able to conduct the following analysis in about 15 minutes. If you’d like to give it a try with your own company’s data, RJMetrics is offering free 30-day trials for a limited time.

Content: What’s Being Pinned?

On Pinterest, every pin ties back to an external link. We used RJMetrics to extract the top-level domain of those links for the pins in our sample. What we found was a pretty tremendous long-tail effect. In our sample of about a million pins, over 100,000 distinct source domains existed. The twenty most prominent are shown below by percent of pins.

The most popular domain was Etsy.com, which powered just over 3% of pins. Close behind was google.com, although almost all Google links point to Google Image Search, which is technically misattributed content from other third-party domains. Flickr (2.5%), Tumblr (1.1%), and weheartit.com (1.0%) round out the top 5, after which no domain represents more than 1% of pins.
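
Reproducing a domain breakdown like this takes a few lines of Python once you have the pins’ source URLs in hand; a quick sketch:

```python
from collections import Counter
from urllib.parse import urlparse

def top_domains(urls, n=20):
    """Count pins by source domain (stripping a leading 'www.')."""
    domains = Counter()
    for url in urls:
        host = urlparse(url).netloc.lower()
        host = host[4:] if host.startswith("www.") else host
        if host:
            domains[host] += 1
    return domains.most_common(n)

urls = ["http://www.etsy.com/listing/123", "http://flickr.com/photos/456"]
print(top_domains(urls))  # [('etsy.com', 1), ('flickr.com', 1)]
```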

Virality: Re-Pins and Tools

We were able to break out the population of pins based on how those pins were posted to Pinterest. We were expecting a high percentage from pinmarklet, a browser bookmarklet that allows users to pin content from any website with one click. However, what we found was astonishing.

Remarkably, over 80% of pins are re-pins. This is evidence of the impressive level of virality at work in the Pinterest community. Pinterest is truly an ecosystem of sharing. To contrast, a study done by Hubspot at a similar point in Twitter’s history showed that only about 1.4% of tweets were retweets.

User Engagement: Cohort Analysis

Cohort Analysis is a powerful tool that allows us to study different groups of users at identical points in time in their lifecycles, regardless of when they actually joined the site. It’s a great way of getting an “apples to apples” look at newer vs. older users to see how their engagement stacks up.

In the chart below, each line represents a cohort and each cohort is a group of customers who made their first pin in a specific month. For example, the June 2011 cohort consists of users who made their first pin in June 2011. The line itself shows the “average cumulative pins made per cohort member.” So, the “Month 1” data point for the June 2011 cohort shows us how many items were pinned in June 2011 by users who joined in June 2011. The “Month 2” data point on that same line shows us how many pins had been made by the average user in that cohort by the end of July 2011, and so on.
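
For readers who want to replicate this kind of chart by hand, here is a rough pandas sketch of the same computation. The column names are illustrative, and this is not the RJMetrics implementation:

```python
import pandas as pd

def avg_cumulative_pins(pins: pd.DataFrame) -> pd.Series:
    """pins needs columns 'user_id' and 'pinned_at' (datetime64).
    Returns average cumulative pins per cohort member, indexed by
    (cohort month, activity month)."""
    pins = pins.copy()
    pins["month"] = pins["pinned_at"].dt.to_period("M")
    # A user's cohort is the calendar month of their first pin.
    pins["cohort"] = pins.groupby("user_id")["month"].transform("min")
    cohort_size = pins.groupby("cohort")["user_id"].nunique()
    pins_per_month = pins.groupby(["cohort", "month"]).size()
    cumulative = pins_per_month.groupby(level="cohort").cumsum()
    return cumulative.div(cohort_size, level="cohort")
```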

For most companies, even highly successful ones, cohort charts like these show lines that steadily decay toward a more horizontal slope over time. This happens because there is some natural attrition rate with which users simply stop using the site, causing the incremental engagement of the average user to drop off.

That is definitely not the case with Pinterest.

These lines show little to no decay whatsoever. Their slopes remain consistent, indicating a net attrition rate of close to 0%. This either means that no one who starts using Pinterest ever stops or, more likely, that users who continue to use Pinterest become so much more engaged over time that their activities fully make up for those of any users who leave.

To explore which of these two scenarios is playing out, we changed a few options in RJMetrics and ran the cohort analysis below:

This weekly cohort analysis shows the percentage of distinct users from some recent cohorts who come back to pin again in each of the first 8 weeks of their life cycle. As you can see, between 40% and 60% of users are still actively pinning even as far out as week 8. This may seem like a steep drop-off, but for a consumer internet business it’s exceptionally good.

To provide some context, I want to compare this data to a similar analysis I conducted on Twitter in 2009. Twitter was at a similar point in its life cycle (growing tremendously and about a year into its existence). See the chart below:

Twitter’s decay rate was twice that of Pinterest, with user activity (measured by tweets) rapidly plummeting to around 20% before stabilizing.


Growing Pains: Quality Decay

With every fast-growing consumer startup I’ve profiled, an increase in media coverage inevitably corresponds to a huge spike in the number of registered users and a drop-off in the quality of the average user (as defined by their level of engagement and likelihood to remain active). Pinterest is no exception.

As shown above, the average new user who has joined Pinterest in the past few months is using the site substantially less than their counterparts from months in the past. I speculate that this is caused by flocks of curious onlookers who are outside of Pinterest’s core audience registering accounts and failing to get engaged. In the long-term, this could potentially represent a challenge to the company maintaining the remarkable engagement metrics we’ve seen so far.

Conclusion

Pinterest demonstrates some of the strongest user engagement, retention, and virality metrics I have ever seen in an online business. The company has found tremendous success among its core demographic, and the potential reach of its appeal will be tested in the coming months as attention from broader audiences continues to increase.

If the company’s performance to date is any indication, however, it will surely be a start-up to watch in 2012 and beyond.

What are the Odds? Debunking the 09/09/09 Babies

[Stay up to date with the latest content: follow us on Twitter @RJMetrics]

Last week, news sites across the web were flooded with stories about “lucky babies” who were born at 9:09AM on 9/9/09. Even more amazing was this story, which told of a baby born at that very minute with a birthweight of 9 pounds, 9 ounces.

The story sounded familiar, so I poked around and found this story from last summer. It tells of two babies, each born at 8:08AM on 8/8/08 and weighing 8 pounds, 8 ounces.

This kind of news story always makes me suspicious. It seems like the chance of this happening would be astronomical, especially two years in a row. Rather than ponder the odds, I decided to calculate them.

The 9/9/09 Baby

The two independent characteristics of this phenomenon are its birthtime and its weight. Let’s explore each individually.

Probability of Being Born at 9:09AM on 9/9/2009

The easy (and highly flawed) way to do this would be to simply divide one by the number of minutes in a year to get a crude probability of being born on any given minute. However, the individual components of a birth time (month, day, hour, minute) are not uniform. Indeed, each is governed by its own probability distribution, which we explore below and use to calculate a more accurate probability.

Month: The CDC’s National Vital Statistics System contained this recent report, which breaks out US births by month of the year. It shows the distribution below with 8.8% of births taking place in September:

Day: While the individual days of a month aren’t heavily studied, I did find an interesting probability distribution by day of the week in the same report:

9/9/09 was a Wednesday, which typically contains 16.1% of the births in a given week. September contains 30 days, or 4.3 weeks. If we assume an equal probability that the baby will be born in any week of the month, the probability of a birth on a specific Wednesday of a 30-day month like September is 3.7%.

Time: Births are surprisingly unevenly distributed across the hours of the day, as shown in this report, which reveals the probability distribution below:

The probability of a birth during the 9AM hour is reflected here as 3.5%, and if we assume a uniform distribution across the minutes of an hour (i.e. it’s equally likely to be born at any minute in a given hour), this places the probability of being born at 9:09AM on any given day at 0.06%.

Conclusion: Based on these numbers, the probability of a baby being born on 9/9/09 at 9:09AM is 0.0002%. (I should note that this assumes that each of these time attributes is independent of the others.)

If we estimate 4.3 million births this year (in line with recent years), this probability tells us that the expected number of babies born at 9:09 AM on 9/9/2009 is… 9. OK, I’ll admit it: that’s a little creepy.
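
Here’s the arithmetic in a few lines of Python, using the CDC figures cited above:

```python
# Probability of a birth at 9:09 AM on 9/9/2009, assuming the month,
# weekday, and hour distributions are independent.
p_september = 0.088          # share of US births in September
p_wednesday = 0.161 / 4.3    # a specific Wednesday of a 30-day month (~3.7%)
p_minute = 0.035 / 60        # 9 AM hour share, uniform over 60 minutes (~0.06%)

p_moment = p_september * p_wednesday * p_minute  # ~1.9e-06, i.e. ~0.0002%
expected = p_moment * 4.3e6                      # ~8.3 (~9 with the post's rounding)
print(f"p = {p_moment:.2e}, expected babies = {expected:.1f}")
```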

But, that number is before we consider that amazing birthweight of 9 pounds, 9 ounces.

Probability of a 9 Pound, 9 Ounce Birthweight

Various sources, including this report from the International Journal of Epidemiology, confirm that birthweights conform to a Gaussian distribution, as shown below:

Distribution of Birth Weights

Another study provided the characteristics of this distribution: a mean birthweight of 3,369 grams and a standard deviation of 567 grams. Among other things, this tells us that the 9 pound, 9 ounce (4,337 gram) baby was, well, a chubby one (to the tune of 1.7 standard deviations above the mean).

The metric weights that a doctor could convert to ounces to get the 9 pound, 9 ounce measurement were 4,323 to 4,349 grams, or 1.68 to 1.73 standard deviations from the mean.

Using a standard z-table, we easily determined that the probability of a 9 pound, 9 ounce birthweight is quite low: 0.43%.
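
If you don’t have a z-table handy, the same lookup takes a few lines of Python via the error function:

```python
from math import erf, sqrt

def phi(z):
    """Standard normal CDF, built from the error function."""
    return 0.5 * (1 + erf(z / sqrt(2)))

mean, sd = 3369.0, 567.0    # grams, from the study cited above
low, high = 4323.0, 4349.0  # gram range that rounds to 9 lb 9 oz

p = phi((high - mean) / sd) - phi((low - mean) / sd)
print(f"P(birthweight reads 9 lb 9 oz) = {p:.2%}")  # ~0.43%
```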

End Result: Probability of a 9/9/09 9:09 birth weighing 9 Pounds, 9 Ounces

It’s fair to assume that a baby’s birthweight is independent of its birthdate, so we can simply multiply the probabilities from the past two sections to determine the probability that any given 2009 baby was born on 9/9/09 at 9:09AM weighing 9 pounds, 9 ounces: 0.0000008%.

With our estimate of 4.3 million births in 2009, that means the expected number of births meeting those characteristics was 0.035, which implies that there was a 3.5% chance of this baby being born. Another way of arriving at the same result is to calculate the probability that no babies with these birth statistics would be born and look at what’s left (100%-((100% – 0.0000008%)^4,300,000) = 3.5%).
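
Both routes to that figure, in code:

```python
# Probability that at least one such baby is born in a year of
# ~4.3 million US births, given the per-baby probability above.
p_baby = 1.9e-6 * 0.0043        # birth moment x birthweight, ~8.2e-09
births = 4_300_000

expected = p_baby * births                   # ~0.035 such babies
p_at_least_one = 1 - (1 - p_baby) ** births  # ~3.5%
print(f"expected = {expected:.3f}, P(at least one) = {p_at_least_one:.1%}")
```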

Regardless of how you slice it, there was a 3.5% (or about 1 in 28) chance of a baby being born on 9/9/09 at 9:09AM weighing 9 pounds, 9 ounces.

You don’t need a business intelligence dashboard to tell you that this was a long shot. However, when we factor in the babies from last year, things start to get even more outlandish.

The 8/8/08 Babies

Using the same data and the steps explained above, we can easily identify the probability of any given baby born in 2008 having the characteristics below:

Probability of being born on 8/8/08 at 8:08AM: 0.0002%

Probability of being born at 8 pounds, 8 ounces: 1.06%

Overall probability: 0.000002%

Note that the overall probability here is more than twice as high as for the 9/9/09 baby, mainly because 8 pounds, 8 ounces is a significantly more common birthweight.

Even so, with 4.3 million expected 2008 births, the chance of a baby being born in the US on 8/8/08 at 8:08AM weighing 8 pounds, 8 ounces was a mere 10%. The chance of two such babies being born independently, therefore, was only 1%.

Again, this is a very low probability but certainly not astronomical. What starts to get outlandish is when you consider the probability that all three of these babies would be born. As three independent events, the chance that these three births would take place as they did was approximately 0.04%, or about 1 in 2,500.

Conclusions and Complications

Given the large amount of press attention showered upon these babies and their doctors, there appears to exist a disturbing incentive system for the birthing of babies with numerologically noteworthy birth statistics. (A quick Google News search turned up 290 stories containing the names of the 9/9/09 baby’s parents.)

Given the low likelihood of any such baby being born (and the extremely low likelihood of all three being born independently), two alarming scenarios seem possible (this is, of course, speculation on my part):

  • the involved medical professionals tweaked numbers to yield more noteworthy statistics
  • an unusually large number of deliveries were “scheduled” for these dates to increase the chances of birthing a child with these statistics

Further complicating my math is the popularity of scheduled Cesarean births, which allow parents and doctors to choose a specific day and time to begin a delivery process (within a range of options that are equally safe for the mother and child, of course).

While it may seem perverse, it’s entirely possible that such deliveries are being scheduled to increase the chances of outcomes like these. This could heavily impact the probability calculations above and might provide an explanation for the otherwise unfathomable outcomes we’ve seen.

What really sealed the deal for me was this story, which talks about an 8/8/08 baby and casually mentions that he has a younger sibling who was born on 4/5/06. What are the chances that any given set of parents has two children with these two birthdates? I’ll tell you: it’s 0.0009%. (Both were born by Cesarean.)

I’ll leave you to draw your own conclusions, but the next time I read a story about a baby with amazing birth statistics, I’ll chalk it up to more than just “good luck.” Something tells me I’ll have plenty of chances to do so next year, on October 10th.

[Stay up to date with the latest content: follow us on Twitter @RJMetrics]

How to Get Twitter Followers: The Definitive Guide

This article provides an in-depth look at the major tricks and techniques for gaining Twitter followers. These include organic (i.e. legitimate) growth, paid services, and shortcut “tricks” used by the most-followed nobodies on Twitter.

We provide detailed, data-driven conclusions about the effectiveness of each technique and its real cost to the end user. The impacts of follower acquisition campaigns are heavily interconnected (as shown in the chart below), but we make an effort to isolate the effects of each in a fair way.

A more detailed explanation of this chart is provided in the “Preparation” section below.

If you enjoy this article, please be sure to follow us @RJMetrics to stay informed about future updates and our powerful business intelligence dashboard software.

Why Should We Care?

I’ll admit it: I’ve done my fair share of Twitter-bashing. Spending a few years in late-stage venture capital has made me pretty closed-minded about any company that raises mountains of cash but makes less money than a twelve-year-old with a paper route.

For better or worse, however, Twitter followers have become a proxy for influence in the digital world. Unfortunately, the big problem with relying on this particular proxy for popularity is that it can be faked. Easily.

Even worse, placing emphasis on this particular metric encourages fakers. Yes, the guy with 30,000 followers might have accumulated them deceptively, meaning there is a risk that he is secretly a huge loser. But, the undeniable loser is that guy with only 30 followers. After all, nobody fakes unpopularity!

So, just how easy is it to rack up a huge base of followers? I decided to identify and test out every method I could find for accumulating followers quickly. My end goal was to accumulate a master list of follower acquisition methods, their effectiveness, and the time and/or monetary costs involved.

Types of Followers

Before we begin, I want to define two important terms that I will use frequently: FreeFollowers (users who follow you without you following them back) and ReFollowers (users who are only following you because you followed them).

As you might be able to guess (and as we will quantify later) ReFollowers are much easier to acquire than FreeFollowers because they’re getting some reciprocal value (another follower) out of the relationship regardless of your tweet quality. FreeFollowers, however, actually have to want to follow you because of who you are or what you say.

Whether or not you choose to take the easy road and go after ReFollowers should depend on your goals. If all you care about is a large number of followers and you’re less concerned with follower loyalty or the quality of your Twitter experience, ReFollowers may be your answer. However, if you do care about these things, I would avoid mass ReFollower acquisition for two reasons:

  • The key functionality of Twitter will be impaired because your stream of updates will be extremely cluttered, making it difficult to actually “follow” anybody. You’ll never realistically be able to discover new and interesting users because of the overwhelming amount of disjointed information in your update stream.
  • Many of your followers will also be ReFollow junkies. This means that they will suffer from the same cluttered update stream as you, diminishing your actual “reach” (the average number of followers who actually read any given Tweet you send).

As a general rule, I like to assess the true popularity of a Twitter user by subtracting the number of people they follow from the number of people following them. If you look at the users with over 25k followers, you’ll find that many of them have a negative or very low value when you do this calculation. These people are using ReFollow tricks, the easiest method for artificially inflating their numbers (explained later).

Obviously, these users are gaming the system and provide more noise than signal in the Twitter universe. If you find yourself becoming one, make sure you’re OK with the implications.

Preparation

It’s time to start gaining followers! To give us a clean slate and avoid the risk of losing my account if things get too shady, I created a completely new Twitter account for this experiment. It’s owned by my alter ego, R.J. Moore.

Before we start, one disclaimer: some of these techniques might get our account banned from Twitter if we use them irresponsibly. The following reasons for a banned account have allegedly been provided by Twitter support. Keep them in mind as you build your follower growth plan:

  • You’ve followed a large number of people in a short amount of time
  • There is a small number of followers compared to the number of people you’re following
  • The updates consist mainly of links and not personal updates
  • A large number of users blocking the profile and writing in with spam complaints

The following steps are designed to decrease our chances of getting banned and increase our “follower conversion rate,” which is simply the percentage chance that someone who sees our username ends up following us. We do this first so that our later attempts at gaining followers all get a fair shake.

Step 1: The Kevin Rose Basics

I like Kevin Rose, and not just because he’s a fan of my previous work. If you search Google for tips on how to gain followers, his guest post on TechCrunch will probably be one of the top results.

The post lists a lot of good (however obvious) ways to gain followers: fill out your bio, tweet about stuff people care about, self-promote your twitter page, stay on top of your stats, etc. I agree with all of this logic, so we’ll take some time to upload a photo and build a custom page layout for our new account.

Putting time into these basic techniques helps demonstrate that we’re not a spam account, which has a drastic impact on someone’s likelihood of following us.

Cost: 30 minutes

Step 2: Create a Content Base Coat

We aren’t going to get FreeFollowers unless they care about our tweets. And they can’t care about what we tweet if we haven’t tweeted anything. In order to start things off right, I sent out a half-dozen tweets that I hoped would be of interest to our target follower base. They were timely and relevant, but more importantly, they suggested that more interesting tweets would follow.

Cost: 15 minutes

Step 3: Create a “Following” Base Coat

Next, we need to follow some people. I used Twitter’s “Suggested Users” feature to find a handful of interesting candidates.

Following a small number of people (20 to 40) when we’re getting started shows that we’re engaged with the service and less likely to fade away like many new users do. This makes us a more viable “follow” in the eyes of a more selective user.

Total Cost: 10 minutes

Step 4: Create a “Follower” Base Coat

Next, we’ll go out and get our first followers. The logic is simple: the likelihood of someone following us increases with the number of people already following us:

  • With less than a “base coat” of followers, we have no clout and this hurts our conversion rate.
  • As the base coat is built, we receive a conversion rate bump based on the increased confidence that we are not a spam account.
  • For the next several thousand followers, there is a small marginal benefit of a new follower.
  • After we have reached a “large number” of followers (10,000+), we are perceived as a “celebrity” by the average user and our conversion rate experiences another steep increase period due to our perceived fame.

This concept is demonstrated in the utterly imprecise and unscientific chart below:

Note the logarithmic scale on the x-axis. This means that our first followers provide the greatest marginal benefit. As such, we should pick up as many followers as we can from our friends and existing contacts before going after strangers.

I put in my Gmail account information and detected which of my existing contacts were already on Twitter. I followed them all and most of them followed me back, providing a base coat of a few dozen legitimate followers.

Total Cost: 30 minutes

Execution

Now that our account is optimized for converting potential followers, it’s time to actually attract them.

Method 1: Topical Tweets

We’ll start with a very basic technique: saying things of interest to a large population of people. The goal here is to attract FreeFollowers (followers whom we are not following back).

Sites like hashtags.org list trending topics on Twitter (as does Twitter itself). These topics are typically identified by hashtags (words preceded by a “#” sign). Messages that include the most popular hashtags are both heavily tweeted and heavily searched.

We sent out 100 tweets, each with a statement related to one of the top few hundred hashtags (and including that hashtag). Our goal was to be seen by people who weren’t following us and convert them into followers.

It’s important that we tried out this method first, because it carries the risk of annoying our existing follower base. Sending 100 Tweets inside of two hours is going to annoyingly dominate our existing followers’ Twitter update streams, and this increases the risk that they will unfollow us.

We sent out 100 tweets using about 180 minutes of manpower. (Thanks to our interns Cheryl and Mario for helping with this!) These aren’t necessarily easy tweets to send, since each requires some kind of clever message and an understanding of what the hashtag means. In the end, we added 15 followers from this initiative.

Cost Per FreeFollower: 12 minutes

Method 2: Direct Interaction

This is a similar technique, only instead of targeting the general public we target prospective followers directly. We search the same popular topics and respond to those users who are making comments. Our goal is to have them follow us after the interaction (even though we haven’t followed them). Here again, we are targeting FreeFollowers.

Note that tweets starting with another user’s name don’t appear in your followers’ update streams. This means that you can respond to hundreds of tweets in a short time without annoying your existing followers.

These messages were easier to write than the previous ones, since they are mainly gut reactions to other people’s comments. Our team was able to send out 100 tweets in about 120 minutes, and we acquired 18 followers in the process. This was a hands-down better technique than Method 1 since it didn’t annoy our existing base and had a meaningfully lower average cost.

Cost Per FreeFollower: 7 minutes

Method 3: Celebrity Mentions

This is a bit of a high-risk, high-reward technique. If someone on Twitter with a large number of followers were to mention our username, it’s quite likely that some subset of their followers would check us out, and hopefully some subset of those would convert into followers.

I used the Twitter directory service WeFollow to find some good targets. I stayed away from the behemoths (Ashton Kutcher, Oprah, etc.) and instead opted for users with 25-100k followers.

I was able to do 50 tweets in about an hour. Here are the responses I got:

Carson Daly (32k followers, I complimented his terrible TV show)

Cuban Singer/Songwriter Jon Secada (96k followers, I asked what’s next)

Publisher Charles Yao (55k followers, asked him about Digg)

Startup Advisor Martin Zwilling (69k followers, asked him about Digg)

I ended up adding 6 followers as a result of these efforts, although I’ve got to imagine the volatility of a method like this is pretty high. If you had a reliable method for engaging celebrities, it could really pay off (although it’s probably not too scalable).

Cost Per FreeFollower: 10 minutes

Method 4: Purchase Users via Display Advertising (FeaturedUsers.com)

FeaturedUsers.com is the advertising engine used by a number of Twitter-related websites. You simply purchase inventory from them and an ad for your Twitter profile shows up on sites in their network. The ads pull your bio directly from Twitter and can’t be otherwise customized. They look like this:

The CPM (cost per 1,000 impressions) was a flat $10. I laid down a cool Hamilton so you don’t have to. After using up about 300 of my impressions, FeaturedUsers informed me that my clickthrough rate of 0.617% was below their network average of 0.837%. They offered the following tips for improving it:

  • High quality headshots tend to perform better than brand logos.
  • Including the phrase “If you follow me, then I will follow you” in your bio seems to help. But only include that phrase if you mean it.
  • Original and odd Twitter bios tend to perform better than a bio that simply lists what you do or have done.
  • Women’s profiles tend to fare better than men’s profiles in CTRs.
  • Have a look at the top CTR performers and see what qualities their Twitter profile has that yours might be lacking.

The second bullet point is asking a lot (making you buy ReFollowers instead of FreeFollowers), but otherwise the information was helpful. I checked out the top performers and made a few modifications to help boost my clickthrough rate:

My clickthrough rate immediately doubled to around 1.2% and I was listed on the “top converting ads” page for the remainder of my campaign. When my inventory was all used up, I had generated 12 clicks, and the site provided these charts:

The 12 clicks led to 10 new followers, yielding a $1.00 cost per FreeFollower.
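
Spelled out, the unit economics of this campaign look like this:

```python
# Unit economics of the FeaturedUsers campaign described above.
spend = 10.00        # $10 CPM x 1,000 impressions purchased
clicks = 12
new_followers = 10

print(f"cost per click:    ${spend / clicks:.2f}")         # $0.83
print(f"click-to-follow:   {new_followers / clicks:.0%}")  # 83%
print(f"cost per follower: ${spend / new_followers:.2f}")  # $1.00
```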

Cost Per FreeFollower: $1.00

Method 5: Purchase Users via Paid Tweets (Magpie)

Magpie is an advertising system for Twitter where you basically pay people to tweet your ad to their followers. You create a 130-character ad, agree to a maximum CPM and then it’s bombs away.

The system is super-easy, but two things struck me as odd. First of all, their CPM system seems to be bid-based in that you name a maximum CPM that you’re willing to pay. The traffic estimates imply that the minimum CPM you’ll ever pay is $3, but you’re forced to choose a maximum anywhere from $10 to $30 depending on your keywords.

This seems like a massive spread for ads that don’t have a “placement” factor like, say, Google Ads. These ads either run or they don’t, so what does the higher bidder win? I assume it’s to determine who gets the tweet if there is a shortage of inventory, but I’ve got to imagine there’s no shortage. After all, unlike in most display advertising, generating more inventory doesn’t necessarily require more followers. It just requires that you send more tweets.

The second thing that struck me was how high these maximum CPMs are. My $10 bid (the lowest it would allow) means that I’m paying up to $10 for every 1,000 followers of the person who tweets my ad. Since not every follower necessarily sees every tweet, the real “cost per impression” is probably much higher than $10, particularly if the tweeter has a low-quality follower base.

If you disagree, consider this: a $10 CPM implies that Ashton Kutcher, who has 2.7 Million followers and averages 16 tweets per day, creates $157 Million of prospective advertising inventory a year. Right.

Anyway, I bit the bullet and created a tweet to be sent out by people involved in “business” and “entrepreneurship.” I bid a $10 CPM, put the minimum $20 into my account and crossed my fingers. Magpie made me wait 3 days after paying them to approve and run my ad, which was frustrating (apparently they don’t work on the weekends). After the ad was approved, my entire budget was consumed on this single tweet:

The poster had 2,628 followers at the time of the tweet (which was 4PM on a Tuesday) and it cost me $19.09, meaning I was charged a $7.26 CPM. In the remainder of the day (enough time for a tweet to fade into oblivion), the link received 15 clicks (a 0.57% click-through rate), but yielded only two followers. This means I paid a whopping $14.55 per FreeFollower.

In Magpie’s defense, I’m sure that responses vary heavily from tweet to tweet, and I am working with an admittedly small sample size. However, it’s undeniable that the clickthrough rate is quite low and the click-to-conversion rate is even lower.

Cost Per FreeFollower: $14.55

Method 6: Purchase Users via ReFollow Services

There are a number of businesses that will outright sell you followers. If you Google “get Twitter followers,” look at the ads, and find the ones that seem the most shady, those are the services I’m talking about here.

Below I list a representative subset of these businesses and the costs of their baseline packages:

Few of these sites actually tell you how they do it, but “Get More Twitter Followers” is kind enough to offer a pretty transparent explanation of what’s going on:

“This service works by adding quality targeted followers over time by following people first, then waiting for a reciprocal follow back, then after a grace period, purging the non returning followers and repeating the process until we reach your purchased follower amount.”

In other words, these services sell you ReFollowers exclusively. As explained above, we know the pitfalls of a ReFollow-based strategy. With this new data, however, we are now able to quantify the cost of a ReFollower versus the cost of a FreeFollower (as dictated by the marketplace).

Our lowest effective FreeFollower price was about $1.00, while we can buy ReFollowers here for about 10 cents apiece. This suggests that the “value” of a FreeFollower is around 10X that of a ReFollower, which (if you think this is an efficient market and you tie value directly to reach) you could take to mean that a FreeFollower is 10X more likely to read any given Tweet you send out.

Cost Per ReFollower: $0.10

Method 7: Hosted ReFollow Services

Despite my personal aversion to ReFollow techniques, I think it’s worth exploring exactly how complex this process is and what kind of time it takes to do it yourself instead of purchasing something from a provider as in the previous method.

Generally, here’s what we want to do:

  • Identify users (typically ones who tweet certain keywords of relevance to us) and follow them.
  • Give them a grace period of two days or so to follow us back, and if they haven’t done so, unfollow them. This prevents us from following dead weight.

I’m going to add an extra step here that is a bit more devious. After a few days of creating a new reciprocal relationship, I’m going to unfollow everyone I’m following (even the people who are following me), yielding new “artificial” FreeFollowers but likely increasing the risk of losing them altogether. This is generally a huge faux pas in the Twitterverse, but I cared enough about getting the data to risk the bad karma (especially with a dummy account).

I call these artificial FreeFollowers because of how they were acquired. The fact that they participated in a reciprocal ReFollow means they are likely to have a noisy update stream and therefore a poor likelihood of reading our tweets. This makes them less valuable than a more legitimately-acquired FreeFollower. If you’re only interested in perception, however, this could be a great move, because artificial FreeFollowers can’t be distinguished from true FreeFollowers.
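
For the curious, here is roughly what the whole loop looks like in code. This is a hypothetical sketch against an imaginary client object, not a real Twitter API wrapper, and a real implementation would need to throttle itself to avoid the banning triggers listed earlier:

```python
import time

GRACE_PERIOD_DAYS = 2  # grace period before purging non-reciprocators

def refollow_cycle(client, keyword, batch_size=50):
    """One follow/wait/purge cycle. `client` is a hypothetical wrapper
    exposing search_users, follow, unfollow, followers, and following."""
    # 1. Follow users who tweet about a keyword relevant to us.
    targets = client.search_users(keyword, limit=batch_size)
    for user in targets:
        client.follow(user)

    # 2. Give them a grace period to follow us back.
    time.sleep(GRACE_PERIOD_DAYS * 24 * 3600)

    # 3. Unfollow anyone who didn't reciprocate ("dead weight").
    followers = set(client.followers())
    for user in targets:
        if user not in followers:
            client.unfollow(user)

def devious_final_step(client):
    """The faux pas: unfollow everyone, converting reciprocal followers
    into 'artificial' FreeFollowers (and risking losing some of them)."""
    for user in client.following():
        client.unfollow(user)
```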

Doing this manually through the Twitter interface would be pretty laborious, but thankfully the Twitter API has led to the creation of tools that help do it for you. We will look at two: Twollow and FlashTweet.

Twollow: Twollow allows for massive, scheduled, automated follows and unfollows. The service has a 7-day trial that is pretty watered-down. I sprang for their basic $6/month subscription to give it a real test.

Twollow does exactly what we wanted. It allowed us to follow up to 75 new keyword-specific people per “cycle” (every few hours). Then, it auto-unfollowed any of them who hadn’t followed us back after three days. This basically allowed us to sit back and accumulate ReFollowers with no effort whatsoever.

Twollow ended up following 203 users in its first 24 hours. I then waited another day to allow people to follow me back and ended up with 48 ReFollowers (a whopping 24% conversion rate). This after having spent only $6 and virtually no time. Assuming I could keep this up for my 30-day subscription, I would end up with 1,400 followers for just a $6 investment.

Cost Per ReFollower: $0.004

Remember, however, that these are ReFollowers. Even worse, while this process is going on we’ll be following a massive number of people. This isn’t going to look very good to an informed eye.

Now, for the devious step. I unfollowed everybody it had added (I used a free tool called Mutuality to do a bulk removal of everyone I was following) and waited another few days to see how many would drop me. I tracked this using another free tool called TwUnfollow that e-mails you every time someone unfollows you.

22 of my ReFollowers did, leaving me with a net gain of 26 artificial FreeFollowers. Extrapolating this suggests that we could gain up to 780 artificial FreeFollowers during our $6 subscription.

Cost per (artificial) FreeFollower: $0.008

In terms of posting numbers, this technique is untouchable in ease and cost. If you don’t care about the loyalty or reach of your base, Twollow is the way to go.

FlashTweet: This service is similar to Twollow, except that it’s free and lacks Twollow’s automation features. Using FlashTweet, you can identify and follow up to 100 users at a time based on keywords or relationships (i.e. followers of followers), but these adds are done in bulk at the time of request and cannot be scheduled. You can also unfollow anyone who is not following you in bulk with a similar process.

FlashTweet carries a slightly higher banning risk than Twollow because its lack of automation lends itself to adding larger batches less frequently. However, if you have the time to log in on a regular basis and keep track of who to unfollow and when, you can absolutely generate similar benefits to Twollow.

It’s reasonable to assume you can achieve a similar yield of around 50 ReFollowers or 25 artificial FreeFollowers per day using FlashTweet. However, this requires an estimated 30 minutes of labor per day (which may decline with scale when you can add larger batches with less risk).

Cost per ReFollower: 36 seconds
Cost per (artificial) FreeFollower: 72 seconds

Method 8: Pyramid Schemes

If Bernie Madoff were on Twitter, he’d use tweet penguin. It’s a Ponzi scheme, plain and simple, as described below:

I’ll avoid a discussion of Ponzi/pyramid schemes and their downfalls. Suffice it to say, this technique gained me just one follower (and forced me to follow five people I didn’t care about).

Method 9: Following Communities

There are a number of these, but they’re all just variations on a theme. I’ll focus on two:

TweepMe: Pay these guys $12.95 and they’ll let you into a ReFollower community of about 5,000 people. Once you join, you and the other members follow each other automatically. (This is spread out over time to prevent account-banning risk.)

I assume this works, making it another cheap source of ReFollowers. However, these followers must be of terrible quality. It’s nothing but ReFollow junkies who are so interested in followers that they will pay to acquire them. Adding these particular followers will do little for your actual reach.

Cost per ReFollower: $0.002

FastFollowers: This is a slightly more interesting system, although for the most part it boils down to more ReFollow swaps. The twist here is that everything is based on a “credit” system:

Again, there is most likely a quality issue here because everyone participating is also gaming the system.

You can buy credits for 8.5 cents apiece, and the typical follow costs you about 3 credits. Alternatively, you can follow people to earn credits, which you can then use to gain followers. This takes some time, though (I ballpark it at 1 minute of effort per ReFollower).

Cost per FreeFollower: 3 cents
Cost per ReFollower: 1 minute

Method 10: Buy a Guide of “Hidden Secrets”

I wanted to mention these because they show up a lot when you Google “Twitter Followers.” There are several dozen sites out there that will let you pay them $30-100 for a top-secret guide on how to gain followers. These are often wrapped up in “get rich quick” language promising to help you “make money on Twitter!”

I promise that you’ve already learned more in this blog post than you would learn from any such guide. These “secret methods” are always based on the follow/unfollow trick, and are often coupled with some tricks to prevent churn and keep your following/followed ratio in check.

Cost: Your dignity

Summary/Conclusions

Ultimately, your best course of action boils down to two factors:

  • Whether you mind “following” a large number of users to build up your own following
  • The dollar value you place on your time

With these factors in mind, you can identify the most cost-effective method for yourself using the summary table below:

Method               FreeFollower Cost   ReFollower Cost
Topical Tweets       12 minutes          N/A
Direct Interaction   7 minutes           N/A
Celebrity Mentions   10 minutes          N/A
Magpie               $14.55              N/A
FeaturedUsers.com    $1.00               N/A
ReFollow Services    N/A                 $0.10
Twollow              $0.008*             $0.004
FlashTweet           72 seconds*         36 seconds
TweepMe              N/A                 $0.002
FastFollowers        $0.03               1 minute

*compromised quality

I hate to state the obvious, but if you’re looking to establish a legitimate follower base, the best thing you can do is accumulate legitimate followers. This means avoiding mass ReFollowers (and related tricks), and choosing to interact with real people who have similarly pure motives. In other words, use Twitter the way it was meant to be used.

Good luck and happy tweeting! To hear about updates and future articles, please keep an eye on our blog and follow us on Twitter @RJMetrics.

DoS Attacks Trend Toward Politics

The internet is not something that you just dump something on. It’s not a big truck. It’s a series of tubes. Those tubes can be filled, and if they are overfilled, bad things happen. This is the basic idea behind denial-of-service attacks: to render a computer resource unavailable to its intended users, often by simply flooding the target system with more traffic (the ping flood being the classic example) than it can handle. These attacks have become increasingly common, largely because they are relatively easy to execute. Below is a brief history of fascinating and noteworthy DoS and DDoS (distributed denial-of-service) attacks. The results reveal a surprising evolutionary trend: where the earliest incidents were generally launched for fun or profit, contemporary DoS attacks have become a prominent means of protest and dissent.

  • 2 Nov 1988: The Morris worm, written by Cornell CS grad student Robert Morris, executed the first significant DoS attack, putting roughly 5,000 machines out of commission for several hours.
  • Mar 1998: Attackers exploited a problem with Windows NT servers, and successfully drove thousands of NT stations, including ones at NASA, MIT, the U.S. Navy, and UC Berkeley, offline. This DoS attack led to the formation of the FBI’s Infrastructure Protection and Computer Intrusion Squad, better known as the Power Rangers.
  • 26 Mar 1999: The Melissa virus, written by David L. Smith, was a mass-mailing macro virus. Once it penetrated a computer, Melissa gained access to Microsoft Outlook and began self-replicating, mailing itself to the infected user’s correspondents. Because of the virus’ rapid rate of replication, many email systems were overwhelmed by the traffic; Melissa ultimately incapacitated the email networks of over three hundred U.S. corporations.
  • Jan 2001: Register.com was targeted and booted offline by a DDoS attack that used DNS servers as reflectors, and forged requests for the MX records of AOL.com. It lasted roughly a week before it could be traced back and disabled.
  • Oct 2002: Attackers performed a DNS backbone DDoS attack on the DNS root servers, machines intended to provide service to all internet users. The attackers succeeded in disrupting service at nine of the thirteen root servers.
  • Feb 2007: over 9,000 online game servers for games such as Counter-Strike, Halo, and Return to Castle Wolfenstein were attacked by “RUS,” a Russian hacker group. The DDoS attack was executed from over a thousand units located in Russia, Uzbekistan, and Belarus. Terrorists win.
  • Mar 2007: Mininova suffered a massive DDoS attack that completely incapacitated the premier BitTorrent tracker. Though trackers are hardly strangers to DoS attacks, this incident was the first time that one of the world’s largest index sites, which generally have a tremendous traffic capacity, was noticeably hammered by a concerted and intercontinental DDoS effort.
  • 25 Apr 2007: Ethnic Russian Estonians launched a series of DDoS attacks against Estonian businesses and institutions, including the website of Prime Minister Andrus Ansip’s Reform Party. The attacks were set against the backdrop of ethnic riots prompted by the removal of a Soviet war memorial from the center of Tallinn, Estonia. David Emm, senior technical consultant at Moscow-based antivirus software company Kaspersky Lab, told BBC reporters that he believed the most likely culprits were, “younger types who, in other days, would have been writing and spreading viruses.” Dmitri Galushkevich, a twenty-year-old ethnic Russian, was later convicted for his involvement in the attacks.
  • Jan 2008: Members of “Anonymous,” a self-branded collective of unnamed individuals from various internet subcultures, launched a DDoS-based attack on the Church of Scientology in response to alleged acts of intimidation and censorship. On January 20, the group flooded Scientology.org with as much as 220 Mbps of traffic, succeeding in knocking the site offline. On January 21, thirteen-year-old boys everywhere celebrated their “epic win.”
  • 19 Apr 2008: CNN reported that their news site had been targeted by DoS attacks, resulting in slowed or unavailable service in limited areas of Asia. A CNN article on the subject cited reports by Asian tech sites that Chinese hackers were targeting the news conglomerate in response to their coverage of the unrest in Tibet. Many Chinese bloggers accused CNN and other leading Western news organizations of taking a pro-Tibetan stance when reporting on the region’s civil turmoil.
  • Apr 2009: Malware hunters at Symantec discovered that malicious files embedded in pirated copies of Apple’s iWork ’09 software spawned what appears to have been the first Mac OS X botnet, which launched DDoS attacks on an unknown website. Virus Bulletin researchers Mario Ballano Barcena and Alfredo Pesoli found two variants, OSX.Iservice and OSX.Iservice.B, using different techniques to obtain users’ passwords and assume control of the infected machines.
  • 5 May 2009: In an ironic twist, members of 4chan, the world’s largest and most notorious English-based imageboard, launched a successful DDoS attack… on themselves. Spammers posted images with a link promising free porn, which led instead to a zip file containing an auto-executable virus. The attack was allegedly executed by a college student who wanted to take down the site so he could study for finals without being distracted by /b/.
  • May 2009: Millions of Chinese internet users were unable to access the internet because of a massive DDoS attack that knocked the DNS system of one of the country’s registrars offline. Konstantin Sapronov, head of Kaspersky’s Virus Lab in China, commented that the incident revealed holes in China’s DNS that are “very strange” for such a big country. The attacked registrar hosted the DNS for the video streaming site Baofeng; traffic to that site was so high that unanswered DNS requests created an additional traffic jam, essentially multiplying the attack.
  • 15 Jun 2009: Sites belonging to Iranian news agencies, President Mahmoud Ahmadinejad, and Iran’s supreme leader Ayatollah Ali Khamenei were knocked offline when activists protesting the results of the recent Iranian elections used DoS attacks to flood the sites with traffic. In his article “With Unrest in Iran, Cyber-attacks Begin,” Robert McMillan, a reporter with IDG News Service, comments, “This type of attack, known as a denial of service (DoS) attack, has become a standard political protest tool, and has been used by grassroots protesters in several cyber-incidents over the past few years, including cyber events in Estonia in 2007 and Georgia last year.” The activists have used both web-based page-refresh tools, including Pagereboot.com, and custom tools promoted via Twitter, blogs, and activists abroad.

With a Yo-Ho-Ho and a Tricky Lah-Tee Doo

The past few weeks have seen marked developments in piracy and file sharing. The Swedish Pirate Party, a political organization that strives to legalize file sharing and bolster internet privacy, scored a considerable victory on Sunday, securing at least one (but probably two) of Sweden’s eighteen European Parliament seats. In somewhat related news, my family discovered an incredible site where we could stream Stanley Cup games—valuable in spite of the decidedly-below-HD picture quality. My heart swelled with pride as I watched the Pittsburgh Penguins, my favorite team, skate their way to a game seven in Detroit, and I recalled the words to a pirate chantey from last month’s South Park episode about Somali swashbucklers. Cartman, or Captain Cartman, as he prefers to be called, leads four of his classmates-turned-crewmates to Mogadishu to live the pirate’s life, and when spirits are low, he breaks into song, “We drink and we pillage and we do what we please/ We get all that we want for free…”

Live streaming hockey only buffers my belief that virtually any form of data is available free of charge if you know where to look for it. This is increasingly true, as BitTorrent indexes, tracking sites, and clients multiply by the minute. The European Parliamentary elections appear to be a microcosm of a broader shift: public opinion on file sharing and piracy seems to be catching up with evolving transfer technologies. In this post, I’d like to evaluate the mechanics and efficacy of BitTorrent, which has become the preferred protocol for file sharing. I’d also like to consider the public and interest-group response to this topic, and the resulting operational impact on popular BitTorrent trackers, most notably The Pirate Bay.

How BitTorrent Works

Because BitTorrent is simply a file-sharing protocol without intrinsic search capacity for content, it is up to users to find .torrent files. The system relies on external components for file search, employs a moderator system to ensure the integrity of file data, and uses a bartering technique for downloading in order to prevent users from freeriding. In their essay “BitTorrent P2P File-Sharing System: Measurements and Analysis,” Pouwelse, Garbacki, Epema, and Sips highlight the elements that are necessary in a P2P system in order for it to be accepted by the larger community, “First, such a system should have a high availability. Secondly, users should (almost) always receive a good version of the content (no fake files). Thirdly, the system should be able to deal with flashcrowds. Finally, users should obtain a relatively high download speed.”[1]

After a peer has finished downloading the file, it may become a seed by staying online and sharing the file without bartering. The first concern, availability, is highly dependent on the willingness of peers to become seeds, sharing the data and proliferating the torrent swarm. The following figure shows the results of Pouwelse, Garbacki, Epema, and Sips’ uptime measurements:

Here, peer uptime in hours after the download has finished is plotted, with peers ranked in order of decreasing uptime. The longest observed uptime is 83.5 days. The plot uses log-log axes and shows an almost straight line between peer 10 and peer 5,000. The sharp drop after peer 5,000 indicates that most users disconnect from the swarm shortly after the download has finished, within a few hours. The suddenness of the drop is important to note because the download itself typically spans several days.

This chart ultimately demonstrates that seeds with a high availability are rare. Only 9,219 of 53,883 peers (17%) have an uptime longer than a single hour after they have finished downloading. Over a period of ten hours, this number decreases to only 1,649 peers (3.1%), and for 100 hours, to a mere 183 peers (0.34%). This data suggests that peers should be given increased incentives to lengthen their uptimes.
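
To make the availability math concrete, here’s a minimal Python sketch that computes the same kind of survival fractions from a list of post-download uptimes. The five-peer list is placeholder data standing in for the paper’s 53,883 measured peers; only the thresholds come from the study:

    # Fraction of peers still seeding beyond each uptime threshold (in hours).
    # `uptimes` is placeholder data; the real study measured 53,883 peers.
    uptimes = [0.2, 0.8, 1.5, 12.0, 150.0]   # hours online after finishing

    for threshold_hours in (1, 10, 100):
        seeds = sum(1 for u in uptimes if u > threshold_hours)
        print(f"uptime > {threshold_hours}h: {seeds / len(uptimes):.1%}")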

Pouwelse, Garbacki, Epema, and Sips also tried to evaluate the integrity of the medium by actively attempting to pollute the system and then analyzing the efficacy of those attempts. They write, “We created several accounts on different computers from which we tried to insert files that were obviously fake. We failed; the moderators filtered out our fake files.”[2]

The team also addresses their third concern: BitTorrent’s ability to handle flashcrowds, or the system’s reaction to the sudden popularity of a single new file. The figure shows the number of downloads for a single file as a function of time (the Lord of the Rings III movie, 1.87 GB):

The chart shows a gradual decrease over time, devoid of any sudden or unexpected drops or blips. This suggests that the system is able to comfortably handle the sudden popularity of a single popular movie, irrespective of how long, drawn out, and inferior to the original book that movie is.

Culture and Ethos of Piracy

Within P2P systems, a tension exists between availability, which is improved when there are no global components, and data integrity, which benefits from centralization. BitTorrent seems to have struck a delicate balance, one maintained through the decentralized work of a crew of dedicated pirates. In his essay “The Pirate Bay and the ethos of sharing,” Jonas Andersson explains that pirated products necessitate, “…labour which, at first glance, appears entirely unpaid: the late-night tinkering of crackers, encoders, subtitlers, administrators, seeders, leechers.”[3] Time and effort are needed on the part of these parties to facilitate the file-sharing network, but once it is in place, they too can benefit. In this sense, irrespective of incentives, file-sharing is governed by the law of equivalent exchange. File sharing, Andersson poses, “…is highly motivated by personal gratification and notions of comfort and instantaneity.”[4]

There remains, however, a decidedly altruistic element. There is an inherent and natural desire to augment and improve the internet, and the most effective way to do so is to make a personal contribution. Andersson writes, “This can be seen as a benevolent, humble intention which has roots deeply paralleled with the Internet ethos of free information exchange, distributed efficiency and adherence protocols.”[5] Copyright laws have made this culture of sharing problematic, however.

We are currently at a junction referred to by some commentators as ‘the grey commons’: uses of content are labeled either “copyright infringement” or “creative appropriation,” depending on who you consult. Rasmus Fleischer and Palle Torsson, the authors who coined the ‘grey commons’ description, categorize file sharing as a horizontal activity, “The only thing copyright can do is impose a moral differentiation between so-called normal workings and immoral.”[6] Their line of thinking works to blur the fundamental distinction between consumers and producers.

The idea of viewing media as a stream of data rather than divisible, extricable content facilitates the view of the grey zone as a radical blurring of distinctions between cultural consumers and producers. Andersson adds, “When the P2P enthusiasts label the work of fans, aficionados and boffins ‘production’ they are actually referring to what is still a phenomenon of consumption…,” albeit a consumption, “…so thorough, intense, dedicated that it goes into overdrive, becomes explicitly productive.”[7]

Productive consumption of this nature is more actively put into practice in some cultures than others. Japanese mangaka, the artists behind widely popular Japanese-language comic books, often give out their personal addresses, encouraging readers to provide feedback that often has a real, substantive impact on the development of the manga. The industry also generally welcomes dojinshi, or fan-published works that often include established characters and official plots. By allowing for highly participatory consumer involvement, the manga industry is actively blurring the line between production and consumption in an effort to release more accurately targeted products. After all, it is ultimately fans and consumers who determine the success of the product.

The only analogous American example I can think of is the production of Snakes on a Plane, a high-flying thriller that hardly needs an introduction, released by New Line Cinema in August of 2006. The film’s production team incorporated feedback from online users and added five days of reshoots, garnering a significant amount of internet attention prior to the film’s release. My friends and I were completely on board; we attended a raucous, sold-out midnight showing the night it opened. The reshot version included a number of lines and scenarios inspired by the internet phenomenon, but one line stands out far above the rest. Here’s the SFW version of Samuel L. Jackson’s timeless quote:

The film’s gross revenue may not have lived up to its internet-generated hype, but Snakes on a Plane imparted a new alternative for consumer-entertainment industry relations, and a pretty cool T-shirt that I picked up at the mall the next day.

Attacks on Piracy

Perturbed by its inability to quell the burgeoning BitTorrent ecosystem, the entertainment industry has sought more viable alternatives. In their essay “A Measurement Study of Attacks on BitTorrent Leechers,” Dhungel, Ross, Schonhorst, and Wu explain, “Given that it is currently difficult, if not impossible, to stop BitTorrent by suing companies, and that suing individual users is both painstaking and unpopular, the only remaining way to stop BitTorrent is via internet attacks.”[8] More specifically, the film and music industries have begun to hire anti-P2P companies to impede the sharing of copyright-protected data. Dhungel, Ross, Schonhorst, and Wu analyzed the efficacy and quantity of anti-P2P attacks on BitTorrent systems by developing a crawler that contacts all peers in a given swarm, determines whether the swarm is under attack, and identifies the attacking peers. The crawler was put into practice analyzing torrent files of eight top box-office movies.

The team came across two principal attack types: the fake-block attack and the uncooperative-peer attack. The former prolongs peers’ downloads of the file by wasting their download bandwidth. In the BitTorrent system, each file is divided into pieces of about 256 KB, and each piece is typically divided into 16 blocks. A leech, when downloading a piece, requests different blocks from different peers. The attacker joins the swarm and advertises that it has a large number of file pieces. Instead of transferring authentic blocks, however, it sends fake ones, causing the victim’s eventual hash check on the piece to fail and forcing the peer to download the entire affected piece again. This can occur ad nauseam, consistently stealing bandwidth from the victim peer.
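
Clients catch this because every completed piece is verified against a SHA-1 hash stored in the .torrent metadata; a single fake block anywhere in the piece invalidates the whole thing. Here’s a minimal Python sketch of that check, with made-up data and the piece/block sizes described above:

    import hashlib

    BLOCK_SIZE = 16 * 1024          # 16 KB blocks
    BLOCKS_PER_PIECE = 16           # 16 blocks -> ~256 KB pieces

    # Pretend this expected SHA-1 came from the .torrent metadata.
    genuine = [bytes([i]) * BLOCK_SIZE for i in range(BLOCKS_PER_PIECE)]
    expected = hashlib.sha1(b"".join(genuine)).digest()

    # The attacker slips one fake block into the piece.
    received = list(genuine)
    received[7] = b"\x00" * BLOCK_SIZE
    ok = hashlib.sha1(b"".join(received)).digest() == expected
    print(ok)   # False -> the victim must re-download all 16 blocks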

The second variation, the uncooperative-peer attack, is a bit more complex. In executing it, the attacker joins the targeted swarm and establishes TCP connections with a number of peers but never provides any blocks to its victims. A common variant is the chatty-peer attack, in which the attacking peer speaks the BitTorrent protocol with a number of peers, advertising that it has many pieces of the file but then refusing to transfer them. Every time a request is sent, the attacking peer resets its connection with the victim and resends the handshake and bitmap messages.
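
Here’s a toy Python model of that chatty-peer behavior (not real wire-protocol code; the class and method names are mine): it handshakes and advertises pieces happily, then resets the moment actual data is requested.

    # Toy chatty peer: speaks the protocol, never delivers a block.
    class ChattyPeer:
        def handshake(self):
            return "ok"                    # completes the handshake normally

        def bitfield(self):
            return [True] * 16             # falsely advertises having pieces

        def request_block(self, piece, block):
            # Instead of sending data, reset the connection; the victim has
            # to reconnect and handshake again, wasting time and bandwidth.
            raise ConnectionResetError("reset by attacker")

    peer = ChattyPeer()
    peer.handshake()
    peer.bitfield()
    try:
        peer.request_block(0, 0)
    except ConnectionResetError as err:
        print(err)                         # victim loops back to the handshake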

The research team conducted passive measurements by repeatedly downloading a file suspected to be under attack, collecting multiple packet traces from hosts connected to both Ethernet and DSL access networks. The recorded results were for a torrent of “Echoes, Silence, Patience & Grace,” a just-released album from the Foo Fighters. The test was conducted on both Azureus and µTorrent, the two most widely used BitTorrent clients. The team quantified the damage with a delay ratio: the difference between the average download time without and with IP filtering, divided by the average download time with IP filtering. Their results are as follows:


Using an internally-developed packet parser, the team attained a deeper understanding of the attacks on the Azureus and µTorrent clients. With the parser, Dhungel, Ross, Schonhorst, and Wu were able to distinguish benevolent peers from chatty peers, fake-block attackers, peers with no TCP connections or BT handshakes, and peers that didn’t fall into any of the previous categories. They found that fake-block attackers were present on µTorrent but not on Azureus, and that benevolent peers made up a higher percentage of the µTorrent swarm (22%, as opposed to 10% on Azureus).

Dhungel, Ross, Schonhorst, and Wu also actively detected chatty peers and fake-block-attack peers in torrents for top box-office movies using an internally developed crawler that traverses the BitTorrent network gathering IP addresses of peers for a given torrent. Their results are presented below:

Dhungel, Ross, Schonhorst, and Wu’s data suggests that anti-P2P companies can certainly prolong the average download time of immensely popular torrents, but the extent of this prolongation is typically modest and does not exceed 50%. This is hardly a problem for BitTorrent users, who often just leave their clients running in the background or overnight. That said, blacklist-based IP filtering appears to be an insufficient defense. Dhungel, Ross, Schonhorst, and Wu agree, “To better filter out attackers, it is necessary to design smart online algorithms to identify different types of attackers.”[9]
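
To pin down what that 50% ceiling means, here’s my reading of the delay-ratio metric in code, with illustrative numbers rather than the paper’s:

    # Delay ratio: how much longer the average download takes when the client
    # does NOT filter attacker IPs. All numbers here are illustrative.
    avg_minutes_without_filtering = 90.0   # attacks unmitigated
    avg_minutes_with_filtering = 60.0      # blacklist filtering enabled

    delay_ratio = (avg_minutes_without_filtering - avg_minutes_with_filtering) \
                  / avg_minutes_with_filtering
    print(f"{delay_ratio:.0%}")            # 50% -> the rough ceiling observed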

The Pirate Bay Trial

On 17 April 2009, the Stockholm District Court found four people connected with The Pirate Bay, Sweden’s most infamous BitTorrent tracker, guilty of assisting in a form of copyright infringement. Each was sentenced to a year in jail and ordered to pay $3.7M in compensation and damages. The trial sent a strong message to the file-sharing community that, contrary to popular belief, tracker sites like The Pirate Bay are not immune from liability. Torrent trackers had become (and continue to be) widely accepted as mainstays of internet-based file transfer, so much so that the court’s ruling seemed almost surreal. Andersson explains, “What was so controversial about the raid against The Pirate Bay was—firstly—that it so clearly showed how out of tune the syndicated authorities and anti-file-sharing propagandists were in relation to the wider public, and how established, how everyday and normative, how casual and careless mass-scale file-sharing has become.”[10]

Some argue that the legal significance of the decision is likely limited, as it was decided under Swedish criminal law, which is not harmonized across Europe. It’s true that Sweden is a unique case because of its sheer volume of file sharing: the country has the highest penetration of high-speed broadband-fiber connections in Europe. That said, the Pirate Bay case is not an isolated incident. In September 2008, the Provincial Court of Madrid held that Sharemula.com, a torrent index site, and its administrators had not infringed any laws, as the website merely linked to sources of illegal content rather than hosting it.

In contrast, the Logroño (La Rioja) Criminal Court held in April 2009 that the site administrator of infopsp.com, another torrent tracking site, had indirectly profited from copyright infringement via advertising and SMS, and sentenced him to six months’ imprisonment and a fine. In his article “The Pirate Bay case: repercussions beyond Sweden?,” Mark Young, an associate in the European intellectual property group at Covington & Burling LLP, concludes, “…in the context of the various legislative developments in France, the UK, and in Brussels, regarding the responsibility of ISPs to cooperate with rights holders to prevent copyright infringement, the verdict can be regarded as a significant development in this ongoing battle.”[11]

Trial Aftermath, Pirate Party Sails to Victory

In the aftermath of the trial, many Swedes saw the authorities’ crackdown on The Pirate Bay as caving in to pressure from the American entertainment industry. This led to a timely surge of support for the Pirate Party, which had repeatedly voiced its support for the notorious tracker site, boosting the movement’s membership to 45,000. “We tripled our member count in a week,” Pirate Party founder and chairman Rickard Falkvinge said.[12]

Prior to the party’s nearly implausible success in the European Parliamentary elections, Falkvinge warned, “Politicians tend to look at the internet as a computer game that you can take away from the kids when they’ve been bad. The time has come to understand information politics. That is why we are growing like a torrent.”[13] His comments ring truer now than ever, as registered “pirate parties” have sprung up in Austria, Denmark, Finland, Poland, and Spain, and groups are attempting to register as political parties in the UK and the US.

A detained pirate, when asked by his captor, Alexander the Great, of his intentions, purportedly replied, “What thou meanest by seizing the whole earth; but because I do it with a petty ship, I am called a robber, whilst thou who dost it with a great fleet art styled emperor.” If piracy is truly a matter of scale, the entertainment and recording industries are standing at the helm of the Titanic, and they’ve had it with these monkey-fighting torrents on this Monday-to-Friday internet. The question persists: what can they really do about it?

The Stanley Cup (Advertising) Finals

Background

I love hockey, and like most hockey fans, I’ve been actively following the NHL playoffs, including the Stanley Cup finals that started at the end of May. While watching the finals, I couldn’t help but notice some of the advertisements that played during the intermissions and the three commercial breaks each period. I sat through the standard beer commercials (Bud Light, Miller Lite, and Amstel Light, to name a few) and the US military commercials (Army, Marines) without paying much attention, but then some Cisco commercials came up. These were a bit of a surprise to me, since I wouldn’t expect a strong overlap between hockey fans and videoconferencing enthusiasts. I also remembered that Cisco advertises heavily on the NHL.com website.

As I watched more and more of these Cisco commercials, with about one appearing during each commercial break, I became very curious about why Cisco would choose the SC finals to advertise its new products. I decided to look into the demographics of NHL fans to see if Cisco knew something I didn’t. Not surprisingly, it did: the makeup of the NHL fanbase turns out to fit Cisco’s target market remarkably well.

Armed with some good starting data and a little curiosity, I decided to go a step further and compare the likely target audiences of the most prevalent ads during the SC Finals to the NHL fanbase and determine how good a “fit” each really was. What follows is my search for the “Stanley Cup (Advertisement) Champion.”

The Facts

NHL Demographics: It turns out that the average household income of an NHL fan is almost $89,000. Additionally, NHL fans are younger and more tech-savvy than the fans of the other major US sports leagues (MLB, NBA, NFL). The full description of NHL fan demographics can be found courtesy of Experian Consumer Research.

2009 Stanley Cup Viewer Numbers: I’m only looking at the numbers for the games broadcast on NBC, although some games also aired on Versus. Game 1: 4.36 million, Game 2: 5.33 million, Game 5: 4.28 million, Game 6: 5.45 million. This works out to an average of about 4.86 million viewers per game.

The Competitors (Advertisers)

I chose to examine advertisers whose commercials appeared frequently and consistently in games 1 through 5.

Light Beers

Three main light beers have been advertised so far during the finals: Miller Lite (the not-so-funny “Taste Protection” commercials with the mafia), Bud Light (and their “Party Boat”), and Amstel Light (with their use of the incredibly catchy intro of “Chelsea Dagger” that is still stuck in my head).

The general belief is that men tend to drink beer more than women, and that this margin narrows slightly when it comes to light beer. This belief is actually grounded in fact. Not only that, but most light beer consumers are younger, between 21 and 44 years old. NHL fans are therefore a particularly good match: they’re younger than the audiences of the other sports and probably more likely to drink light beer.

Potential Customers Reached: Based on the data from the above survey combined with the NHL demographics information and 4.66 million viewers (the average of Games 1, 2, and 5), the light beer companies are reaching 851,896 potential customers.
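
Working backwards, those two figures imply the share of viewers who fit the light-beer profile; a quick back-out in Python (assuming the 4.66 million viewer base above):

    # Implied share of Stanley Cup viewers in the light-beer demographic.
    viewers = 4.66e6        # average viewers used above
    reached = 851_896       # potential customers from the survey math

    print(f"{reached / viewers:.1%}")   # ~18.3% of viewers fit the profile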

Verizon Wireless

I’m sure pretty much every NHL fan is aware of Verizon Wireless’s commercials, since they’ve been running for around a year. They use the tagline “Hockey fans aren’t like other fans” and advertise Verizon’s VCAST option for watching games on your cellphone. Ignoring the fact that the commercials are specifically tailored to NHL fans, the product itself is actually a pretty good fit with the fanbase, given that NHL fans are more “tech savvy” than the fans of other sports and would be more likely to try out this technology.

Potential Customers Reached: Based on this Harris survey, which states that 89% of people in the United States have cell phones, Verizon Wireless is reaching at least 4.3 million potential customers. This number is most likely higher, though, since NHL fans are younger and more tech-savvy (and therefore more likely to use cellphones).

Military

For this one, I’m going to focus on the US Army in particular, though the information is applicable to the other military service branches as well. First, some facts: the average enlistment age for the US Army is 21.3 years old, most soldiers are male, and the average term of service for a new recruit is about 4 years (data from here).

This information actually matches up surprisingly well with NHL fan demographics. Because NHL fans are relatively young, the Army ads are tapping into a concentrated base of recruitment prospects. One wrinkle, however, is that the household income of new recruits is not an ideal fit for the NHL fanbase. According to a study conducted by Tim Kane (p. 5 for the chart), the majority of new recruits come from households earning between $30,000 and $50,000, whereas the average NHL fan’s household income, at about $89,000, is roughly double that range.

Potential Customers Reached: Since about 52% of the NHL fanbase meets the age qualification for military service, the Army/Military is reaching 2.52 million potential recruits. However, based on the data from the DoD, which states that there are 3 million people in the US military, and US Census data, only about 2.88% of the qualified population is actually in the military. Applied to the above number of potential recruits, you get a more realistic number of about 72,700.
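
The arithmetic behind those two estimates, spelled out in Python (inputs are the figures cited above):

    # Step-by-step recruit math from the figures above.
    avg_viewers = 4.86e6       # average Stanley Cup viewers per game (NBC)
    age_qualified = 0.52       # share of NHL fans meeting age requirements
    in_military_rate = 0.0288  # share of the qualified population enlisted

    prospects = avg_viewers * age_qualified
    print(round(prospects))                    # ~2.52 million potential recruits
    print(round(prospects * in_military_rate)) # ~72,780, i.e. the ~72,700 above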

Those “Visit Canada” Commercials

When I first saw these commercials, I was almost insulted, thinking, “Just because I’m a hockey fan and hockey is really popular in Canada doesn’t mean that I want to visit the country.” Regardless of what you take away from them, however, what these commercials do tell you is that Canada is trying to increase tourism and is doing so through television ads.

Based on the results of this Canadian study, most visitors to Canada are older, with 59.7% of tourists being over 45. 77.4% are likely to use the internet to at least plan and research their trip, which implies some affinity for technology, and the average household income is $89,289. Additionally, 52.7% of tourists to Canada are male.

Although the age group does not fit particularly well with the estimated age breakdown of NHL fans, the technological affinity, gender, and household income match up very well with the profile of the NHL fanbase. However, it is also possible that age plays an important part in whether one does or does not take a vacation to Canada. If that is true, then the commercials are definitely reaching a less-than-ideal audience.

Potential Customers Reached: Using the numbers from the above survey, and the breakdown of age groups provided from the NHL fan demographics study, Canada is reaching 827,392 potential tourists.

Cisco

As shocked as I initially was to see Cisco advertising during the SC finals, the commercials themselves are actually pretty amusing and clever. The real question, however, is who exactly Cisco is attempting to target with these advertisements and whether it is reaching that audience.

The commercials advertise new products Cisco has created to let people work together over long distances, allowing companies to cut down on travel expenses. The target audience is most likely high-income viewers, who are more likely to hold influential positions in their companies (higher positions meaning higher pay), as well as people willing to adopt new technology (aka “tech savvy” people).

As described above, the NHL has both the most tech-savvy and the wealthiest fans. Based on these two target characteristics, advertising to NHL fans seems like the logical choice, at least when compared with the other major sports leagues in the United States.

Potential Customers Reached: A lot. The NHL fanbase is a near-perfect fit for Cisco’s new products.

The Winner

The 2008-2009 Stanley Cup (Advertisement) Champion: It was a tough competition with a lot of strong competitors, but in the end I declare the winner to be Cisco. A high overlap with fan demographics, combined with products that carry a particularly high price point, gives me the sense that Cisco is truly capitalizing on the spending and purchase-decision-making power of the NHL audience, and is therefore generating the strongest returns on its advertising investment.