The Data Byte: Pop Science Edition

At RJMetrics, our mission is to inspire and empower data-driven people. So, on Fridays, we’re going to try something a little fun (because data is fun). We’re going to run The Data Byte — celebrating and examining the many ways data is surfacing in culture.

Neil deGrasse Tyson

This guy! He’s the rockstar astrophysicist of the millennial age. He’s in millions of homes evangelizing for science every Sunday with Cosmos, he’s in your kid’s Superman comic, he demoted a planet.

What makes him important?

Neil deGrasse Tyson is a constant, opinionated advocate for the cosmic perspective. For him, it’s all about contextualizing our place in the universe as loudly as possible. If you take a moment to remember that you are made of starstuff, that you’re breathing the same air and drinking the same water as Napoleon and Cleopatra, that one day our Sun will burn out, if you take a moment for awe, isn’t the world a more marvelous place? And in this marvelous world full of facts, we can stuff our brains. Here are some more things Neil deGrasse Tyson would like you to remember:

  • The night sky is full of ghosts. The farther away a star is, the longer it took for its light to reach us — long enough to have outlived the star itself. What we see is not what currently exists.
  • We share 90% of our DNA with trees.
  • There are more stars than there are seconds of the Earth’s existence.

Magento Reports

Magento is a great platform. It’s incredibly flexible, and it is used to power everything from early stage e-commerce sites to some of the biggest sites on the internet. However, like any platform, some areas of its functionality are stronger than others. In this post, we explore an area that often leaves users unimpressed: Magento reports.

Existing Magento Reports

The default Magento reports are very basic. The main things you can do in the Magento reporting interface are:

  • See your top 5 items, search terms or customers
  • See your most recent 5 customers, orders, or search terms
  • Plot a line chart of revenue or orders for a few different time ranges
  • Pull simple lists of orders, products, customers, and search terms

These are useful for basic business intelligence, but they won’t allow you to build a data-driven business. To take things to the next level, you need to supplement Magento’s reporting capabilities with additional analysis.

What’s missing from Magento’s Reports?

The list of things you can do with your data is only limited by the number of questions you have. While it can be tempting to run as many analyses as you have time for, it’s important to focus on actionable metrics.

Some of the analyses and metrics that savvy ecommerce companies study are:

The tools and features that help companies study and manage such metrics include:

  • Custom dashboards
  • Flexible visualizations
  • Different views and permission levels for various stakeholders
  • Tools to incorporate data stored in other databases, Google Analytics, or spreadsheets

Augmenting Magento Reports With SQL Queries

The first thing that most e-commerce companies try when they want to answer questions with data is to ask a member of the development team to run SQL queries.

This works if there’s only one data question, if you don’t need the answer immediately, and if the business user can work with the results in Excel to get what they need. However, if you want your business to use data to drive decisions on a day-to-day basis, this is not a sustainable solution.

I’ve spoken with hundreds of e-commerce businesses here at RJMetrics and in my previous job in venture capital. I don’t remember ever meeting someone whose IT team had plenty of time for pulling data for business users.

Manual queries can be particularly tricky because of Magento’s entity-attribute-value data model. This allows for a lot of flexibility when building out your store, but it also makes building and maintaining analytical queries much trickier.
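To illustrate why an entity-attribute-value layout makes analytical queries harder, here is a minimal sketch using Python's built-in sqlite3 module. The table and column names (customer, customer_attribute) are hypothetical simplifications for illustration and are not Magento's actual schema.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    -- Hypothetical, simplified EAV layout (not Magento's real schema)
    CREATE TABLE customer (entity_id INTEGER PRIMARY KEY);
    CREATE TABLE customer_attribute (
        entity_id INTEGER, attribute TEXT, value TEXT
    );
    INSERT INTO customer VALUES (1), (2);
    INSERT INTO customer_attribute VALUES
        (1, 'firstname', 'Ada'),   (1, 'city', 'Philadelphia'),
        (2, 'firstname', 'Grace'), (2, 'city', 'New York');
""")

# In a flat table this would be "SELECT firstname, city FROM customer".
# With EAV, every attribute needs its own join back to the value table.
rows = conn.execute("""
    SELECT c.entity_id, fn.value AS firstname, ct.value AS city
    FROM customer c
    LEFT JOIN customer_attribute fn
        ON fn.entity_id = c.entity_id AND fn.attribute = 'firstname'
    LEFT JOIN customer_attribute ct
        ON ct.entity_id = c.entity_id AND ct.attribute = 'city'
    ORDER BY c.entity_id
""").fetchall()
print(rows)
```

Each additional attribute in a report adds another join, which is why hand-written queries against this kind of model tend to grow quickly and become harder to maintain.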

Enhancing Magento Reports with Hosted Business Intelligence

Here at RJMetrics, we work with many companies that are on the Magento platform, and we’ve learned a lot about the platform’s strengths and weaknesses. We’ve spent even more time working with e-commerce businesses on all different kinds of platforms to understand their business challenges and how we can help use data to solve them.

We’d love to make your data understandable and actionable. Sign up for the RJMetrics free trial today.

Starting Today: Try RJMetrics for Free without a Credit Card

2012 has been a year of milestones here at RJMetrics. We’ve brought on amazing investors, grown our team, and improved our product by leaps and bounds. Most importantly, more online businesses than ever are using RJMetrics to better understand their data and make smarter decisions.

We started offering free trials a few months ago, and the success of that program has encouraged us to make it even easier to give RJMetrics a try. Today, we’re happy to make two new announcements:

  • You can now sign up for a free 30-day trial of RJMetrics without entering a credit card or committing to a pricing plan.
  • We’ve cut our implementation time in half (from 14 days to 7 days) and extended our free trials so that the clock doesn’t start ticking until your dashboards are live.

It has never been easier to give RJMetrics a try. Click here to sign up for your free trial today.

New Google Plus Data Shows Weak User Engagement

Google CEO Larry Page recently announced that Google Plus crossed over the 100 million user mark and continues to see strong user growth.

Despite these strong numbers, however, the service continues to be pummeled in the press. Many outlets have claimed that engagement is poor and that growth is only fueled by Google forcing membership upon users of its other products.

Rather than rely on third-party reports, we decided to pull publicly available data on a random population into an RJMetrics online dashboard and see for ourselves.

Here are some of our most interesting findings:

  • The average post has less than one +1, less than one reply, and less than one re-share.
  • 30% of users who make a public post never make a second one. Even after making five public posts, there is a 15% chance that a user will not post publicly again.
  • Among users who make publicly-viewable posts, there is an average of 12 days between each post.
  • A cohort analysis reveals that, after a member makes a public post, the average number of public posts they make in each subsequent month declines steadily. This trend is not improving in newer cohorts.

How We Did It

We began by selecting a population of 40,000 random Google Plus users. For each user, we downloaded their entire public timelines (which consist of all publicly-visible activities for that user). Only one third of the users in our population had any public activity, so this sub-set of the population is the main focus of many of our statistics.

Once we had the data, it was a snap to upload it to RJMetrics and pull the insights seen here with just a few clicks.

Since we are looking at public data exclusively, we want to point out that this data is not necessarily reflective of the entire population of users. These are simply insights into the public-facing actions of Google Plus users based on a population that is known to post publicly.

Repeat Posters

Once a user has made one public post, the chances that they will make a second post are quite strong: around 70%. After that, however, Google Plus does not perform as well as other social services that we have analyzed. In charts like these, we typically expect to see the probability of repeat posts shoot up to well north of 90% by the time the user has made several posts. This is basically the “once you’re using it you’re hooked” principle.

With Google Plus, however, this number never crosses the 90% mark. Even after having made five such posts, the chance of making a sixth is only 85%. This means that 15% of people who have made five posts never came back to make a sixth.
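The repeat-post probabilities described above can be sketched as a simple ratio: of the users who made at least n posts, what share made at least n + 1? The function name and the sample data below are made up for illustration; this is not the RJMetrics implementation.

```python
from collections import Counter

def repeat_probabilities(posts_per_user, max_n=5):
    """Return {n: share of users with >= n posts who also made post n + 1}."""
    counts = Counter(posts_per_user)
    # at_least[n] = number of users who made n or more posts
    at_least = {
        n: sum(c for k, c in counts.items() if k >= n)
        for n in range(1, max_n + 2)
    }
    return {
        n: at_least[n + 1] / at_least[n]
        for n in range(1, max_n + 1)
        if at_least[n] > 0
    }

# Made-up sample: the total number of public posts for each of ten users
sample = [1, 1, 1, 2, 3, 3, 5, 6, 6, 8]
probs = repeat_probabilities(sample)
print(probs[1])  # share of first-time posters who posted a second time
```

In this toy sample, 7 of the 10 users who posted once went on to post again, so `probs[1]` is 0.7.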

Cohort Analysis

The cohort analysis below shows the rate at which new publicly-viewable posts are created by users who made their first post in different months throughout time.

This is a cumulative chart, so we’re basically showing the “average number of total posts made” as it grows over time for users in each cohort.

The decay rate here is very concerning. Users are less and less likely to make additional posts even a few months after initially joining. While it may not be an apples-to-apples comparison, it’s interesting to contrast this with the same chart from our Pinterest Data Analysis, which shows no decay whatsoever.
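For readers who want to build a similar cumulative cohort curve from their own data, here is a rough sketch. It assumes each post is a (user_id, date) pair and groups users by the month of their first post; the function name and sample data are hypothetical.

```python
from collections import defaultdict
from datetime import date

def cohort_curves(posts):
    """posts: list of (user_id, date).
    Returns {(year, month): [avg cumulative posts per user by month offset]}."""
    first = {}
    for user, day in posts:
        if user not in first or day < first[user]:
            first[user] = day

    def month_index(d):
        return d.year * 12 + d.month

    # Number of users in each cohort (month of first post)
    sizes = defaultdict(int)
    for day in first.values():
        sizes[(day.year, day.month)] += 1

    # Posts per cohort, bucketed by months elapsed since the first post
    counts = defaultdict(lambda: defaultdict(int))
    for user, day in posts:
        start = first[user]
        offset = month_index(day) - month_index(start)
        counts[(start.year, start.month)][offset] += 1

    curves = {}
    for cohort, by_offset in counts.items():
        running, curve = 0, []
        for offset in range(max(by_offset) + 1):
            running += by_offset.get(offset, 0)
            curve.append(running / sizes[cohort])
        curves[cohort] = curve
    return curves

# Made-up sample data
posts = [
    ("a", date(2012, 1, 5)), ("a", date(2012, 1, 20)), ("a", date(2012, 3, 2)),
    ("b", date(2012, 1, 9)),
    ("c", date(2012, 2, 1)), ("c", date(2012, 3, 15)),
]
curves = cohort_curves(posts)
print(curves)
```

A curve that flattens quickly, like the January cohort here between its first and second month, is the kind of decay the chart highlights.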

Time Between Posts

We were surprised by the length of time between public posts among users. On average, a user waits 15 days between making their first public post and making their second. This number declines with each subsequent post, but not drastically. There is an average of 10 days between a user’s fifth and sixth public posts.

The overall average time between any two public posts by the same user is 12 days.

Remember that, since we are only looking at public posts, it is very possible that users are making non-public posts in between the ones that we were able to see. Despite this, however, we were still quite surprised by the large amount of time between public posts.
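The two kinds of averages in this section (the gap after a user's first, second, or later post, and the overall gap between any two consecutive posts) can be computed along these lines. This is a sketch under the assumption that each user's public post dates are available as a list; the sample data is made up and is not taken from our Google Plus sample.

```python
from datetime import date
from statistics import mean

def average_gaps(timelines):
    """timelines: {user: [post dates]}.
    Returns (overall average gap in days,
             {k: average days between post k and post k + 1})."""
    by_position = {}
    all_gaps = []
    for dates in timelines.values():
        ordered = sorted(dates)
        pairs = zip(ordered, ordered[1:])
        for k, (earlier, later) in enumerate(pairs, start=1):
            gap = (later - earlier).days
            by_position.setdefault(k, []).append(gap)
            all_gaps.append(gap)
    return mean(all_gaps), {k: mean(v) for k, v in by_position.items()}

# Made-up sample data for two users
timelines = {
    "a": [date(2012, 1, 1), date(2012, 1, 11), date(2012, 1, 17)],
    "b": [date(2012, 1, 5), date(2012, 1, 25)],
}
overall, by_position = average_gaps(timelines)
print(overall, by_position)
```

Users with only one post contribute no gaps, so at least one user with two or more posts is needed for the averages to be defined.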

+1s, Replies, and Sharing

Of all the categories, we feel that this is the least likely to be biased by the fact that we only studied public posts. These public posts will still be visible to each member’s private networks, and actually could attract +1s, shares, and replies from external users as well. If anything, we would expect our numbers here to be higher than in the general population.

Despite that, our population of nearly 70,000 posts yielded the following properties:

  • An average of 0.77 “+1s” per post
  • An average of 0.54 replies per post
  • An average of 0.17 re-shares per post

Conclusion

From what we can see from the outside looking in, Google Plus has a long way to go before it becomes a real threat to the social networking landscape. While user growth is strong, it is unclear how much of that is driven by tie-ins with other Google products.

At the end of the day, Google Plus simply does not show the same level of ravenous user adoption and engagement that we’ve seen in other social networks (see our reports on Pinterest Data and Twitter Data for examples).

RJMetrics Dashboard Improvements: The Benefits

You might have heard or seen by now that we are in the middle of releasing major changes to your RJMetrics dashboards. The most obvious changes were made on chart visualizations, but this is just the tip of the iceberg. We will also be introducing numerous enhancements that aim to improve your overall user experience. At the time of writing this blog post, we have rolled out the new features to 35% of our clients, and we continue to migrate new clients every day. So if you haven’t yet been transitioned, rest assured that it is just a matter of time for the new charts to reach your dashboard.

In this blog post I will explain what the changes are, and most importantly their benefits to you.

New Dashboard Visualizations

The new charts will now look much more vibrant. They will be rendered using the HTML5 standard, which allows faster rendering and multi-platform support. The library that we are using also supports a plethora of other types of chart visualizations, which will allow us to extend our chart type offerings in the future.

All new charts will also be using the Scalable Vector Graphics (SVG) format, which will allow you to download a graph and resize it without any loss of image quality. Another cool feature included in the new HTML5 charts is the ability to directly click on the name of a series in the chart’s legend to remove or add the series from a multi-series chart.

Transitioning away from Flash means that you can now actively view your dashboards on your iPad/iPhone. Note however that chart editing is not yet fully supported on your mobile devices.

Faster Chart Loading

Chart loading times are dramatically faster with the new chart system. This is partly due to HTML5 rendering, but it is also due to the introduction of a redesigned caching framework. For non-technical folks, caching is a method of saving data in memory so that future access of the same data is faster. More specifically, we use caching to store chart data for faster retrieval. Our live production testing revealed a 10-fold loading time reduction for new charts compared to old charts. The benefit can be even more pronounced if you have charts with a very high number of data points, such as our advanced cohort analysis charts.
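As a generic illustration of the caching idea (not a description of our actual caching framework), the sketch below uses Python's standard functools.lru_cache to keep the result of a slow lookup in memory. The function load_chart_data and its simulated delay are hypothetical stand-ins.

```python
import time
from functools import lru_cache

calls = 0  # counts how often the slow path actually runs

@lru_cache(maxsize=128)
def load_chart_data(chart_id):
    """Hypothetical stand-in for an expensive chart query."""
    global calls
    calls += 1
    time.sleep(0.01)  # simulate slow work
    return tuple(chart_id * i for i in range(5))

first = load_chart_data(7)   # computed and stored in the cache
second = load_chart_data(7)  # served from memory, no recomputation
print(calls, load_chart_data.cache_info().hits)
```

The second request for the same chart skips the slow work entirely, which is where the loading-time savings come from.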

Faster Updates

The new caching system will not only reduce chart loading times, but also reduce the time taken to complete a data update cycle (syncing your RJMetrics data warehouse with your own database). This is a result of speeding up the chart pre-caching section of an update cycle.

Another indirect performance enhancement will come from the fact that we will no longer need to make certain calculations during an update cycle. These calculation processes were integral to the proper functioning of the old chart engine, but we have eliminated this requirement in the new charts.

Conclusion

We are all very excited about these changes, and we all believe that this is a major step in the right direction for user experience. The most noticeable change will be chart rendering, but under the hood, these changes will enable us to iterate faster with new features and improvements.

As always we really value your opinion, so let us know what you think about the changes by dropping us an email at support@rjmetrics.com.

Airbnb Data Analysis: 6 Million Users by Year-End, Only 20% Active

Airbnb is one of the hottest sites on the internet. The Y Combinator graduate has raised $120 Million of funding to change the way people find places to stay around the globe.

As fans of Airbnb with a passion for startup data, we decided to try and learn more about the site’s user base by looking at the publicly-available profiles of its members. We sampled just over 60,000 users and were able to draw some interesting insights using an RJMetrics business intelligence dashboard.

Some highlights include:

  • Airbnb has over 2.1 million registered users and is growing about 250% year-over-year. At this rate, they’ll have 3 million users by the end of June and 4 million by the end of August.
  • Almost 85% of Airbnb’s userbase has never received a review as a host or a guest. Our sample suggests that there may be as few as 350,000 reviewed users among the userbase of over 2 million.
  • Usage is addictive — with each additional stay booked through Airbnb, users become increasingly likely to book again.

User Growth

Since Airbnb uses auto-incrementing IDs for its users and does not appear to have skipped any range of ID values, it is quite easy to track user growth over time.
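The reasoning here can be sketched in a few lines: if IDs are assigned sequentially with no gaps, the highest ID seen among users who joined in or before a given month approximates the total user count at that point. The function name and the sample profiles below are made up for illustration, and because a sample rarely contains the very last user of each month, the estimate will run slightly low.

```python
from datetime import date

def cumulative_users_by_month(sampled):
    """sampled: iterable of (user_id, signup_date).
    Returns {(year, month): estimated total users at the end of that month}."""
    highest = {}
    for user_id, joined in sampled:
        key = (joined.year, joined.month)
        highest[key] = max(highest.get(key, 0), user_id)

    totals, running = {}, 0
    for key in sorted(highest):
        running = max(running, highest[key])
        totals[key] = running
    return totals

# Made-up sample of (user_id, signup_date) pairs
sampled = [
    (1200, date(2011, 12, 3)), (1950, date(2011, 12, 28)),
    (2400, date(2012, 1, 15)),
    (2900, date(2012, 2, 2)), (3100, date(2012, 2, 20)),
]
totals = cumulative_users_by_month(sampled)
print(totals)
```

Subtracting one month's total from the next gives an estimate of new signups in that month.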

Airbnb has very seasonal growth patterns, with most new users signing up in the summer (peaking in August) and significantly fewer signing up in the winter months (reaching a low point in December). These user analytics were easily extracted using RJMetrics.

The current user count is approximately 2.1 million.

For the past several months, the year-over-year growth rate has been steady at around 250%. Extrapolating this out for the rest of the year puts the site’s user count at over 3 million by the end of June and over 4 million by the end of August. At its current growth rate, the site will approach 6 million registered users by the end of 2012.

Usage

Since we didn’t have direct access to data on actual stays, we used reviews as a proxy for activity. Reviews are the lifeblood of the Airbnb community, so we think it’s fair to assume that the number of reviews is a good proxy for the number of stays.

Most sites we study show signs of the “80/20 rule,” which suggests that 80% of the activity comes from 20% of the users. In Airbnb’s case, it’s more like the “100/20” rule: only 16% of the user base has been reviewed as a host or a guest.

Here are some other usage statistics:

  • Only about 14% of users (or about 300,000 users) have been reviewed as guests.
  • Only about 2.3% of users (or about 50,000 users) have been reviewed as hosts.
  • A mere 0.5% of the userbase has been reviewed as both guest and host.
  • 5% of users (or about 100,000 users) have active listings, but only 2% (or about 40,000 users) have received reviews from guests. This suggests that more than half of the people listing properties have yet to host a guest.

Repeat Activity

By relying on the same techniques we use to track repeat purchase probability in RJMetrics, we were able to profile the average user’s likelihood of using Airbnb again after each additional stay.

As you can see, while only about 14% of the userbase ever books a stay (as indicated by a first review from a host), 22% of those users who book once go on to book a second stay via Airbnb. By the time a user has booked five stays, the likelihood that they will book another stay on Airbnb is over 50%.

Note that these percentages are based on the behavior of the existing user population, the majority of which has been registered for less than a year. Since so many users have a limited history on the site, it’s quite likely that these numbers will increase over time.

Conclusion

Airbnb continues to explode in popularity and experience tremendous user growth as a result. As with most consumer sites, however, the population of active users is much smaller than the total registered user count.

Airbnb’s key to continued success will be to both grow its user base and convert more of its registered users into paying customers. As we’ve seen, with each additional booking users become more likely to book again.

Click here to use RJMetrics to draw actionable insights from your company’s data.

The 10 Hardest Places to Eat in San Francisco

Here at RJMetrics we build software to help our clients make data-driven decisions. Sometimes, we get the itch to peek at data from other companies to learn a little bit about how they work. Earlier this year we turned our sights to OpenTable.com, the online restaurant reservation system, and discovered that these restaurants are home to the most sought-after tables in San Francisco:

  1. Flour + Water
  2. Spruce
  3. Frances
  4. Cotogna
  5. nopa
  6. Slanted Door
  7. Wayfare Tavern
  8. Benu
  9. Commonwealth
  10. Quince

As a group, these ten restaurants have received 7 Michelin Stars and 33 OpenTable Diner’s Choice Awards. All are Zagat rated. On the whole, however, this list does not represent the city’s most expensive dining options. Six of these restaurants are in OpenTable’s second-most affordable category, with meals averaging below $30 per person.

We checked several times per day, and, for many of these restaurants, found fewer than ten different reservation slots open during our 34-day sampling period – and those were quickly snatched up. For comparison, of the 461 different restaurants that we checked, the majority had over 220 different reservation slots open at some point during our sampling period.

The results here are unlike what we found in our analysis of New York City – San Francisco’s most popular restaurants are not necessarily the most expensive. But getting a seat at one of these tables can be difficult. The demand for good, affordable dining is so high that our friends over at PrimaTable.com created tools to help diners find open reservations at these and other highly sought-after restaurants.

How did we generate this list?

For every dinner reservation on the half-hours between 6pm and 9pm from January 3rd to February 17th 2012, we checked OpenTable every six hours starting two weeks before the reservation. We ignored Sunday and Monday because many restaurants are closed or do not take reservations on these days.
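The sampling schedule described above can be reconstructed roughly as follows. This is a sketch under two assumptions that the description does not spell out: that "on the half-hours between 6pm and 9pm" includes both 6:00pm and 9:00pm (seven slots per evening), and that checks run from exactly two weeks before the reservation time up to, but not including, the reservation itself. The function names are ours for illustration.

```python
from datetime import date, datetime, time, timedelta

START, END = date(2012, 1, 3), date(2012, 2, 17)

def reservation_slots():
    """All dinner slots in the sampling window, skipping Sundays and Mondays."""
    slots = []
    day = START
    while day <= END:
        if day.weekday() not in (6, 0):  # Sunday is 6, Monday is 0
            slot = datetime.combine(day, time(18, 0))
            last = datetime.combine(day, time(21, 0))
            while slot <= last:
                slots.append(slot)
                slot += timedelta(minutes=30)
        day += timedelta(days=1)
    return slots

def check_times(slot):
    """Times at which availability for one slot would be checked."""
    times = []
    t = slot - timedelta(days=14)
    while t < slot:
        times.append(t)
        t += timedelta(hours=6)
    return times

slots = reservation_slots()
days = {s.date() for s in slots}
print(len(days), len(slots), len(check_times(slots[0])))
```

Under these assumptions the window covers 34 distinct evenings, which matches the 34-day sampling period mentioned earlier, and 238 reservation slots per restaurant.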

The 10 Hardest Places to Eat in New York City

Here at RJMetrics we build software to help our clients make data-driven decisions. Sometimes, we get the itch to peek at data from other companies to learn a little bit about how they work. Earlier this year we turned our sights to OpenTable.com, the online restaurant reservation system, and discovered that these restaurants are home to the most sought-after tables in New York City:

  1. Carmine’s Midtown
  2. The Little Owl
  3. Per Se
  4. Eleven Madison Park
  5. Kajitsu
  6. Le Bernardin
  7. ABC Kitchen
  8. Minetta Tavern
  9. Casa Mono
  10. Gramercy Tavern

As a group, these ten restaurants have received 14 Michelin Stars and 38 OpenTable Diner’s Choice Awards. All are Zagat rated. Six are considered by OpenTable to be among the most expensive restaurants, with a meal averaging over $50 per person.

We checked several times per day, and, for many of these restaurants, found fewer than ten different reservation slots open during our 34-day sampling period – and those were quickly snatched up. For comparison, of the 1,301 different restaurants that we checked, the majority had over 230 different reservation slots open at some point during our sampling period.

Restaurateurs take note: New York City residents can’t get enough fine dining. Demand for these dining experiences is so high that our friends over at PrimaTable.com created tools to help diners find open reservations at these and other highly sought-after restaurants.

How did we generate this list?

For every dinner reservation on the half-hours between 6pm and 9pm from January 3rd to February 17th 2012, we checked OpenTable every six hours starting two weeks before the reservation. We ignored Sunday and Monday because many restaurants are closed or do not take reservations on these days.

Data in the Year 2050

In the past decades, business intelligence focused on transforming large unstructured datasets into usable metrics for decision making. Out of this data revolution came Just in Time (JIT) manufacturing and supply chains. Although past data still cannot predict the future perfectly, statistical forecasts and regression models have become widely trusted. With the vast amount of data generated online, insights on people’s intentions, opinions and wishes are also increasingly precise.

So what does the future hold for data analytics in the year 2050? Here are my speculations:

Amusement Stores

Bored of shopping online, you will want to head out to a premium brick-and-mortar store with friends, where an entrance fee is duly charged. Here you’ll find a selection of physical products on display, not holograms, which you can try before buying. But this store is no antique: it knew what you were looking for before you stepped in. Having access to your historical search queries and browsing preferences, both online and in real life, the store changes pricing of products dynamically over time depending on people’s interest. 3D cameras around the store monitor your body language, while microphones record your comments. Because this is one of the only places left for marketers to evaluate customers in person, all your actions are meticulously analyzed. Whether you fit into the target market of a product is immediately known to the manufacturer. If the clothes you’re trying on do not fit, a robot will manufacture a tailored version on the spot based on your body size, as recorded by the store cameras. S, M and L tags are things that your grandparents reminisce about. The same goes for any physical object you use. 3D printing, combined with data on your body shape and historical preferences, ensures that every item is a perfect fit. Of course, you have to pay first and probably won’t be able to sell any used items, but that’s the point.

Everything is related

When you’re done at the store, head back outdoors with your smartshades (lenses that dynamically adapt to your vision and the surrounding light intensity, but that also have a Head-Up Display layered for their nano-PC). They’ll help. See your car? You can also see its latest market value, days till an oil change is needed directly from the engine sensor, and a reminder to pick up your kids from piano lessons at 4:30pm. Smartshades help you analyze and live efficiently. Databases are now globally connected and semantic search is king. On the road home, your smartshades will warn you to be careful around drivers with a bad accident history. If you see something interesting, snap pictures and share them with three blinks of an eye. Of course, chatting on your smartshades is not legal while you’re driving, but you can definitely make a stop and say hi to friends you saw walking nearby on your real-time satellite map. Driving by a grocery store? Pay attention to the top right corner of your shades: it’s telling you what’s about to expire in the fridge (but why bother when you’ve already got a JIT grocery subscription), along with dinner suggestions based on available ingredients. As you arrive home, your shades will pull up your favorite hologram shows. If you’re wondering what your loved one is doing, simply switch views to source live video from his or her smartshades. You are now connected.

Forget slice and dice, start chopping and spreading

For people analyzing data, a world of charts and graphs is found through your pair of work smartshades. Here, you can look at metrics from around the globe by spinning a 3D virtual Earth, captured from satellites in orbit. You can manipulate charts, documents, and overlays by hand motion, interpreted by the smartshades’ camera and nano-PC. Zoom in and you will find local data sets on your products. Then choose to view the latest stats from sales, social networks or internal management, always paired with live aerial video for your location of interest in the background overlay. Zoom further and you will see real-time 3D videos from buildings, stores and street cameras for an idea of what customers are doing with your products. Feel free to navigate the environment like a bird. You can even highlight some objects worthy of attention in yellow while the rest of the world appears in gray. It’s a convenient feature. Now tap on a client or select a group of clients to review their feelings and psychographics. You can definitely tell whether they are happy, disappointed or confused, but they might also be annoyed that you are peeking at them.

Dream Away!

Are we at RJMetrics building this future? Only the stuff you like. Send us your ideas!

New Pinterest Data: What’s Everyone Pinning About?

Last month, we released an in-depth report on Pinterest user behavior. The general theme among our many findings was that Pinterest users are deeply engaged and remain highly active over time.

This led many people to ask a logical follow-up question: what exactly is everyone pinning about? With about a million pins still sitting in an RJMetrics hosted data warehouse from our previous report, we decided to answer that question.

Our full report is below, but here are some of our key findings:

  • The all-time most popular pinboard categories are Home (17.2%), Arts and Crafts (12.4%), Style/Fashion (11.7%), and Food (10.5%).
  • Good news for e-commerce players looking to cash in on Pinterest: Products I Love is the third most popular pinboard name, and pins that exist on pinboards about Products are the most likely to be liked by other users.
  • Food is the fastest-growing pinboard category. Food is also the most likely category to be repinned, on average generating over 50% more re-pins than the next most repinned category, Style and Fashion.

How We Did It

We sampled just under one million pins from pinterest.com by browsing Pin ID Numbers and usernames from the general population. These pins were from about 9,200 unique users and were stored across pinboards with about 15,000 unique names.

We mapped pinboard names to content categories by identifying the most commonly used words and phrases and mapping those to specific content categories. This allowed us to categorize the vast majority of the pinboards we identified into a handful of distinct categories.
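A keyword-based mapping of this kind might look something like the sketch below. The keyword lists are hypothetical examples chosen for illustration, not the actual lists used in our analysis, and the matching is a plain substring check where the first matching category wins.

```python
# Hypothetical keyword lists; the real mapping is not reproduced here.
CATEGORY_KEYWORDS = {
    "Home": ["home", "decor"],
    "Food": ["food", "recipe"],
    "Arts and Crafts": ["craft", "diy"],
    "Style/Fashion": ["style", "fashion"],
}

def categorize(board_name):
    """Return the first category whose keyword appears in the board name,
    or None if the name matches no category."""
    name = board_name.lower()
    for category, keywords in CATEGORY_KEYWORDS.items():
        if any(keyword in name for keyword in keywords):
            return category
    return None

for board in ["For the Home", "Recipes", "Craft Ideas", "Books Worth Reading"]:
    print(board, "->", categorize(board))
```

Boards that match no keyword stay uncategorized, which is why a keyword approach can bucket most, but not all, of a sample.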

As always, RJMetrics was our secret sauce. We conducted the analysis for this article with just a few clicks from our RJMetrics online dashboards.

Most Popular Board Names

Most pinboard names are used by many different users throughout Pinterest. In fact, some board names represent as much as 3% of the total pinboard population. The 10 most common board names, along with the percent of boards they represent, are listed below:

  • For the Home (3.15%)
  • My Style (1.97%)
  • Products I Love (1.86%)
  • Books Worth Reading (1.68%)
  • Food (1.23%)
  • Favorite Places & Spaces (1.00%)
  • Recipes (0.75%)
  • Craft Ideas (0.74%)
  • Christmas (0.72%)
  • Crafts (0.65%)

Those who are excited about Pinterest’s use as a product recommendation engine will be happy to see Products I Love as the third most popular board name, representing about 1.9% of pinboards.

Most Popular Board Categories

In addition to the popular names listed above, there are thousands upon thousands of additional unique board names in our sample. By isolating key words and phrases, we were able to bucket about 85% of the pinboards from our sample into categories.

The top 10 most popular categories are listed below:

  • Home (17.2%)
  • Arts and Crafts (12.4%)
  • Style/Fashion (11.7%)
  • Food (10.5%)
  • Inspiration/Education (9.0%)
  • Holidays/Seasonal (3.9%)
  • Humor (2.1%)
  • Products (2.1%)
  • Travel (1.9%)
  • Kids (1.8%)

Note that over 60% of pinboards fall into the top 5 categories.

Emerging Trends: Food Trumps Fashion

If we look at these top categories over time, by and large it appears as though their relative percentages are holding steady.

As you might expect, the only line showing some degree of seasonality is the one related to holidays and seasonal content.

As a next step, however, we isolated the two categories that had shown the biggest changes in market share over the past 12 months and uncovered something interesting:

In the early days of Pinterest, Style and Fashion represented twice as many pinboards as Food. However, in recent months, Food has gained more ground than any other category, and has actually become more popular than Style and Fashion among new pinboards created.

This could be a reflection of Pinterest’s increasing popularity in the mainstream, as Food is a far more populist topic than fashion. As one of Pinterest’s newer (and less trendy) members, I confess that I’d be more likely to pin about cake than couture.

Most Viral Categories

We looked at the average number of repins and likes for the pins across the different content categories we identified.

Check out the average repins by category below:

Hands down, Food was the most likely category to be repinned. This could explain its increasing share of the category landscape. Style and Fashion is a distant second, and Holidays/Seasonal was the least likely category to be repinned.

Interestingly, the most liked categories were NOT the same as the most repinned. Check out the average number of likes per pin by category:

Here we see that Products is actually the most likely category to be liked. This is surprising, particularly given that Products sat in the middle of the pack when it came to repins.

Humor is a similar story, ranking second most liked but not standing out based on repin frequency.

Conclusion

With Pinterest continuing to explode in size, it will be exciting to see how broader adoption shapes the voice of its community.

Pins about the home, arts and crafts and inspiration continue to dominate the content landscape, but the growing popularity of more broadly-accessible topics like Food and Product Recommendations could indicate that Pinterest is heading toward a more mainstream and commercialized future.