There is no shortage of top-down research telling us that the ecommerce market is enormous, growing extremely fast, and showing no signs of slowing down. According to sources like eMarketer, ecommerce is the only trillion-dollar industry growing at a double-digit percentage each year. And with the US Census Bureau estimating that only 7% of retail sales are done on the internet, ecommerce still has a lot of runway for growth.
Despite all this research, however, no one seems to be able to answer the key question: how many ecommerce companies are there? The few estimates that exist vary by orders of magnitude, from tens of thousands to nearly a million.
We set out to answer this question for ourselves.
How we did it
We have a secret ingredient that helped us build an estimate from the ground up: proprietary data. Here at RJMetrics, we work with hundreds of online retailers who generously allow us to anonymize high-level data points for analyses like these.
By combining our proprietary data with size and revenue information from third-party sources like the Internet Retailer Top 500 Guide, Alexa, and BuiltWith, we’ve conducted a comprehensive bottom-up analysis of the ecommerce industry.
Obviously, the long tail is going to be very long here. Using BuiltWith to identify which websites have ecommerce technologies installed, we found 180,000 live websites with just the Magento shopping cart. When you extrapolate to include the full universe of competing ecommerce technologies, you can see how some estimates approach the one-million mark. As you might have guessed, however, the majority of these sites are not generating revenue on any meaningful scale.
In order to separate the wheat from the chaff, we needed to come up with revenue-based exclusion criteria.
Tying Alexa rank to revenue
Alexa rank is an easily obtained proxy for traffic. Alexa ranks every website in the world based on traffic volume. A global rank of 1 represents the website with the most traffic in the world (currently Google). Since ecommerce revenue is directly correlated with the number of visitors to a site, we theorized that Alexa rank could serve as a proxy for revenue. To test this, we needed revenue data for a set of ecommerce companies that spanned a broad spectrum of Alexa ranks.
To get revenue data, we turned to the Internet Retailer Top 500 Guide and augmented it with our own proprietary benchmarking data set. The IR 500 includes the heaviest hitters in ecommerce, and our own data covers mid-sized and smaller companies. Between these two data sets we had Alexa rank and revenue data on the full spectrum of ecommerce companies. Here’s what we saw:
Jackpot! There appears to be a pretty clear-cut link between revenue and Alexa rank. To be sure, let’s zoom in past the Walmarts and Amazons of the world and just look at the “long tail” of sites with Alexa ranks between 10,000 and 1,000,000:
Awesome. These combined data sets have given us visibility into the revenue of ecommerce companies throughout the Alexa top 1 million sites.
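One way to check a link like this numerically is to correlate the logarithm of Alexa rank with the logarithm of revenue. The sketch below is only an illustration of that idea, not the analysis behind the charts: the `SAMPLE` pairs are invented, and the `pearson` helper is a hypothetical name for a plain Pearson correlation.

```python
import math

# Hypothetical (alexa_rank, annual_revenue_usd) pairs, invented for illustration.
SAMPLE = [
    (500, 900_000_000),
    (5_000, 120_000_000),
    (40_000, 15_000_000),
    (200_000, 2_500_000),
    (800_000, 600_000),
]


def pearson(xs, ys):
    """Plain Pearson correlation coefficient of two equal-length sequences."""
    mean_x = sum(xs) / len(xs)
    mean_y = sum(ys) / len(ys)
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    return cov / math.sqrt(var_x * var_y)


# Rank and revenue both span several orders of magnitude, so compare logs.
log_ranks = [math.log10(rank) for rank, _ in SAMPLE]
log_revenues = [math.log10(revenue) for _, revenue in SAMPLE]
r = pearson(log_ranks, log_revenues)

print(f"log-log correlation: {r:.3f}")
```

A coefficient close to -1 would mean that revenue falls steadily as the rank number grows, which is the pattern a traffic-based proxy needs.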
Note that, while the 500k-1M data point is quite low, it’s far from zero. The mean 2013 revenue for sites in that range is actually $1.5 million and the median is around $500k. The gap between the two shows that a handful of high earners pull the mean up, and that revenue for the typical site drops meaningfully in this range.
For this reason, we’ve made an Alexa rank of 1,000,000 the cutoff for sites we include in our count.
While we are aware of many websites ranked outside the Alexa top 1,000,000 that are generating well into six and even seven figures of revenue, we believe that including sites beyond this mark would add far more false positives than it would remove false negatives. We’re comfortable concluding that the false positives and false negatives on either side of the threshold are well balanced with the cutoff at the Alexa one-million mark.
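The bucket comparison behind this cutoff can be sketched in a few lines. This is a minimal illustration under stated assumptions, not the actual pipeline: the `SITES` values are invented (chosen so that the last bucket reproduces the $1.5 million mean and $500k median quoted above), and the bucket edges and function names are hypothetical.

```python
from statistics import mean, median

# Hypothetical (alexa_rank, annual_revenue_usd) pairs; not real company data.
SITES = [
    (12_000, 40_000_000),
    (45_000, 9_000_000),
    (80_000, 6_500_000),
    (150_000, 3_000_000),
    (320_000, 1_200_000),
    (480_000, 2_500_000),
    (600_000, 400_000),
    (750_000, 3_600_000),
    (900_000, 500_000),
]

# Inclusive upper bound of each rank bucket, chosen only for illustration.
BUCKET_EDGES = [100_000, 500_000, 1_000_000]


def bucket_for(rank):
    """Return the upper edge of the bucket holding rank, or None past the cutoff."""
    for edge in BUCKET_EDGES:
        if rank <= edge:
            return edge
    return None


def revenue_by_bucket(sites):
    """Map each bucket edge to (mean revenue, median revenue)."""
    grouped = {}
    for rank, revenue in sites:
        edge = bucket_for(rank)
        if edge is None:
            continue  # ranked beyond one million: excluded from the count
        grouped.setdefault(edge, []).append(revenue)
    return {
        edge: (mean(revenues), median(revenues))
        for edge, revenues in sorted(grouped.items())
    }


stats = revenue_by_bucket(SITES)
for edge, (avg, mid) in stats.items():
    print(f"rank <= {edge:>9,}: mean ${avg:,.0f}, median ${mid:,.0f}")
```

Reporting both statistics per bucket matters because a mean well above the median signals that a few large sites dominate the bucket.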
Now that we had a way of estimating which ecommerce companies are actually generating meaningful revenue, we simply needed some way of figuring out which sites in the Alexa Top One Million are actually ecommerce.
Using the BuiltWith API, we were able to profile every website in the Alexa Top One Million by evaluating the technologies being used by those sites. BuiltWith can detect a whole universe of shopping carts, marketing tools, and other ecommerce-specific technologies that are a dead giveaway that a website is ecommerce.
But this wasn’t good enough—we were still getting a lot of false positives and false negatives. We decided to go a step further. We scraped the HTML of each site’s home page and looked for certain words: “shop”, “buy”, “sell”. We also detected defunct pages and sites that looked more like linkspam. We ended up building an entire set of rules to automatically evaluate whether or not a given site was ecommerce.
And at every turn, we evaluated the rules against a set of websites that we had evaluated by hand. Eventually, our algorithm was actually able to predict whether a site was ecommerce with 95% accuracy.
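To make the idea concrete, here is a toy version of a rule-based check scored against hand-labeled sites. It is a sketch under assumptions, not the actual rule set: the two-keyword threshold, the `ECOMMERCE_TECH` names, the sample pages, and the function names are all invented for illustration, and the technology list stands in for whatever a profiling service reports.

```python
import re

KEYWORDS = ("shop", "buy", "sell")  # the words named in the text
# Illustrative technology names; "example-cart" is made up.
ECOMMERCE_TECH = {"magento", "example-cart"}


def looks_like_ecommerce(html, technologies):
    """Toy rule: a known cart technology, or at least two commerce keywords."""
    if ECOMMERCE_TECH & {tech.lower() for tech in technologies}:
        return True
    words = set(re.findall(r"[a-z]+", html.lower()))
    hits = sum(1 for keyword in KEYWORDS if keyword in words)
    return hits >= 2


def accuracy(labeled_sites):
    """Share of hand-labeled sites where the rule agrees with the label."""
    correct = sum(
        1
        for html, technologies, label in labeled_sites
        if looks_like_ecommerce(html, technologies) == label
    )
    return correct / len(labeled_sites)


# Hypothetical hand-labeled examples: (home page HTML, technologies, is_ecommerce).
LABELED = [
    ("<h1>Shop now</h1><a>Buy shoes</a>", [], True),
    ("<p>Our blog about hiking</p>", [], False),
    ("<p>Welcome</p>", ["Magento"], True),
    ("<p>Where to buy a used car: a guide</p>", [], False),
    ("<p>Handmade candles, order by email</p>", [], True),  # the rule misses this one
]

print(f"accuracy: {accuracy(LABELED):.0%}")
```

Each miss in the labeled set (like the last example) points to a rule worth adding or adjusting, which is the loop described above.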
After we had fine-tuned the algorithm, we turned it loose on the Alexa Global Top One Million sites. Here’s what we found:
There are approximately 110,000 ecommerce websites generating revenue of meaningful scale on the internet.
More than 12% of the 100,000 highest-traffic websites are ecommerce, and that density declines to about 10% in the long tail. According to our data, ecommerce websites make up approximately 10-12% of the internet’s million highest-traffic sites. And to our knowledge, we’re the first to actually attempt to count them.
We should point out that we include any online transactional business in our assessment. In addition to traditional online retail, this includes companies selling virtual goods, hosted software providers, marketplaces, travel sites, and even mobile apps with a commerce component. Basically, if you can spend money on their website, it qualifies.
It should also be noted that our detection methodology excludes non-English language websites and pornographic websites. When building our algorithm, we had to search for particular content on these pages. We didn’t have the resources to translate and test these rules in other languages, and we didn’t have the…inclination…to test them against pornographic websites. Both of these limitations of our analysis deflate the numbers we report.
Mid-market ecommerce companies generate a ton of revenue
Having just tagged every site on the Alexa Top One Million as ecommerce or not, and having figured out the underlying relationship between Alexa rank and revenue, we have our hands on a pretty interesting dataset. We’ll be exploring this data in several posts down the road, but here’s the first cut we wanted to share with you.
We looked at the revenue breakdown between the largest and smallest of these sites to try to figure out the industry landscape. Based on our dataset, ecommerce clearly breaks down into three distinct groups.
- The largest ecommerce sites on the internet make up about 1% of the total population and generate 34% of the total revenue.
- A distinct middle tier of ecommerce sites makes up 51% of the total population and generates 63% of the total revenue.
- Small ecommerce sites make up 48% of the total population and generate 3% of the total revenue.
Here’s the data:
|Alexa Rank|% of Total Ecommerce Businesses|% of Total Revenue|
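The shares in the list above also imply how much an average site in each tier earns relative to the overall average: divide a tier's revenue share by its population share. The short calculation below uses only those rounded percentages, so treat the results as rough; the variable and function names are illustrative.

```python
# (population share %, revenue share %) per tier, taken from the list above.
TIERS = {
    "largest": (1, 34),
    "middle": (51, 63),
    "small": (48, 3),
}


def relative_revenue_per_site(population_pct, revenue_pct):
    """Average revenue per site as a multiple of the all-site average."""
    return revenue_pct / population_pct


multiples = {
    name: relative_revenue_per_site(population, revenue)
    for name, (population, revenue) in TIERS.items()
}
middle_vs_small = multiples["middle"] / multiples["small"]

for name, multiple in multiples.items():
    print(f"{name}: {multiple:.2f}x the average site")
print(f"middle vs small: {middle_vs_small:.1f}x")
```

On these rounded figures, an average middle-tier site brings in roughly 20 times the revenue of an average small site.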
The opportunity in ecommerce
This represents a big opportunity for vendors (like RJMetrics) serving the ecommerce market. Any company that can help merchants move from the bottom to the middle tier of the market will make a very significant impact on their top line. The middle of the market is where traffic volumes start to really bring in dollars, and getting to that scale is an imperative for any ecommerce company focused on growth.