There is no shortage of top-down research telling us that the ecommerce market is enormous, growing extremely fast, and showing no signs of slowing down. According to sources like eMarketer, ecommerce is the only trillion-dollar industry growing at a double-digit percentage each year. And with the US Census Bureau estimating that only 7% of retail sales are done on the internet, ecommerce still has a lot of runway for growth.

Despite all this research, however, no one seems to be able to answer the key question: how many ecommerce companies are there? The few estimates that exist vary by orders of magnitude, from tens of thousands to nearly a million.

We set out to answer this question for ourselves.


How we did it

We have a secret ingredient that helped us build an estimate from the ground up: proprietary data. Here at RJMetrics, we work with hundreds of online retailers who generously allow us to anonymize high-level data points for analyses like these.

By combining our proprietary data with size and revenue information from third-party sources like the Internet Retailer Top 500 Guide, Alexa, and BuiltWith, we’ve conducted a comprehensive bottom-up analysis of the ecommerce industry.

Size matters

Obviously, the long tail is going to be very long here. Using BuiltWith to identify which websites have ecommerce technologies installed, we found 180,000 live websites with just the Magento shopping cart. When you extrapolate to include the full universe of competing ecommerce technologies, you can see how some estimates approach the one-million mark. As you might have guessed, however, the majority of these sites are not generating revenue on any meaningful scale.

In order to separate the wheat from the chaff, we needed to come up with revenue-based exclusion criteria.

Tying Alexa rank to revenue

Alexa rank is an easily obtained proxy for traffic. Alexa ranks every website in the world based on traffic volume. A global rank of 1 represents the website with the most traffic in the world (currently Google). Since ecommerce revenue is directly correlated with the number of visitors to a site, we theorized that Alexa rank could serve as a proxy for revenue. To test this, we needed revenue data for a set of ecommerce companies that spanned a broad spectrum of Alexa ranks.

To get revenue data, we turned to the data in the Internet Retailer Top 500 guide and augmented it with our own proprietary benchmarking data set. The IR 500 includes the heaviest hitters in ecommerce, and our own data covers mid-sized and smaller companies. Between these two data sets we had Alexa rank and revenue data on the full spectrum of ecommerce companies. Here’s what we saw:

[Chart: long-tail revenue by Alexa rank]

Jackpot! There appears to be a pretty clear-cut link between revenue and Alexa rank. To be sure, let’s zoom in past the Walmarts and Amazons of the world and just look at the “long tail” of sites with Alexa ranks between 10,000 and 1,000,000:

Awesome. These combined data sets have given us visibility into the revenue of ecommerce companies throughout the Alexa top 1 million sites.
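For readers who want to try this on their own data, the grouping behind a chart like this can be sketched roughly as follows. This is a minimal sketch, not the code we ran: the bucket edges, the function names, and the `(alexa_rank, revenue)` input shape are all assumptions made for illustration.

```python
from bisect import bisect_left
from statistics import mean, median

# Inclusive upper bound of each rank bucket. These edges are illustrative
# assumptions, not the exact buckets used in the charts.
BUCKET_EDGES = [10_000, 100_000, 500_000, 1_000_000]
BUCKET_LABELS = ["1-10k", "10k-100k", "100k-500k", "500k-1M"]


def bucket_for_rank(rank):
    """Return the label of the bucket containing `rank`, or None past the cutoff."""
    idx = bisect_left(BUCKET_EDGES, rank)
    return BUCKET_LABELS[idx] if idx < len(BUCKET_LABELS) else None


def revenue_by_bucket(sites):
    """Group (alexa_rank, revenue) pairs by bucket; report count, mean, and median."""
    grouped = {label: [] for label in BUCKET_LABELS}
    for rank, revenue in sites:
        label = bucket_for_rank(rank)
        if label is not None:  # sites ranked past one million are dropped
            grouped[label].append(revenue)
    return {
        label: {"n": len(vals), "mean": mean(vals), "median": median(vals)}
        for label, vals in grouped.items()
        if vals  # skip empty buckets so mean() never sees an empty list
    }
```

Reporting the median next to the mean is a deliberate choice here, since revenue within a bucket tends to be heavily skewed by a handful of large sites.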

Meaningful scale

Note that, while the 500k-1M data point is quite low, it’s far from zero. The mean 2013 revenue for sites in that range is actually $1.5 million and the median is around $500k. As the gap between those two numbers suggests, a small number of larger sites pull the mean up, and typical revenue drops meaningfully in this range.
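To see how a mean of $1.5 million and a median of $500k can describe the same group, here is a tiny worked example. The five revenue figures are invented purely so that the arithmetic lands on those two values; they are not drawn from our data set.

```python
from statistics import mean, median

# Five invented revenue figures (USD), chosen only so that the mean and
# median match the values quoted for the 500k-1M range. Not real data.
revenues = [100_000, 300_000, 500_000, 600_000, 6_000_000]

avg = mean(revenues)    # total of 7.5M spread over 5 sites
mid = median(revenues)  # the middle value of the sorted list
skew_ratio = avg / mid  # well above 1 means a few large sites dominate
```

In this toy group a single large site accounts for most of the revenue, which is why the median is the better guide to what a typical site in the range earns.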

For this reason, we’ve made an Alexa rank of 1,000,000 the cutoff for sites we include in our count.

While we are aware of many websites ranked outside the Alexa top 1,000,000 that are generating well into six and even seven figures of revenue, we believe there would be far more false positives than false negatives if we included sites beyond this mark. We’re comfortable concluding that the false positives and false negatives on either side of the threshold roughly cancel out when the threshold is set at the Alexa one-million mark.

Defining ecommerce

Now that we had a way of estimating which ecommerce companies are actually generating meaningful revenue, we simply needed some way of figuring out which sites in the Alexa Top One Million are actually ecommerce.

Using the BuiltWith API, we were able to profile every website in the Alexa Top One Million by evaluating the technologies being used by those sites. BuiltWith can detect a whole universe of shopping carts, marketing tools, and other ecommerce-specific technology that makes a website a dead giveaway as ecommerce.

But this wasn’t good enough—we were still getting a lot of false positives and false negatives. We decided to go a step further. We scraped the HTML of each site’s home page and looked for certain words: “shop”, “buy”, “sell”. We also detected defunct pages and sites that looked more like linkspam. We ended up building an entire set of rules to automatically evaluate whether or not a given site was ecommerce.
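A heavily simplified version of this kind of rule can be sketched as follows. To be clear, this is not our rule set: the function names, the `has_cart_technology` flag (standing in for a technology lookup), the `min_words` threshold for spotting near-empty pages, and the way the signals are combined are all assumptions made for illustration. As written it only tightens the false-positive side; a real rule set needs many more cases.

```python
import re
from html.parser import HTMLParser

COMMERCE_WORDS = {"shop", "buy", "sell"}  # the three words named in the post


class _TextExtractor(HTMLParser):
    """Collect visible text, skipping the contents of <script> and <style>."""

    def __init__(self):
        super().__init__()
        self._skip_depth = 0
        self.chunks = []

    def handle_starttag(self, tag, attrs):
        if tag in ("script", "style"):
            self._skip_depth += 1

    def handle_endtag(self, tag):
        if tag in ("script", "style") and self._skip_depth:
            self._skip_depth -= 1

    def handle_data(self, data):
        if not self._skip_depth:
            self.chunks.append(data)


def visible_words(html):
    """Return the lowercase alphabetic words in the page's visible text."""
    parser = _TextExtractor()
    parser.feed(html)
    parser.close()
    return re.findall(r"[a-z]+", " ".join(parser.chunks).lower())


def looks_like_ecommerce(html, has_cart_technology, min_words=5):
    """Hypothetical rule: cart technology present, page not near-empty, commerce word found."""
    words = visible_words(html)
    if len(words) < min_words:
        return False  # assumed heuristic: treat near-empty pages as defunct
    has_keyword = any(word in COMMERCE_WORDS for word in words)
    return has_cart_technology and has_keyword
```

Matching on whole words in visible text, rather than on raw HTML, keeps a variable named `buy` inside a script from counting as a signal.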

And at every turn, we evaluated the rules against a set of websites that we had evaluated by hand. Eventually, our algorithm was actually able to predict whether a site was ecommerce with 95% accuracy.
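The check against hand-evaluated sites boils down to comparing two lists of yes/no answers. A minimal sketch, assuming the predictions and the hand labels are parallel lists of booleans (the `evaluate` function and its output keys are illustrative names, not part of our tooling):

```python
def evaluate(predictions, labels):
    """Compare predicted and hand-assigned booleans; return error counts and accuracy."""
    if len(predictions) != len(labels):
        raise ValueError("predictions and labels must have the same length")
    if not labels:
        raise ValueError("need at least one labeled site")

    counts = {"tp": 0, "tn": 0, "fp": 0, "fn": 0}
    for predicted, actual in zip(predictions, labels):
        if predicted and actual:
            counts["tp"] += 1  # correctly tagged as ecommerce
        elif not predicted and not actual:
            counts["tn"] += 1  # correctly tagged as not ecommerce
        elif predicted:
            counts["fp"] += 1  # tagged ecommerce, but the hand label says no
        else:
            counts["fn"] += 1  # missed a real ecommerce site

    counts["accuracy"] = (counts["tp"] + counts["tn"]) / len(labels)
    return counts
```

Tracking false positives and false negatives separately, instead of accuracy alone, shows which direction a rule change moved the errors.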

After we had fine-tuned the algorithm, we turned it loose on the Alexa Global Top One Million sites. Here’s what we found:

[Chart: ecommerce websites by Alexa rank]

There are approximately 110,000 ecommerce websites generating revenue of meaningful scale on the internet.

More than 12% of the 100,000 highest-traffic websites are ecommerce, and that density clearly declines to about 10% for the long tail. According to our data, ecommerce websites make up approximately 10-12% of the internet. And to our knowledge, we’re the first to actually attempt to count them.
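Once every site carries an ecommerce tag, the density figures are a simple ratio per rank band. Here is a rough sketch, assuming the input is a list of `(alexa_rank, is_ecommerce)` pairs; the function name and the 100,000-wide bands are illustrative choices.

```python
from collections import Counter


def ecommerce_density(tagged_sites, band_width=100_000):
    """Return {first rank in band: share of sites in that band tagged ecommerce}."""
    totals = Counter()
    hits = Counter()
    for rank, is_ecommerce in tagged_sites:
        band = (rank - 1) // band_width  # ranks 1..100,000 fall in band 0
        totals[band] += 1
        hits[band] += bool(is_ecommerce)
    return {
        band * band_width + 1: hits[band] / totals[band]
        for band in sorted(totals)
    }
```

As a sanity check on the headline number, 110,000 ecommerce sites out of one million ranked sites is 11%, which sits inside the 10-12% range.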

I should point out that we include any online transactional business in our assessment. In addition to traditional online retail, this includes companies selling virtual goods, hosted software providers, marketplaces, travel sites, and even mobile apps with a commerce component. Basically, if you can spend money on their website, it qualifies.

It should also be noted that our detection methodology excludes non-English language websites and pornographic websites. When building our algorithm, we had to search for particular content on these pages. We didn’t have the resources to translate and test these rules in other languages, and we didn’t have the…inclination…to test them against pornographic websites. Both of these limitations of our analysis deflate the numbers we report.

Mid-market ecommerce companies generate a ton of revenue

Having just tagged every site on the Alexa Top One Million as ecommerce or not, and having figured out the underlying relationship between Alexa rank and revenue, we have our hands on a pretty interesting dataset. We’ll be exploring this data in several posts down the road, but here’s the first cut we wanted to share with you.

We looked at the revenue breakdown between the largest and smallest of these sites to try to figure out the industry landscape. Based on our dataset, ecommerce clearly breaks down into three distinct groups.

  • The largest ecommerce sites on the internet make up about 1% of the total population and generate 34% of the total revenue.
  • A distinct middle tier of ecommerce sites makes up 51% of the total population and generates 63% of the total revenue.
  • Small ecommerce sites make up 48% of the total population and generate 3% of the total revenue.

Here’s the data:

Alexa Rank        % of Total Ecommerce Businesses    % of Total Revenue
Top 1-10k         1%                                 34%
Mid 10k-500k      51%                                63%
Bottom 500k-1M    48%                                3%
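One way to read this table is to divide each tier's revenue share by its population share, which gives revenue per site relative to the overall average. The sketch below uses only the rounded percentages from the table, so the results are approximate (the top tier's "about 1%" in particular); the dictionary and function names are illustrative.

```python
# (share of ecommerce sites, share of revenue), in percent, copied from the table.
TIERS = {
    "Top 1-10k": (1, 34),
    "Mid 10k-500k": (51, 63),
    "Bottom 500k-1M": (48, 3),
}


def relative_revenue_per_site(population_pct, revenue_pct):
    """Revenue per site as a multiple of the average across all counted sites."""
    return revenue_pct / population_pct


ratios = {
    name: relative_revenue_per_site(*shares)
    for name, shares in TIERS.items()
}
```

On these rounded figures, a top-tier site earns roughly 34 times the overall average, a mid-tier site about 1.2 times, and a bottom-tier site about 0.06 times.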

The opportunity in ecommerce

This represents a big opportunity for vendors (like RJMetrics) serving the ecommerce market. Any company that can help merchants move from the bottom to the middle tier of the market will make a very significant impact on their top line. The middle of the market is where traffic volumes start to really bring in dollars, and getting to that scale is an imperative for any ecommerce company focused on growth.

  • Traian Neacsu

    Robert, this is a very interesting analysis. Thanks for putting it up. However, the input data is unreliable (Alexa and BuiltWith). This link on BuiltWith – http://trends.builtwith.com/shop – shows that there are 4 Million websites with a cart functionality, which basically qualifies them as ecommerce websites. It doesn’t matter if they generate revenue or not, technically they fit the definition of ecommerce. My way of coming up with the number of ecommerce websites was to sum up the publicly advertised number of stores from the top 10 most used ecommerce platforms. That summed up to almost 800,000 websites, but this is not reliable as well, as it’s based on advertising :)

    • http://rjmetrics.com/ Tristan Handy

      Hi Traian,

      Thanks for your careful review, glad to see how interested you are in the topic!

      We evaluated thousands of individual sites by hand while building our algorithm. Without doing that work, it’s impossible to know what’s really out there.

      The reason we used the methodology we did is that most of the websites with ecommerce technology installed actually *aren’t* ecommerce businesses. Frequently, they are defunct–an administrator will install a cart platform on a domain and then never actually get the site up and running. Frequently, they’re totally unrelated business models. Content sites such as newspapers also often use cart platforms, but are not “ecommerce”. There are many other situations where someone that has a cart installed isn’t actually an ecommerce vendor; we had to find each of these situations and then build rules into our algorithm to correct for them.

      Hope that makes sense!

      Tristan

      • Traian Neacsu

        :) ecommerce is a subject dear to my heart. As a matter of fact I will be publishing something on ecommerce pretty soon.

        yes, you are totally right about the forgotten code, but technically speaking anything that involves a monetary transaction between two computers interconnected on WWW is electronic commerce, right?

  • David Booth

    Interesting analysis Robert. Is your data able to show the location of those ecommerce businesses? Eg “East Coast” vs “West Coast”, or by state, or more granular?

    • http://rjmetrics.com/ Tristan Handy

      Yes–we could absolutely do that. We’re not quite there yet, but I can imagine doing that in a future iteration. Will definitely post updates here.

  • http://www.goudengids.be Robin Soubry

    Hi Robert,

    For an analysis on the e-commerce landscape in Belgium, I’m working out a similar initiative.
    Would it be possible to have a call on the methodology you used to clear out most of the false positives/negatives?
    Could the algorithm be licensed to us for analysis of the market we’re looking at? (However Belgium has 3 official languages, so tweaks will be required).

    • http://rjmetrics.com/ Tristan Handy

      Hey Robin, thanks for your interest. I wish I could help. This was actually a massive software development effort (months!) that used a lot of internal proprietary data. We’re not going to be able to release it because of the sensitivity.

      If you read the post as a how-to, you can start to get a sense of how we wrote what we wrote… :)

      Good luck! It was a really fun project.

  • Nikolaus Foulkrod

    Hey Robert,

    I love the article, really great data. I am a college student and am doing a research project on Ecommerce in the US. I don’t know if this is possible, but is there any way that your algorithm could also account for the location of the ecommerce company? I have been looking for a heat map of Ecommerce companies by state without any luck. Any suggestions would be greatly appreciated!

    • http://rjmetrics.com/ Tristan Handy

      You’re the second person who has asked for this! That’s awesome. We don’t have that data right now but it looks like we have a good next step…

      • Nikolaus Foulkrod

        Well if you do put something together let me know for sure! That would be awesome

  • Dimitrios Kourtesis

    Hi Robert,

    Great article. Thanks for sharing your insights. The revenue share among top/mid/bottom ecommerce sites is illuminating. But I’m curious about the actual total revenue figure that is being broken down. What is the total revenue you’re assuming those ecommerce sites are making?

    Dimitrios

  • Charlie

    Great post. I’ve been looking for this for ages. I even found the Referral Candy one you referenced, but it didn’t go far enough. I’m stumped by one of the paragraphs in your post: “Note that, while the 500k-1M data point is quite low, it’s far from zero. The mean 2013 revenue for sites in that range is actually $1.5 Million” It doesn’t seem to correlate to the graph above. I would like to answer the question: what’s the mean revenue for the stores you define as top, mid and bottom? Thanks. Charlie

    • http://rjmetrics.com/ Tristan Handy

      Hey Charlie, let me look into this. I totally see what you’re saying and need to take a look at whether the axis on that chart is off or whether I’m just forgetting some piece of information. Will get back to you…

      • Charlie

        Thanks

    • http://rjmetrics.com/ Tristan Handy

      Hah. I really appreciate you pointing this out. It turns out that when we were prepping the charts for this piece we screwed up the y axis—it was off by a factor of 100. Just fixed. Thanks again :)

      • Charlie

        Thanks for checking and fixing this. Really helpful work!

  • http://liesandsubtweets.wordpress.com Nick Quah

    “It should also be noted that our detection methodology excludes non-English language websites”

    Quick question on this: don’t you think the exclusion of non-English sites ends up providing a considerably incomplete picture of e-commerce across the (globally-populated) internet landscape?

    • http://rjmetrics.com/ Tristan Handy

      Hello Nick! You’re not wrong–we’re definitely not estimating the global question. Our goal was to produce the best answer, to-date, of at least *part* of that question. We hope to continue enriching this data set in the coming months and years, such that we’ll be able to understand the growing world of ecommerce better and better.

      The data that went into this post has been very useful to us, and in sharing this we’re hopeful that you’ll find it valuable as well, even though it isn’t yet perfect :)