Everyone wants to run a data-driven business. But using data effectively is neither free nor easy. It requires infrastructure and tools, which inevitably fall on engineers to build and maintain. These ancillary tasks are a distraction from your engineering team’s core goal: building a great product. As the CTO, your challenge is to invest enough in data infrastructure to enable a data-driven culture, without losing forward momentum on product development.
In my experience, companies change dramatically every time they triple in size, say from 3 to 10 employees or from 30 to 100. Different strategies are appropriate at each stage of this evolution, whether you’re talking about management, security, or human resources. Data infrastructure is no different. Here is what you need at every step along the way:
1 to 3 employees
At this stage you’re pre-revenue, pre-launch. You have a homepage and you’re starting to get some traffic, but other than that there’s not much data to look at, so don’t do much work on your data infrastructure.
What you should do at this stage is have your engineers think ahead, so that you retain the data you will need for future analyses. Follow best practices in your database design, and avoid using UPDATEs and DELETEs whenever possible, because they destroy valuable information. As you build new features, think about how you will analyze them, and instrument your code accordingly.
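To make the advice about UPDATEs and DELETEs concrete, here is a minimal sketch of an append-only design. It uses Python's built-in sqlite3 module as a stand-in for your production database, and the subscription_events table and its columns are hypothetical names chosen for illustration. Each status change is written as a new row, so the current state can still be read while the full history stays available for later analysis.

```python
import sqlite3

def open_db():
    conn = sqlite3.connect(":memory:")
    # Hypothetical table: one row per status change, never modified in place.
    conn.execute(
        """
        CREATE TABLE subscription_events (
            id INTEGER PRIMARY KEY AUTOINCREMENT,
            user_id INTEGER NOT NULL,
            status TEXT NOT NULL,
            occurred_at TEXT NOT NULL
        )
        """
    )
    return conn

def record_status(conn, user_id, status, occurred_at):
    # INSERT a new row instead of UPDATE-ing the old one, so history survives.
    conn.execute(
        "INSERT INTO subscription_events (user_id, status, occurred_at) "
        "VALUES (?, ?, ?)",
        (user_id, status, occurred_at),
    )

def current_status(conn, user_id):
    # The most recent event is the current state.
    row = conn.execute(
        "SELECT status FROM subscription_events "
        "WHERE user_id = ? ORDER BY id DESC LIMIT 1",
        (user_id,),
    ).fetchone()
    return row[0] if row else None

def status_history(conn, user_id):
    # Every earlier state is still available for analysis.
    return conn.execute(
        "SELECT status, occurred_at FROM subscription_events "
        "WHERE user_id = ? ORDER BY id",
        (user_id,),
    ).fetchall()

conn = open_db()
record_status(conn, 1, "trial", "2014-01-01")
record_status(conn, 1, "paid", "2014-01-15")
record_status(conn, 1, "cancelled", "2014-03-01")
```

With an in-place UPDATE of a single status column, you could only ever see that this user is cancelled. With the event rows, questions like "how long do trials take to convert?" remain answerable later.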
This is also a good time to ensure your Google Analytics account is set up properly, and to figure out who on the team is going to be responsible for it, preferably a non-engineer. At RJMetrics, this responsibility has always belonged to a member of the marketing team, since they are the primary consumers of that data.
Business users are going to ask you and your team to “pull some numbers” from the database. At this stage that’s OK – these requests should be infrequent enough that they can be handled ad hoc.
4 to 10 employees
You have some real users and data is starting to flow in. You’re trying to find product-market fit. You and the rest of the company’s leadership keep tabs on a few basic KPIs, and several members of the team are trying to understand the basic dynamics of your product and operations, leading to an increasing number of analytics requests.
In this stage, it is crucial for your engineering team to transition away from answering questions. The goal now is to give business users the best possible tools to explore and report on your company’s data. Part of the challenge is to understand those business users, which could be almost everyone in your company.
On the infrastructure side, set up a slave database with read-only accounts so analytic queries don’t interfere with your production systems. Then, talk to your business users. Find out how they want to interact with your data. The universe of tools, both free and paid, is large and mature enough that you don’t need to roll your own. More technical teams might be willing to deal with exporting SQL results into R or Excel, while others will prefer more visual tools that abstract away raw SQL. Understand your business users and focus your efforts on the tools that make the most sense for them.
At this point, you should also focus on tools that require minimal investment from your team. You might only have three or four engineers, and you need every last man-hour to finish features and pay down tech debt. Avoid dedicating 25% or more of your engineering time to analytics unless it’s absolutely necessary. This is exactly why we built RJMetrics. We want to give companies at this stage a cloud-based business intelligence tool they can afford, so their teams can focus on the most important thing: building a great product. For example, the marketing team at Craftsy focuses each member of their team on a KPI and a goal, and uses RJMetrics to let everyone track progress against those goals — all without distracting their engineering team.
11 to 30 employees
Data volume is picking up. There is at least one business user spending large portions of the week doing data analysis. Your data is spread across multiple silos. If you’ve done a good job of putting analytics tools in the hands of your business users, by this time they’re all asking for more: more data, more tools, faster.
This is the time to start thinking about your long-term strategy. If you’re going to build it yourself, and you want to do it well, be prepared to have at least one member of your engineering team working on your data infrastructure at all times. Scaling to accommodate the increasing volume and variety of data and queries is only going to get harder. For now, a few strategic indices on the slave database will suffice, but soon you’re going to need a more scalable storage layer dedicated to the task: a data warehouse. Additionally, you’ll need an extract-transform-load (ETL) process that aggregates data from multiple sources, cleans and transforms it, and then loads it into a querying engine like Amazon Redshift for consumption.
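The ETL process described above can be sketched in a few lines. This is a toy example under stated assumptions: the two source lists stand in for extracts from separate silos (an application database and a billing system), every field and table name is hypothetical, and an in-memory SQLite database stands in for the warehouse. A real pipeline feeding something like Amazon Redshift would differ considerably in the extract and load steps.

```python
import sqlite3

# Extract: hypothetical raw rows from two silos.
app_db_rows = [
    {"email": "Ann@Example.com ", "plan": "pro"},
    {"email": "bob@example.com", "plan": "free"},
]
billing_rows = [
    {"customer_email": "ann@example.com", "amount_cents": 4900},
    {"customer_email": "ann@example.com", "amount_cents": 4900},
    {"customer_email": "carol@example.com", "amount_cents": 1500},
]

def normalize_email(raw):
    # Clean: the silos disagree on case and whitespace.
    return raw.strip().lower()

def transform(users, payments):
    # Aggregate revenue per customer, then join it onto the user list.
    revenue = {}
    for payment in payments:
        email = normalize_email(payment["customer_email"])
        revenue[email] = revenue.get(email, 0) + payment["amount_cents"]
    # Payments with no matching user (carol here) are dropped in this sketch.
    return [
        (normalize_email(u["email"]), u["plan"],
         revenue.get(normalize_email(u["email"]), 0))
        for u in users
    ]

def load(conn, rows):
    conn.execute(
        "CREATE TABLE IF NOT EXISTS customer_revenue ("
        "email TEXT PRIMARY KEY, plan TEXT, revenue_cents INTEGER)"
    )
    # INSERT OR REPLACE keeps reruns of the job from duplicating rows.
    conn.executemany(
        "INSERT OR REPLACE INTO customer_revenue VALUES (?, ?, ?)", rows
    )
    conn.commit()

warehouse = sqlite3.connect(":memory:")
load(warehouse, transform(app_db_rows, billing_rows))
result = warehouse.execute(
    "SELECT email, revenue_cents FROM customer_revenue ORDER BY email"
).fetchall()
```

Even this small version shows where the ongoing engineering cost comes from: every new source adds its own cleaning rules, and decisions such as what to do with unmatched records have to be made and maintained by someone.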
There are alternatives to distracting your engineering team: you can hire a large consultancy to build a custom solution (for a large price tag), or you can connect your data to a service that will maintain a data warehouse for you. There is no out-of-the-box solution that gets your team off the hook entirely. You are building the product that generates most of the data, so you will need to dictate how that data is analyzed. But a platform like RJMetrics takes the scaling issue off your plate, and gives you simple interfaces for modifying the data warehouse, cleaning data, and integrating third-party data sources.
This was the kind of simplicity that the team at Callfire needed while trying to migrate their product from one technology platform to another. During the transition they wanted metrics from both platforms, but integrating both manually would have been a major distraction at an already critical time.
31 to 100 employees
The pattern continues: the volume of data and queries is growing. You’ve made it this far because of your attention to your data, and key decisions are being made from it every day. Your data warehouse’s availability and freshness are critical. The size of your organization exposes challenges around information consistency and collaboration, which are key components of an effective data-driven organization.
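One simple way to keep an eye on freshness is to record when each load finishes and compare the latest timestamp against a threshold. The sketch below assumes a hypothetical load_log table in the warehouse and an arbitrary six-hour limit. SQLite again stands in for the real warehouse.

```python
import sqlite3
from datetime import datetime, timedelta

def latest_load(conn):
    # ISO-8601 strings sort chronologically, so MAX() finds the newest load.
    row = conn.execute("SELECT MAX(loaded_at) FROM load_log").fetchone()
    return datetime.fromisoformat(row[0]) if row[0] else None

def is_stale(conn, now, max_age=timedelta(hours=6)):
    # No recorded load at all is treated as stale.
    last = latest_load(conn)
    return last is None or now - last > max_age

warehouse = sqlite3.connect(":memory:")
warehouse.execute("CREATE TABLE load_log (loaded_at TEXT NOT NULL)")
warehouse.execute("INSERT INTO load_log VALUES ('2014-06-01T02:00:00')")

stale_at_5am = is_stale(warehouse, datetime(2014, 6, 1, 5, 0))    # 3 hours old
stale_at_noon = is_stale(warehouse, datetime(2014, 6, 1, 12, 0))  # 10 hours old
```

A check like this could run on a schedule and alert whoever owns the data infrastructure, so stale numbers are caught before a decision is made on them.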
In this stage, one or more people should be dedicated to maintaining the data infrastructure; it’s no longer a part-time job. Who that person is depends on your strategy. If you’re building large pieces in-house, this person has to be an engineer, and ideally one with experience building large data analysis systems. If you’ve outsourced to a consultant or service, this person may look more like an analytics business user.
101 to 300+ employees
As data volume and use cases continue to grow, the engineering of your data infrastructure becomes harder and more important. Companies like Twitter and LinkedIn have large teams of engineers dedicated to data infrastructure, and you may eventually need one too. This is the time to start thinking about a role like “Director of Analytics” to guide the long-term strategy and vision of your company’s data infrastructure. Whatever you do, remember that switching costs grow exponentially from here, so make sure you have a strategy, and if you have any doubts about how your infrastructure will scale, act soon.
What to do today
As the engineering leader of a growing organization, one of your responsibilities is to cultivate an environment that empowers your colleagues to make data-driven decisions. But you can’t let that distract from your primary responsibility. At every stage of growth, talk to your business users, and let them be your guide for a right-sized approach. This will give them the tools they need, and allow you to get back to focusing on what you do best — building an amazing product.