Having access to an effective data warehouse dramatically increases your ability to make smarter decisions, faster. Without a data warehouse, if you want to do cross-domain analysis, you’re stuck dedicating tremendous amounts of time and resources to combining and analyzing data across platforms. With a data warehouse, your entire company has a single location to ask and answer questions.
Building a warehouse is often a larger undertaking than project sponsors realize at the outset, however. Here are the 9 most common reasons data warehouse projects fail.
1. Forgetting about long-term maintenance.
Some of the future maintenance costs that companies forget about are:
- Data formats changing over time
- An increase in data velocity
- The time cost of adding new data connections
- The time cost of fixing broken data connections
- Requests for new features, including new columns, dimensions, and derivatives
We can’t emphasize this piece enough. For example, if you’re connecting to any API-based services (which you definitely should be!), you’ll encounter frequent updates to reporting APIs. Facebook in particular keeps us on its toes with its “move fast and break things” approach to development. If you want uninterrupted access to data from your cloud platforms, you’ll need to be prepared to quickly respond to updates.
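One low-cost defense against API format changes like these is to check each incoming record against the fields your pipeline expects before loading it. The sketch below assumes records arrive as Python dicts parsed from an API's JSON response. The field names in `EXPECTED_FIELDS` are hypothetical and are not taken from Facebook's or any other real reporting API.

```python
from dataclasses import dataclass, field

# Hypothetical schema for a reporting API response: field name -> accepted type(s).
# These field names are illustrative, not taken from any real API.
EXPECTED_FIELDS = {
    "campaign_id": str,
    "date": str,
    "impressions": int,
    "spend": (int, float),
}


@dataclass
class DriftReport:
    missing: list = field(default_factory=list)     # expected but absent
    unexpected: list = field(default_factory=list)  # present but not expected
    wrong_type: list = field(default_factory=list)  # present with an unexpected type

    @property
    def ok(self) -> bool:
        return not (self.missing or self.unexpected or self.wrong_type)


def check_schema(record: dict, expected: dict = EXPECTED_FIELDS) -> DriftReport:
    """Compare one incoming record against the expected schema."""
    report = DriftReport()
    for name, accepted in expected.items():
        if name not in record:
            report.missing.append(name)
        elif not isinstance(record[name], accepted):
            report.wrong_type.append(name)
    report.unexpected = sorted(set(record) - set(expected))
    return report


# A new API version that renamed "spend" to "amount_spent" gets flagged:
r = check_schema({"campaign_id": "c1", "date": "2016-01-01",
                  "impressions": 120, "amount_spent": 3.5})
print(r.ok, r.missing, r.unexpected)  # False ['spend'] ['amount_spent']
```

If the pipeline stops and reports drift like this, nobody has to discover a half-empty column weeks later. The problem becomes a maintenance task someone can act on right away.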
As long as you have a data warehouse, you will need internal resources dedicated to maintaining it. Companies that maintain their own warehouses often dedicate entire teams of engineers to this task alone.
2. Assuming building a data warehouse is like your other tech projects.
The engineers who are skilled at building your product and website are typically used to working with very different data technologies from those required to build data pipelines. And big data technologies like Hadoop, EMR, and Storm have serious learning curves.
Can your engineers learn new skills? Absolutely—and they’d probably love to! But can you afford to take time away from working on your core product?
3. Underestimating data transformation requirements.
Raw data from disparate sources almost always needs to be cleaned and normalized in order to make sense in your data warehouse. This is often referred to as an ETL (extract, transform, load) process.
The process seems simple enough at the outset: take data from various sources and copy-paste it into your warehouse. If only. Data from different systems typically doesn’t play well together and takes real work to reconcile. Here are some common examples:
- Establishing key relationships across data sources, relationships that often don’t exist in the raw data
- Updating new values on existing records without sacrificing performance
- Time zone consistency
- Lookup value normalization (US = United States = USA)
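Two of these examples, lookup value normalization and time zone consistency, can be sketched in a few lines. This is a minimal illustration with several assumptions. The `country` and `created_at` fields are hypothetical. The alias table is hand-maintained. Each source is assumed to report timestamps at a fixed UTC offset. Sources that observe daylight saving time would need a real time zone database, such as Python's standard `zoneinfo` module (3.9+), instead of fixed offsets.

```python
from datetime import datetime, timedelta, timezone

# Hypothetical alias table mapping raw spellings to one canonical code.
COUNTRY_ALIASES = {
    "us": "US",
    "usa": "US",
    "united states": "US",
    "united states of america": "US",
}


def normalize_country(raw: str) -> str:
    """Map a raw country label to a canonical code; pass unknown values through trimmed."""
    key = raw.strip().lower()
    return COUNTRY_ALIASES.get(key, raw.strip())


def to_utc(local_ts: str, utc_offset_hours: float) -> datetime:
    """Interpret a naive timestamp at a fixed UTC offset and convert it to UTC."""
    naive = datetime.fromisoformat(local_ts)
    tz = timezone(timedelta(hours=utc_offset_hours))
    return naive.replace(tzinfo=tz).astimezone(timezone.utc)


def normalize_row(row: dict, utc_offset_hours: float) -> dict:
    """Return a cleaned copy of a hypothetical raw row."""
    return {
        **row,
        "country": normalize_country(row["country"]),
        "created_at": to_utc(row["created_at"], utc_offset_hours),
    }


# Two sources describing the same moment and the same country differently:
a = normalize_row({"country": "USA", "created_at": "2016-03-01 09:00:00"}, -5)
b = normalize_row({"country": "United States", "created_at": "2016-03-01 06:00:00"}, -8)
print(a == b)  # True
```

Unrecognized values pass through unchanged here. In practice you would also want to log them so the alias table can grow as new spellings appear.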
If you ignore the transformation step, the data in your warehouse will be extremely difficult to work with and full of inconsistencies, and decision makers will lose faith in its reliability.
4. Underestimating the creativity of your users.
We are continuously surprised (and delighted) by the ways our clients use their data. Some of our clients have hundreds of calculated columns that we could never have anticipated when we built our platform. When you’re building your own data warehouse, make sure you’re providing your users with flexible tools, not just solutions. Data analysis has an unbounded solution set.
5. Forgoing the customer development process.
Before you begin, remember that business units are the customers for this project. Which parts of your business want the data warehouse, and why? Run through the customer development process: do interviews (and not just with managers) and get your hands on their current analyses. If you don’t have this information, you risk problems like:
- Not bringing in the information your users need most
- Failing to support mission-critical reporting workflows
- Failing to anticipate future data needs
Treating your business units as customers and involving them in the development process will help ensure you build the system they need.
6. Tightly coupling different elements of your data pipeline.
Any pipeline that’s responsible for shipping data into a warehouse has more components than you would initially anticipate:
- Data connectors for each raw data source
- A storage layer for all historical data
- Transformation logic
- Loading routines into your warehouse of choice
Each of these components, along with its subcomponents, involves independent technical decisions that will affect scalability, functionality, and maintenance burden down the road. First-time data engineers frequently try to solve all of these problems with the same technology, but each component usually calls for a specialized solution. Choosing the right tool for each component lets you update parts of your pipeline as technologies and business needs change, without having to rebuild from scratch.
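One way to keep these components loosely coupled is to have each stage depend only on a narrow interface. Then any one stage can be swapped without touching the others. The sketch below uses Python's `typing.Protocol` for the connector, storage, and loading components and a plain callable for the transformation logic. Every class and method name is hypothetical, and the in-memory stand-ins take the place of whatever source systems, storage layer, and warehouse you actually use.

```python
from typing import Callable, Dict, Iterable, List, Protocol

Record = Dict[str, object]  # hypothetical row shape


class Connector(Protocol):
    """Pulls raw records from one source system."""
    def extract(self) -> Iterable[Record]: ...


class RawStore(Protocol):
    """Keeps the full raw history so transformations can be re-run later."""
    def append(self, records: List[Record]) -> None: ...
    def read_all(self) -> List[Record]: ...


class Loader(Protocol):
    """Writes transformed records into the warehouse of choice."""
    def load(self, records: List[Record]) -> None: ...


Transform = Callable[[Record], Record]


def run_pipeline(connectors: Iterable[Connector], store: RawStore,
                 transform: Transform, loader: Loader) -> int:
    """Wire the stages together; each one sees only the others' interfaces."""
    for conn in connectors:
        store.append(list(conn.extract()))              # persist raw history first
    cleaned = [transform(r) for r in store.read_all()]  # transform from stored raw data
    loader.load(cleaned)
    return len(cleaned)


# In-memory stand-ins, for illustration only.
class ListConnector:
    def __init__(self, rows: List[Record]) -> None:
        self.rows = rows

    def extract(self) -> Iterable[Record]:
        return iter(self.rows)


class MemoryStore:
    def __init__(self) -> None:
        self.rows: List[Record] = []

    def append(self, records: List[Record]) -> None:
        self.rows.extend(records)

    def read_all(self) -> List[Record]:
        return list(self.rows)


class ListLoader:
    def __init__(self) -> None:
        self.loaded: List[Record] = []

    def load(self, records: List[Record]) -> None:
        self.loaded.extend(records)


loader = ListLoader()
count = run_pipeline(
    [ListConnector([{"amount": "3"}]), ListConnector([{"amount": "4"}])],
    MemoryStore(),
    lambda r: {"amount": int(r["amount"])},
    loader,
)
print(count, loader.loaded)  # 2 [{'amount': 3}, {'amount': 4}]
```

The transformation reads from the raw store rather than directly from the connectors. As a result, changing transformation logic means re-running it over stored history instead of re-extracting from every source. Moving to a different warehouse means writing a new `Loader`, and the connectors stay as they are.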
7. Building your data warehouse based on your current data scale.
When we first built our platform in 2008, the volume of data available to our clients was much smaller, and we built the first version of RJMetrics for that world. Since then, the volume and variety of data have grown so dramatically that we recently launched a completely revamped platform. It handles this new volume and is designed to keep scaling with whatever data our clients send our way.
The same will hold true for you: your organization’s data needs will be much greater tomorrow than they are today.
8. Not taking into account the fast-paced evolution of data analytics tech.
Data analytics is one of the hottest areas in tech right now, and every layer of the stack is evolving rapidly. A warehouse built ten years ago would have completely missed the columnar database revolution of the late 2000s, and one built in 2012 would have missed the advent of Amazon Redshift, today’s dominant warehousing solution. And that’s just the warehousing layer: the supporting technologies are being iterated on constantly. This isn’t about to change any time soon.
If you’re ready to make the significant investment required to stay on top of this ever-changing tech landscape, these evolutions are incredibly exciting. If you’re planning on spending a few weeks or even months getting a warehousing project off the ground and then moving on, you’re going to be out of date fast.
9. Pushing forward without recognizing that the signs are against you.
It’s easy to fixate on sunk costs and plow ahead as a project drifts further and further off track. The signs to look for are similar to those of any failing tech project:
- Are you missing deadlines?
- Are your engineers spending more time building new features or supporting existing ones?
- How do your engineers feel about the project? How about your end users?
It’s not news to anyone that IT projects usually encounter bumps in the road, but if your data warehouse project is starting to feel like a boulder you’re rolling up an ever-steeper incline, it’s time to take a step back.
Build With Caution
Having a data warehouse is incredibly important. An effective warehouse empowers analysts and decision-makers to do what they do both faster and better. If you’re a believer in data-driven decision making (and we certainly are), then there really aren’t many projects of greater strategic significance.
We’ve spent years working with hundreds of the fastest-growing online businesses to meet their data analysis needs, and we’d love to save you a lot of unnecessary work. Sound interesting? Let’s talk.