Adopting a Warehouse-First Approach for Digital Analytics

Note: The views expressed in this blog are my own and do not necessarily reflect the opinions or positions of my employer.

The Current World of Digital Analytics

Today, packaged analytics tools like Google Analytics, Adobe Analytics and Tealium dominate the market. Packaged analytics tools are essentially “out-of-the-box” platforms: businesses collect behavioural data from their website or app and send it to a third-party vendor, where the data is stored, transformed, modelled and made available for visualisation directly within the platform. These tools are well suited to businesses that want to get started quickly: with minimal upfront investment in technology and resources, they can begin tracking customer behaviour, gain insights, and make “data-driven” decisions. The tools typically offer a number of features out of the box, including data collection, storage, modelling, analysis, and activation. However, like a “jack of all trades”, while they can perform these tasks to a certain extent, they don't necessarily offer best-in-class features in all areas.

As organisations scale and mature, the limitations of these packaged analytics tools become more apparent, and over time the tools may no longer fully meet the needs of the business. This is where businesses might start to consider adopting a warehouse-first approach to their digital analytics.

What is Warehouse-First Digital Analytics?

The concept of a warehouse-first or warehouse-native approach to digital analytics is quite simple. Instead of collecting behavioural data from your digital properties and storing it inside a third-party platform, a warehouse-first approach sends the data directly to a data warehouse or data lake, such as Snowflake, BigQuery, Databricks or Redshift, that runs entirely within an organisation’s own cloud environment. This approach bypasses the need for data collection and storage in a third-party analytics platform.

This reflects a subtle shift in the industry: both businesses and analytics vendors are beginning to understand the value of giving businesses direct access to their raw data in their own cloud environment, as it enables richer and more complex use-cases. Examples of this trend are Google Analytics 4’s native export to BigQuery and Amplitude’s re-platforming to run natively on Snowflake.

Additionally, many major digital analytics providers like Adobe Analytics, Tealium and Piwik Pro offer some form of data export capability so that businesses can analyse or use the data elsewhere. However, these exports are often aggregated and may lack the necessary granularity.
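
To make the idea of sending raw behavioural data to your own warehouse a little more concrete, here is a minimal sketch in Python. Everything in it is an assumption for illustration: the `BehaviouralEvent` class, its field names and the `to_warehouse_row` helper are hypothetical and do not come from any vendor's SDK. It only shows a business-defined event being serialised as a newline-delimited JSON row, a format that warehouses can commonly bulk-load.

```python
import json
import uuid
from dataclasses import dataclass, field, asdict
from datetime import datetime, timezone


@dataclass
class BehaviouralEvent:
    """A hypothetical, business-defined event schema (not a vendor format)."""
    event_name: str
    user_id: str
    page_url: str
    properties: dict = field(default_factory=dict)
    # Generated client-side so duplicate deliveries can be detected later.
    event_id: str = field(default_factory=lambda: str(uuid.uuid4()))
    # UTC timestamp in ISO 8601 format.
    occurred_at: str = field(
        default_factory=lambda: datetime.now(timezone.utc).isoformat()
    )


def to_warehouse_row(event: BehaviouralEvent) -> str:
    """Serialise one event as a single JSON line ready for bulk loading."""
    return json.dumps(asdict(event), sort_keys=True)


# Illustrative values only.
event = BehaviouralEvent(
    event_name="add_to_cart",
    user_id="u-123",
    page_url="https://example.com/products/42",
    properties={"sku": "SKU-42", "price": 19.99},
)
row = to_warehouse_row(event)
print(row)
```

In a real setup the row would be written to a stream or a staging area and loaded from there; that part depends on the chosen warehouse and is left out here.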

Warehouse-First Approach vs Packaged Analytics Approach

Why adopt a Warehouse-First approach for Digital Analytics?

As mentioned earlier, there are several limitations or issues of packaged analytics tools which begin to reveal themselves as a business grows and data needs become more complex. It's important to note that these issues don't apply to all packaged analytics tools. However, adopting a warehouse-first approach may help mitigate, alleviate, or even resolve some of these pain points:
  1. Privacy: A key priority for many businesses is understanding how their customer data is collected, stored, and processed. A major benefit of a warehouse-first approach is that data is collected directly into your own environment, without being stored on a third-party vendor’s platform. This simplifies data governance by giving businesses full control over their data, making it easier to audit exactly what data is being collected and how and where it is stored.
  2. CX Convergence and Data Silos: Customer experiences are becoming more complex as the number of customer touch-points increases and team responsibilities blur. This often leads to data silos, with data owned by different teams (such as digital analytics, CRM, offline data, and product data) scattered throughout an organisation. This fragmentation makes it harder to extract value from the data. A warehouse-first approach centralises data storage, allowing multiple data sources to reside within a single environment. This helps break down data silos, provides a more holistic view of the customer and enables customer journey analysis across all touch-points.
  3. Vendor Lock-in and Inflexible Schemas: With packaged tools, businesses are often locked into specific vendors as they are forced to track data in a schema or format dictated by a vendor. A warehouse-first approach eliminates vendor lock-in by giving businesses the flexibility to design their own custom data schemas that align with their specific requirements, rather than adapting to predefined structures imposed by third-party platforms.
  4. Black-box Logic: Many packaged analytics tools offer comprehensive functionality out of the box but often process data behind the scenes to optimise performance. While this leads to faster insights, it makes it difficult to understand how those insights are derived and, by extension, harder to explain the underlying logic to business stakeholders. A warehouse-first approach provides full transparency in data transformation, modelling, and logic, allowing businesses to customise these processes to meet their specific needs.
  5. No Single Source of Truth: When the same metric is reported in multiple places, e.g. in an aggregated UI report, a custom report, a dashboard or a data warehouse, the numbers are often inconsistent. This quickly becomes a core concern for business stakeholders, and data teams often waste significant time answering the universal question of why numbers don’t match. Adopting a warehouse-first approach and centralising the data in a single location helps establish a single source of truth, providing a more consistent and reliable foundation for decision making across an organisation.
  6. Latency / Delay in (real-time or any) Decision Making: Real-time decision making is often difficult with packaged analytics tools, as they typically need to pre-process data before making it available to downstream consumers within a UI. By sending data directly into a warehouse, businesses can minimise the latency between collection and action and so make faster, more informed decisions, which is particularly important for real-time personalisation use-cases such as product recommendation engines.
  7. Advanced Analysis and Segmentation: Unlike some packaged analytics tools which offer limited capabilities around analysis and segmentation, a warehouse-first setup enables data teams to run complex queries, perform advanced customer segmentation and apply machine learning models directly on raw data for more advanced and rich analysis, especially when integrating digital analytics with other data sources.
  8. Financial Costs: Once a business's data volume reaches a certain level, packaged tools often require an upgrade to a paid plan, typically priced on data volume. A warehouse-first approach can be more cost-effective in the long run (especially for larger organisations) as businesses can scale storage and processing capabilities as needed. They may also avoid the double cost of paying for a licence alongside additional storage / processing costs for exporting data into a warehouse for more advanced use-cases.
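
Points 2, 4 and 7 above can be illustrated with a small sketch. It is not a production setup: an in-memory SQLite database stands in for a cloud warehouse, and the table names (`events`, `crm_customers`), the columns and the handful of rows are all made up for the example. The point is simply that behavioural and CRM data sit side by side, and the definition of a "converter" is written out in plain SQL that anyone can read and change.

```python
import sqlite3

# In-memory SQLite stands in for a cloud warehouse purely for illustration.
conn = sqlite3.connect(":memory:")
conn.executescript(
    """
    CREATE TABLE events (user_id TEXT, event_name TEXT, occurred_at TEXT);
    CREATE TABLE crm_customers (user_id TEXT, segment TEXT);
    """
)

# Made-up behavioural events (the digital analytics side).
conn.executemany(
    "INSERT INTO events VALUES (?, ?, ?)",
    [
        ("u1", "page_view", "2024-01-01T10:00:00"),
        ("u1", "purchase", "2024-01-01T10:05:00"),
        ("u2", "page_view", "2024-01-01T11:00:00"),
        ("u3", "page_view", "2024-01-01T12:00:00"),
        ("u3", "purchase", "2024-01-01T12:10:00"),
    ],
)

# Made-up CRM records (a second source that would otherwise be a silo).
conn.executemany(
    "INSERT INTO crm_customers VALUES (?, ?)",
    [("u1", "loyalty"), ("u2", "loyalty"), ("u3", "new")],
)

# Explicit metric logic: a user counts as a converter if they have at
# least one 'purchase' event. Nothing is hidden behind a vendor UI.
CONVERSION_BY_SEGMENT = """
    SELECT c.segment,
           COUNT(DISTINCT e.user_id) AS visitors,
           COUNT(DISTINCT CASE WHEN e.event_name = 'purchase'
                               THEN e.user_id END) AS converters
    FROM events AS e
    JOIN crm_customers AS c ON c.user_id = e.user_id
    GROUP BY c.segment
    ORDER BY c.segment
"""

results = conn.execute(CONVERSION_BY_SEGMENT).fetchall()
for segment, visitors, converters in results:
    print(f"{segment}: {converters}/{visitors} users converted")
```

Because the query is the single definition of the metric, every dashboard or report built on top of it reads the same number, which is the "single source of truth" idea from point 5.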

Summary

In conclusion, while packaged analytics tools are suitable for and meet the needs of many businesses, more mature organisations that are well-established in their data journey and have strong data teams may find it worth exploring a warehouse-first approach to digital analytics.