Data Warehouse Design: Core Concepts, Schemas & Best Practices

Data warehouse design is the process of structuring a centralized database that consolidates data from multiple sources, such as CRM systems, ERP platforms, and marketing tools, into a single location optimized for analysis and reporting. Unlike transactional databases, which are optimized for fast inserts and updates of individual records, a data warehouse is optimized for querying large volumes of historical data.

The two dominant design philosophies are Inmon’s top-down approach (build a normalized enterprise warehouse first, then create data marts) and Kimball’s bottom-up approach (build subject-oriented data marts first, each using dimensional modeling). Kimball’s approach is often favored in practice because it tends to deliver usable results sooner, at the cost of more integration work later.

Core Components of a Data Warehouse

Layer	Name	Purpose	Data State
1	Source Systems	Operational databases, APIs, flat files	Raw, transactional
2	Staging Area	Temporary landing zone for ingested data	Raw, unvalidated
3	ODS (Operational Data Store)	Near-real-time cleansed operational data	Cleansed, current
4	Core DW / Integration Layer	Integrated, historical data store	Transformed, historical
5	Data Mart	Subject-specific subset for a team or function	Aggregated, analysis-ready
6	Presentation Layer	BI tools, dashboards, reports	Queried by end users

Inmon vs Kimball: The Two Schools of Thought

Dimension	Inmon (Top-Down)	Kimball (Bottom-Up)
Starting Point	Enterprise-wide normalized warehouse	Individual data marts
Schema Style	3NF (Third Normal Form)	Dimensional (Star/Snowflake)
Time to Value	Slower (months)	Faster (weeks)
Consistency	High – single source of truth	Can have inconsistencies across marts
Best For	Large enterprises, regulated industries	Agile teams, faster analytics delivery
Complexity	High upfront design cost	Lower upfront, higher integration cost later

Schema Design: Star vs Snowflake vs Data Vault

Star Schema

The star schema places a central fact table surrounded by dimension tables. The fact table stores measurable events (sales amounts, pageviews, transactions) while dimension tables hold descriptive context (customer name, product category, date).

It is fast for querying, simple for analysts to understand, and the most widely used schema in business intelligence. The trade-off is some data redundancy in the dimension tables.
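
To make the fact/dimension split concrete, here is a minimal sketch using Python's built-in sqlite3 module. The table names, column names, and sample rows (fact_sales, dim_customer, dim_product, dim_date) are illustrative assumptions, not a prescribed layout; a real warehouse would use its own platform and naming standards.

```python
import sqlite3


def build_star_schema(conn):
    """Create one fact table surrounded by three dimension tables (all names hypothetical)."""
    conn.executescript("""
        CREATE TABLE dim_customer (customer_key INTEGER PRIMARY KEY, customer_name TEXT, city TEXT);
        CREATE TABLE dim_product  (product_key  INTEGER PRIMARY KEY, product_name TEXT, category TEXT);
        CREATE TABLE dim_date     (date_key     INTEGER PRIMARY KEY, full_date TEXT, year INTEGER, month INTEGER);
        -- The fact table holds only foreign keys and numeric measures.
        CREATE TABLE fact_sales (
            customer_key INTEGER REFERENCES dim_customer(customer_key),
            product_key  INTEGER REFERENCES dim_product(product_key),
            date_key     INTEGER REFERENCES dim_date(date_key),
            quantity     INTEGER,
            sales_amount REAL
        );
    """)
    conn.executemany("INSERT INTO dim_customer VALUES (?, ?, ?)",
                     [(1, "Ada", "Leeds"), (2, "Grace", "York")])
    conn.executemany("INSERT INTO dim_product VALUES (?, ?, ?)",
                     [(1, "Laptop", "Hardware"), (2, "Mouse", "Hardware"), (3, "IDE licence", "Software")])
    conn.executemany("INSERT INTO dim_date VALUES (?, ?, ?, ?)",
                     [(20240105, "2024-01-05", 2024, 1), (20240210, "2024-02-10", 2024, 2)])
    conn.executemany("INSERT INTO fact_sales VALUES (?, ?, ?, ?, ?)", [
        (1, 1, 20240105, 1, 1200.0),
        (1, 2, 20240105, 2, 50.0),
        (2, 3, 20240210, 1, 300.0),
        (2, 2, 20240210, 1, 25.0),
    ])


def sales_by_category(conn):
    """A typical star-schema query: one join from the fact table to one dimension."""
    return conn.execute("""
        SELECT p.category, SUM(f.sales_amount)
        FROM fact_sales AS f
        JOIN dim_product AS p ON p.product_key = f.product_key
        GROUP BY p.category
        ORDER BY p.category
    """).fetchall()


star = sqlite3.connect(":memory:")
build_star_schema(star)
for category, total in sales_by_category(star):
    print(category, total)
```

Note that the category is stored directly on dim_product and repeated for every product in that category. That repetition is the redundancy trade-off, and it is also why the query needs only one join.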

Snowflake Schema

The snowflake schema normalizes dimension tables into sub-dimensions, reducing redundancy. A product dimension might link to a category dimension, which links to a department dimension. Storage is more efficient, but queries require more joins and are harder for non-technical analysts to write.
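
The product, category, and department chain described above can be sketched as follows, again with Python's sqlite3 module. All table names, keys, and sample values are assumptions made for illustration.

```python
import sqlite3

snow = sqlite3.connect(":memory:")
snow.executescript("""
    -- Each level of the product hierarchy gets its own table (names hypothetical).
    CREATE TABLE dim_department (department_key INTEGER PRIMARY KEY, department_name TEXT);
    CREATE TABLE dim_category (
        category_key   INTEGER PRIMARY KEY,
        category_name  TEXT,
        department_key INTEGER REFERENCES dim_department(department_key)
    );
    CREATE TABLE dim_product (
        product_key  INTEGER PRIMARY KEY,
        product_name TEXT,
        category_key INTEGER REFERENCES dim_category(category_key)
    );
    CREATE TABLE fact_sales (
        product_key  INTEGER REFERENCES dim_product(product_key),
        sales_amount REAL
    );

    INSERT INTO dim_department VALUES (1, 'Electronics'), (2, 'Home');
    INSERT INTO dim_category   VALUES (10, 'Laptops', 1), (11, 'Phones', 1), (20, 'Kitchen', 2);
    INSERT INTO dim_product    VALUES (100, 'UltraBook', 10), (101, 'Phone X', 11), (102, 'Kettle', 20);
    INSERT INTO fact_sales     VALUES (100, 1200.0), (101, 800.0), (102, 40.0), (102, 40.0);
""")


def sales_by_department(conn):
    """Reaching the department name takes three joins instead of one."""
    return conn.execute("""
        SELECT d.department_name, SUM(f.sales_amount)
        FROM fact_sales AS f
        JOIN dim_product    AS p ON p.product_key    = f.product_key
        JOIN dim_category   AS c ON c.category_key   = p.category_key
        JOIN dim_department AS d ON d.department_key = c.department_key
        GROUP BY d.department_name
        ORDER BY d.department_name
    """).fetchall()


print(sales_by_department(snow))
```

Each department name is stored once, which is where the storage saving comes from. The price is the longer join path every time an analyst wants to group by department.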

Data Vault

Data Vault splits data into three object types: Hubs (business keys), Links (relationships between hubs), and Satellites (descriptive attributes with full history). It is highly auditable and handles source system changes gracefully – but it is complex to implement and query.
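
A rough sketch of the three object types, using Python's sqlite3 and hashlib modules. The table and column names (hub_customer, link_customer_order, sat_customer_details, load_ts, record_source) and the choice of SHA-256 hash keys are assumptions for illustration; real Data Vault implementations follow their own standards for keys and load metadata.

```python
import hashlib
import sqlite3


def hash_key(*business_keys):
    """Derive a deterministic key from one or more business keys (hashing scheme is an assumption)."""
    return hashlib.sha256("|".join(business_keys).encode("utf-8")).hexdigest()


vault = sqlite3.connect(":memory:")
vault.executescript("""
    -- Hubs: one row per business key.
    CREATE TABLE hub_customer (customer_hk TEXT PRIMARY KEY, customer_id TEXT UNIQUE, load_ts TEXT, record_source TEXT);
    CREATE TABLE hub_order    (order_hk    TEXT PRIMARY KEY, order_id    TEXT UNIQUE, load_ts TEXT, record_source TEXT);
    -- Link: a relationship between hubs.
    CREATE TABLE link_customer_order (
        link_hk     TEXT PRIMARY KEY,
        customer_hk TEXT REFERENCES hub_customer(customer_hk),
        order_hk    TEXT REFERENCES hub_order(order_hk),
        load_ts TEXT, record_source TEXT
    );
    -- Satellite: descriptive attributes, one row per change, never updated in place.
    CREATE TABLE sat_customer_details (
        customer_hk TEXT REFERENCES hub_customer(customer_hk),
        load_ts TEXT, name TEXT, city TEXT, record_source TEXT,
        PRIMARY KEY (customer_hk, load_ts)
    );
""")

cust_hk, order_hk = hash_key("C-1"), hash_key("O-9")
vault.execute("INSERT INTO hub_customer VALUES (?, ?, ?, ?)", (cust_hk, "C-1", "2024-01-01", "crm"))
vault.execute("INSERT INTO hub_order VALUES (?, ?, ?, ?)", (order_hk, "O-9", "2024-01-02", "erp"))
vault.execute("INSERT INTO link_customer_order VALUES (?, ?, ?, ?, ?)",
              (hash_key("C-1", "O-9"), cust_hk, order_hk, "2024-01-02", "erp"))
# The customer moves: a second satellite row is appended, the first is kept.
vault.executemany("INSERT INTO sat_customer_details VALUES (?, ?, ?, ?, ?)", [
    (cust_hk, "2024-01-01", "Ada", "Leeds", "crm"),
    (cust_hk, "2024-06-01", "Ada", "York", "crm"),
])


def current_city(conn, customer_id):
    """Return the city from the most recently loaded satellite row for a customer."""
    row = conn.execute("""
        SELECT s.city
        FROM hub_customer AS h
        JOIN sat_customer_details AS s ON s.customer_hk = h.customer_hk
        WHERE h.customer_id = ?
        ORDER BY s.load_ts DESC
        LIMIT 1
    """, (customer_id,)).fetchone()
    return row[0] if row else None


print(current_city(vault, "C-1"))
```

Even this small lookup needs a join and a "latest row" filter, which illustrates why Data Vault models are usually exposed to analysts through a simpler layer on top.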

Schema	Query Speed	Storage Efficiency	Change Flexibility	Analyst-Friendly
Star Schema	Fast	Lower (some redundancy)	Moderate	High
Snowflake Schema	Moderate	Higher	Moderate	Lower
Data Vault	Slower (more joins)	Highest	Very High	Low (needs semantic layer)

Step-by-Step Data Warehouse Design Process

1. Define business requirements – what questions must the warehouse answer? Involve stakeholders early.
2. Identify data sources – catalog every system that produces relevant data.
3. Design the staging layer – a raw landing zone that mirrors source data.
4. Define the dimensional model – choose fact and dimension tables based on business processes.
5. Design the ETL/ELT pipelines – how data moves from source to warehouse.
6. Implement slowly changing dimensions (SCDs) – decide how to handle changes to dimension data over time.
7. Build the presentation layer – data marts or semantic models for BI tools.
8. Test data quality and performance – validate accuracy and optimize query speed.
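
The staging, pipeline, and data quality steps can be sketched in a few lines of Python with sqlite3. This is a toy illustration under stated assumptions: RAW_ORDERS stands in for a source extract, and the table names (stg_orders, core_orders) and validation rules are hypothetical.

```python
import sqlite3

# Hypothetical source extract; everything arrives as text, including bad rows.
RAW_ORDERS = [
    {"order_id": "1", "customer": " Ada ", "amount": "120.50", "order_date": "2024-01-05"},
    {"order_id": "2", "customer": "Grace", "amount": "80.00", "order_date": "2024-01-06"},
    {"order_id": "3", "customer": "", "amount": "not-a-number", "order_date": "2024-01-07"},
]


def load_staging(conn, rows):
    """Land the source rows unchanged: staging mirrors the source, with no validation."""
    conn.execute("CREATE TABLE stg_orders (order_id TEXT, customer TEXT, amount TEXT, order_date TEXT)")
    conn.executemany("INSERT INTO stg_orders VALUES (:order_id, :customer, :amount, :order_date)", rows)


def transform_and_load(conn):
    """Cast and clean staged rows, load the valid ones, and collect the rejects with a reason."""
    conn.execute("""CREATE TABLE core_orders (
        order_id INTEGER PRIMARY KEY, customer TEXT NOT NULL,
        amount REAL NOT NULL, order_date TEXT NOT NULL)""")
    loaded, rejected = 0, []
    staged = conn.execute("SELECT order_id, customer, amount, order_date FROM stg_orders").fetchall()
    for order_id, customer, amount, order_date in staged:
        try:
            clean = (int(order_id), customer.strip(), float(amount), order_date)
            if not clean[1]:
                raise ValueError("missing customer")
        except ValueError as exc:
            rejected.append((order_id, str(exc)))
            continue
        conn.execute("INSERT INTO core_orders VALUES (?, ?, ?, ?)", clean)
        loaded += 1
    return loaded, rejected


def quality_check_failures(conn, loaded, rejected):
    """Return a list of failed checks; an empty list means every check passed."""
    failures = []
    staged = conn.execute("SELECT COUNT(*) FROM stg_orders").fetchone()[0]
    if staged != loaded + len(rejected):
        failures.append("staged rows are not all accounted for")
    if conn.execute("SELECT COUNT(*) FROM core_orders WHERE amount < 0").fetchone()[0]:
        failures.append("negative amounts in core_orders")
    return failures


pipeline = sqlite3.connect(":memory:")
load_staging(pipeline, RAW_ORDERS)
loaded, rejected = transform_and_load(pipeline)
print(loaded, "loaded;", len(rejected), "rejected;", quality_check_failures(pipeline, loaded, rejected))
```

Keeping rejected rows with a reason, rather than silently dropping them, makes the reconciliation check possible: every staged row is either loaded or explained.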

Slowly Changing Dimensions (SCDs)

One of the trickiest parts of warehouse design is handling attributes that change over time. For example, a customer moves to a new city. Do you overwrite the old city? Keep both? Track the change with dates? SCD types are commonly numbered 0 through 7; the ones you will meet most often are:

Type 1: Overwrite – no history kept. Simple but you lose the past.
Type 2: Add a new row – full history preserved with effective dates. Most common.
Type 3: Add a column – keeps current and one previous value. Limited history.
Type 6: Combination of Types 1, 2, and 3. Flexible but complex.
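
Since Type 2 is the most common, here is a minimal sketch of it for the customer-moves-city example, using Python's sqlite3 module. The column names (valid_from, valid_to, is_current) and the 9999-12-31 open-ended date are common conventions but are assumptions here, as is the helper name apply_scd2.

```python
import sqlite3

HIGH_DATE = "9999-12-31"  # assumed marker for "still current"

scd = sqlite3.connect(":memory:")
scd.execute("""
    CREATE TABLE dim_customer (
        customer_key INTEGER PRIMARY KEY AUTOINCREMENT,  -- surrogate key, one per version
        customer_id  TEXT NOT NULL,                      -- business key, repeated across versions
        city         TEXT NOT NULL,
        valid_from   TEXT NOT NULL,
        valid_to     TEXT NOT NULL,
        is_current   INTEGER NOT NULL
    )
""")


def apply_scd2(conn, customer_id, new_city, change_date):
    """Type 2: close the current row and add a new one when the tracked attribute changes."""
    current = conn.execute(
        "SELECT customer_key, city FROM dim_customer WHERE customer_id = ? AND is_current = 1",
        (customer_id,),
    ).fetchone()
    if current is not None and current[1] == new_city:
        return  # no change, so no new version
    if current is not None:
        conn.execute(
            "UPDATE dim_customer SET valid_to = ?, is_current = 0 WHERE customer_key = ?",
            (change_date, current[0]),
        )
    conn.execute(
        "INSERT INTO dim_customer (customer_id, city, valid_from, valid_to, is_current) "
        "VALUES (?, ?, ?, ?, 1)",
        (customer_id, new_city, change_date, HIGH_DATE),
    )


def city_as_of(conn, customer_id, as_of_date):
    """Look up the version that was valid on a given date (valid_to is exclusive)."""
    row = conn.execute(
        "SELECT city FROM dim_customer WHERE customer_id = ? AND valid_from <= ? AND ? < valid_to",
        (customer_id, as_of_date, as_of_date),
    ).fetchone()
    return row[0] if row else None


apply_scd2(scd, "C-1", "Leeds", "2023-01-01")
apply_scd2(scd, "C-1", "York", "2024-03-15")   # customer moves: new row added
apply_scd2(scd, "C-1", "York", "2024-04-01")   # unchanged: ignored

print(city_as_of(scd, "C-1", "2023-06-01"), city_as_of(scd, "C-1", "2024-05-01"))
```

A Type 1 change would instead be a single UPDATE of the city column, and a Type 3 change would copy the old value into a previous_city column before overwriting. Both are simpler, but neither can answer the "as of" question for arbitrary dates.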

Common Design Mistakes

Over-engineering on day one. A star schema serving real users beats a perfect architecture serving no one.
Not tracking data lineage – users need to trust where numbers come from.
Ignoring data quality at the source. Garbage in, garbage out applies to warehouses as much as anywhere: errors loaded once are repeated in every downstream report.
Treating the warehouse as a backup system instead of an analytical asset.
Skipping documentation. Six months later, no one remembers what that column means.

Modern Tools for Data Warehouse Design

Category	Tools	Notes
Cloud Warehouses	Snowflake, BigQuery, Redshift, Synapse	Fully managed, scalable on demand
Transformation (ELT)	dbt (data build tool)	SQL-based, version-controlled transformations
Orchestration	Apache Airflow, Prefect, Dagster	Schedules and monitors pipelines
Data Modeling	erwin Data Modeler, Lucidchart, dbdiagram.io	Visual schema design
BI / Presentation	Tableau, Looker, Power BI, Metabase	End-user reporting layer

A well-designed data warehouse does not just store data – it makes data trustworthy. When every analyst in your organization is working from the same definitions, the same history, and the same source of truth, decisions get better. That is the actual goal of the whole exercise.