Data warehouses

A data warehouse is a type of data product consisting of a database that accumulates data from a wide range of sources and stores it for efficient retrieval and analysis. Data warehouses are generally structured for online analytical processing (OLAP), which allows multiple data views, filters, and refinements based on multiple dimensions, the attributes of interest to the business. This is most commonly achieved by implementing a star schema, which uses a relational model to organize data into facts and dimensions.
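To make the fact/dimension split concrete, here is a minimal sketch of a star schema using Python's built-in `sqlite3` module. The table names (`fact_sales`, `dim_product`, `dim_date`), their columns, and the sample rows are all hypothetical and chosen only for illustration; a real warehouse would typically run on a dedicated analytical database.

```python
import sqlite3


def build_star_schema(conn):
    """Create one fact table and two dimension tables (all names are illustrative)."""
    conn.executescript("""
        CREATE TABLE dim_product (
            product_key INTEGER PRIMARY KEY,
            name        TEXT,
            category    TEXT
        );
        CREATE TABLE dim_date (
            date_key INTEGER PRIMARY KEY,
            year     INTEGER,
            month    INTEGER
        );
        -- The fact table holds measures plus foreign keys into each dimension.
        CREATE TABLE fact_sales (
            product_key INTEGER REFERENCES dim_product(product_key),
            date_key    INTEGER REFERENCES dim_date(date_key),
            quantity    INTEGER,
            amount      REAL
        );
    """)
    conn.executemany(
        "INSERT INTO dim_product VALUES (?, ?, ?)",
        [(1, "Laptop", "Hardware"), (2, "Mouse", "Hardware"), (3, "Editor", "Software")],
    )
    conn.executemany(
        "INSERT INTO dim_date VALUES (?, ?, ?)",
        [(20240101, 2024, 1), (20240201, 2024, 2)],
    )
    conn.executemany(
        "INSERT INTO fact_sales VALUES (?, ?, ?, ?)",
        [
            (1, 20240101, 2, 2000.0),
            (2, 20240101, 5, 100.0),
            (3, 20240201, 1, 50.0),
            (1, 20240201, 1, 1000.0),
        ],
    )


def sales_by_category_and_month(conn):
    """Aggregate the sales measure along two dimensions (an OLAP-style query)."""
    return conn.execute("""
        SELECT p.category, d.year, d.month, SUM(f.amount)
        FROM fact_sales AS f
        JOIN dim_product AS p ON p.product_key = f.product_key
        JOIN dim_date    AS d ON d.date_key    = f.date_key
        GROUP BY p.category, d.year, d.month
        ORDER BY p.category, d.year, d.month
    """).fetchall()


conn = sqlite3.connect(":memory:")
build_star_schema(conn)
rows = sales_by_category_and_month(conn)
for row in rows:
    print(row)
```

Filtering or regrouping by a different dimension attribute (for example, by product name instead of category) only changes the `GROUP BY` and `WHERE` clauses; the fact table itself stays untouched.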

A typical data warehouse architecture consists of multiple data sources, a staging area, the warehouse itself, and one or more data marts.

flowchart TD
    subgraph sources[Data Sources]
        source1[(Database)]
        source2[Flat File]
    end
    sources--ETL-->stage[(Staging Area)]
    stage--ETL-->warehouse[(Data Warehouse)]
    warehouse--ETL-->dm1[(Data Mart)]
    warehouse--ETL-->dm2[(Data Mart)]

Data is extracted, transformed, and loaded from source to destination by ETL processes, which together constitute a data pipeline. Subsets of the data in a data warehouse are sometimes broken out into data marts, which are essentially "miniature data warehouses" intended for a specific audience.
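A single ETL step can be sketched as three small functions. The example below is a minimal illustration using only the Python standard library; the CSV layout, the `orders` table, and the cleaning rules are assumptions made for the example, not part of any particular product.

```python
import csv
import io
import sqlite3

# Hypothetical flat-file source; the third row is missing its amount.
SOURCE_CSV = "order_id,customer,amount\n1, alice ,10.50\n2,BOB,20\n3,carol,\n"


def extract(text):
    """Extract: read raw rows from a flat-file source."""
    return list(csv.DictReader(io.StringIO(text)))


def transform(rows):
    """Transform: drop incomplete rows, normalize names, and cast types."""
    cleaned = []
    for row in rows:
        if not row["amount"].strip():
            continue  # skip rows with no amount
        cleaned.append(
            (int(row["order_id"]), row["customer"].strip().title(), float(row["amount"]))
        )
    return cleaned


def load(conn, rows):
    """Load: write the cleaned rows into the destination table."""
    conn.execute(
        "CREATE TABLE IF NOT EXISTS orders "
        "(order_id INTEGER PRIMARY KEY, customer TEXT, amount REAL)"
    )
    conn.executemany("INSERT OR REPLACE INTO orders VALUES (?, ?, ?)", rows)
    conn.commit()


def run_pipeline(conn, text):
    """Chain the three steps into one pipeline stage."""
    load(conn, transform(extract(text)))


conn = sqlite3.connect(":memory:")
run_pipeline(conn, SOURCE_CSV)
loaded = conn.execute("SELECT * FROM orders ORDER BY order_id").fetchall()
print(loaded)
```

Using `INSERT OR REPLACE` keyed on `order_id` makes the load step safe to re-run, which is one common design choice; append-only loads are another.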

The data staging area is a temporary storage area for source data. It makes it possible to quickly extract and consolidate source data, perform quality checks and cleansing, detect changes, troubleshoot, and perform pre-aggregation before the data is transferred to the data warehouse. Staging areas are often ephemeral, though they may be maintained or archived. In modern data warehouse architecture, the staging area is often a data lake.
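The staging tasks listed above (quality checks, change detection, pre-aggregation) might look roughly like the following sketch. It assumes staged rows are plain dictionaries and detects changes by comparing row hashes against hashes recorded on a previous load; the field names and the hashing approach are illustrative assumptions, and real systems may rely on timestamps or change data capture instead.

```python
import hashlib
from collections import defaultdict


def row_hash(row):
    """Fingerprint a row so that later loads can tell whether it changed."""
    payload = "|".join(f"{key}={row[key]}" for key in sorted(row))
    return hashlib.sha256(payload.encode("utf-8")).hexdigest()


def passes_quality_checks(row):
    """Quality check: require a customer name and a non-negative numeric amount."""
    return (
        bool(row.get("customer"))
        and isinstance(row.get("amount"), (int, float))
        and row["amount"] >= 0
    )


def detect_changes(staged_rows, known_hashes, key="order_id"):
    """Change detection: keep rows that are new or differ from the last load."""
    return [row for row in staged_rows if known_hashes.get(row[key]) != row_hash(row)]


def pre_aggregate(rows):
    """Pre-aggregation: total the amount per customer before loading."""
    totals = defaultdict(float)
    for row in rows:
        totals[row["customer"]] += row["amount"]
    return dict(totals)


# Hypothetical staged data: order 2 changed, order 3 is invalid, order 4 is new.
staged = [
    {"order_id": 1, "customer": "Alice", "amount": 10.5},
    {"order_id": 2, "customer": "Bob", "amount": 25.0},
    {"order_id": 3, "customer": "", "amount": 5.0},
    {"order_id": 4, "customer": "Alice", "amount": 4.5},
]
# Hashes recorded by an earlier (hypothetical) load.
known_hashes = {
    1: row_hash({"order_id": 1, "customer": "Alice", "amount": 10.5}),
    2: row_hash({"order_id": 2, "customer": "Bob", "amount": 20.0}),
}

valid = [row for row in staged if passes_quality_checks(row)]
changed = detect_changes(valid, known_hashes)
totals = pre_aggregate(changed)
print(changed)
print(totals)
```

Only the rows in `changed` would need to move on to the warehouse, which keeps incremental loads small.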

Deeper Knowledge on Data warehouses

Dimensional Modeling

Concepts, methods, and techniques used to design data warehouses

Data Lakehouses

A combination of data lakes and data warehouses

Azure Synapse Analytics

An integrated set of data services on Microsoft Azure

Snowflake Schemas

Star schemas with normalized dimension tables

Online Analytical Processing (OLAP)

A technique to create views and calculations from multi-dimensional data

Amazon Redshift

A columnar data warehouse solution on AWS

Broader Topics Related to Data warehouses

Data Products

Ways of making data available


Organized collections of structured data
