Star schemas

A star schemas is a specific implementation of a dimensional model that organizes relational data into fact and dimension tables, typically to store data in a data warehouse. In a typical star schema, there is one central fact table that contains all of the data related to a particular business process (such as sales, support request, defect reports, etc.), and several dimension tables that contain descriptive information about the various dimensions that are used to analyze that process (such as time, location, and product). The fact table and dimension tables are joined together using foreign keys to create a complete picture of the data.

Example start schema for Sales

Here's a simple example of a star schema, organized around sales:

erDiagram sales_facts { int DateId int StoreId int ProductId int UnitCount float Total } date_dims ||--o{ sales_facts : DateId store_dims ||--o{ sales_facts : StoreId product_dims ||--o{ sales_facts : ProductId date_dims { int DateId Datetime Date int Day string DayOfWeek int Month string MonthName int Year } store_dims { int StoreId int StoreNumber string State } product_dims { int ProductId string Sku string Name string Category }

This allows reports to be easily built. For example, to calculate TV sales by brand and year using SQL:

SELECT
	date_dims.Year,
	product_dims.Brand,
	SUM(sales_facts.UnitCount) as TotalUnitsSold
FROM sales_facts
	INNER JOIN date_dims ON (sales_facts.DateId = date_dims.DateId)
	INNER JOIN store_dims ON (sales_facts.StoreId = store_dims.StoreId)
	INNER JOIN product_dims ON (sales_facts.ProductId = product_dims.ProductId)
WHERE 
	product_dims.Category = 'TV'
GROUP BY
	product_dims.Brand,
	date_dims.Year
ORDER BY
	date_dims.Year,
	product_dims.Brand,

-- Sample output -- 
/*
Year	| Brand 	| TotalUnitsSold
---------------------------------------
2020	| Panasonic	| 123
2020	| Samsung	| 121
2020	| Sony		| 234
2021	| Panasonic	| 271
2021	| Samsung	| 90
2021	| Sony		| 115
*/

Types of fact tables

Transaction Fact Tables

Transaction fact tables contain the lowest level of detail available and are the most common type of fact table. Each row in a transaction fact table captures a single event (e.g., a click on a web page) and link to all available dimensions for that event.

Periodic snapshot fact tables

Periodic snapshot fact tables, or simply snapshot fact tables, summarize measurements for a given time period (e.g. hourly, daily, etc.). Snapshot fact tables contain pre-aggregated measures that can be derived from transaction fact tables. Examples of periodic snapshot fact tables include hourly snapshots of clicks on a webpage, daily sales, and monthly account balances.

Accumulating snapshot fact tables

Accumulating snapshot fact tables capture key milestones in a given business process to simplify analysis of time between each step of the process. For example, an accumulating snapshot fact table for the checkout process of an e-commerce website might store one row for every user and mark the key steps (when the cart was reviewed, when shipment information was provided, when payment was provided, etc.).

Star schema modeling resources

Deeper Knowledge on Star Schemas

Snowflake Schemas

Snowflake Schemas

Star schemas with normalized dimension tables

Broader Topics Related to Star Schemas

Dimensional Modeling

Dimensional Modeling

Concepts, methods, and techniques used to design data warehouses

Online Analytical Processing (OLAP)

Online Analytical Processing (OLAP)

A technique to create views and calculations from multi-dimensional data

Star Schemas Knowledge Graph