Fact Tables and Dimension Tables in Microsoft BI Datawarehouse
- A data warehouse is a relational database that centrally stores a consistent view of the data from transactional systems in a denormalised format.
- This makes the retrieval of the data more efficient than it would be from a transactional system.
- A data warehouse will consist of one or more data marts.
- A data mart typically represents an area of the business such as sales or finance departments. Within each data mart, there are two types of tables created
- fact tables and dimension tables
- Dimension tables contain columns that hold information that represents a business entity.
- Examples of dimension tables can include time, region and customers. The columns will contain information that is specific to the entity. In the case of the customers dimension table, columns can include Firstname and Lastname.
- You can define as many columns as you want to precisely define the customer entity. However, placing unnecessary columns increases the volume and time it would take to populate the dimension table with data by using SSIS.
- Dimension tables should consist of two columns that are important in loading the data warehouse. The first column is typically a dimension key known as a surrogate key.
- This is usually a primary key column that uses an identity constraint to automatically populate the column value. It is also used to provide a relationship to the fact table. The second column is referred to as an application key column. This column is used to hold the value of the record in the original transactional system.
- You may also wish to add columns to the dimension table that can be used to track changes to the records within the dimension table. As information within the table changes, you can add a ModifiedDate column that can be used to track the date when changes occurred to a record. To keep historical records of changed data, you could include StartDate and EndDate columns that help you to determine the duration that a record was in a particular state.
- Dimension tables contain columns that hold information that represents a business entity. You can define many dimension tables.
- An example of a business entity that is represented as a dimension table could be customers. The columns will contain information that is specific to that entity. In the case of the customers dimension table, columns can include Firstname and Lastname. Dimension tables will also contain a primary key column that uniquely identifies the records in the dimension table. The primary key also defines a one-to-many relationship with an associated column or columns within a fact table.
- Also note that dimension tables can be shared between different data marts. A common example of this is a Time dimension table. This could be used by different data marts to provide consistent information about time across multiple data marts. These types of dimension tables are known as conformed dimensions. In order to manage the movement of data from source systems to the data warehouse, staging tables may also be created in the data warehouse. This allows tables to act as intermediary tables for cleansing data before loading the cleansed data into the respective dimension tables.
- The fact table contains information about historical transactions and typically represents this information as integer data.
- For example, you can have three dimension tables named Customers, Region and Time. Each has a primary key column that relates to the fact table. In this example, Customer 6231 in the fact table is equal to Francois Ajenstat in the dimension table. Region 4 is equal to the value of West and the Time value of 3423 is equal to the 18th of August, 1975 in the Time dimension table. The additional columns will contain measures about the transaction such as order quantity and sales amount. Using integer data to represent historical transactions allow SQL Server Analysis Services to process the data much more efficiently
- The fact table is at the heart of the data mart and typically consists of three groups of columns. The first is a primary key column to maintain the integrity of the fact table itself. The second group of columns are known as foreign keys.
- Each of these columns relate to a dimension table primary key to provide a relationship between the fact table and the contextual information provided by the dimension table. Imagine that the image represents a Sales data mart, and that the three smaller tables represent dimension tables called CustomerDim, TimeDim and RegionDim.
- The fact table, which would be the big table in the image, would contain three foreign key columns that would relate directly to each dimension table.
- The fact table also contains a third group of columns known as measures. This group of columns hold business metrics such as sales units, sales amounts and cost amounts. As a result, the fact table ideally consists of business information represented by integer values.
- On occasions, a fourth group of columns may be seen within the fact table. This may be one or more columns that are referred to as degenerative dimensions. A degenerative dimension is information about a business entity that is stored within the fact table itself rather than held within a separate dimension table. Using the example of the CustomerDim, TimeDim and RegionDim, these three dimension tables provide context information about customers, time and regions. A central fact table holds foreign key references to the dimension tables and includes measures such as SalesUnits and OrderQuantity. We can include a SalesOrderNumber for each record of a sale. However, rather than creating a separate dimension table that would require a join to the dimension table just to return one piece of contextual information, it is more efficient to store this piece of information within the fact table itself.
- Fact tables are created like any table that is created in SQL Server. You also define the primary and foreign key relationships within the database engine. Therefore, you are limited to the 8-KB limit on the amount of information that is stored in a row within a table