Fact Tables and Dimension Tables in Microsoft BI Datawarehouse
A data warehouse is a relational database that centrally stores a consistent
view of the data from transactional systems in a denormalised format.
This
makes the retrieval of the data more efficient than it would be from a transactional
system.
A data warehouse will consist of one or more data marts.
A data mart
typically represents an area of the business such as sales or finance
departments. Within each data mart, there are two types of tables created
fact
tables and dimension tables
Dimension Tables
Dimension tables contain columns that hold information that represents a
business entity.
Examples of dimension tables can include time, region and
customers. The columns will contain information that is specific to the entity.
In the case of the customers dimension table, columns can include Firstname and
Lastname.
You can define as many columns as you want to precisely define
the customer entity. However, placing unnecessary columns increases the
volume and time it would take to populate the dimension table with data by
using SSIS.
Dimension tables should consist of two columns that are important in loading
the data warehouse. The first column is typically a dimension key known as a
surrogate key.
This is usually a primary key column that uses an identity
constraint to automatically populate the column value. It is also used to
provide a relationship to the fact table. The second column is referred to as
an application key column. This column is used to hold the value of the record
in the original transactional system.
You may also wish to add columns to the dimension table that can be used to
track changes to the records within the dimension table. As information within
the table changes, you can add a ModifiedDate column that can be used
to track the date when changes occurred to a record. To keep historical records
of changed data, you could include StartDate and EndDate columns
that help you to determine the duration that a record was in a particular
state.
Dimension tables contain columns that hold information that represents a
business entity. You can define many dimension tables.
An example of a business
entity that is represented as a dimension table could be customers. The columns
will contain information that is specific to that entity. In the case of the
customers dimension table, columns can include Firstname and Lastname.
Dimension tables will also contain a primary key column that uniquely
identifies the records in the dimension table. The primary key also defines a
one-to-many relationship with an associated column or columns within a fact
table.
Also note that dimension tables can be shared
between different data marts. A common example of this is a Time dimension
table. This could be used by different data marts to provide consistent
information about time across multiple data marts. These types of dimension
tables are known as conformed dimensions. In order to manage the movement of
data from source systems to the data warehouse, staging tables may also be
created in the data warehouse. This allows tables to act as intermediary tables
for cleansing data before loading the cleansed data into the respective
dimension tables.
Fact Tables
The fact table contains information about historical transactions and typically represents this information as integer data.
For example, you can have three dimension tables named Customers, Region and Time. Each has a primary key column that relates to the fact table. In this example, Customer 6231 in the fact table is equal to Francois Ajenstat in the dimension table. Region 4 is equal to the value of West and the Time value of 3423 is equal to the 18th of August, 1975 in the Time dimension table. The additional columns will contain measures about the transaction such as order quantity and sales amount. Using integer data to represent historical transactions allow SQL Server Analysis Services to process the data much more efficiently
The fact table is at the heart of the data mart and typically consists of
three groups of columns. The first is a primary key column to maintain the
integrity of the fact table itself. The second group of columns are known as
foreign keys.
Each of these columns relate to a dimension table primary key to provide a
relationship between the fact table and the contextual information provided by
the dimension table. Imagine that the image represents a Sales data mart,
and that the three smaller tables represent dimension tables called
CustomerDim, TimeDim and RegionDim.
The fact table, which would be the big
table in the image, would contain three foreign key columns that would relate
directly to each dimension table.
The fact table also contains a third group of columns known as measures.
This group of columns hold business metrics such as sales units, sales amounts
and cost amounts. As a result, the fact table ideally consists of business
information represented by integer values.
On occasions, a fourth group of columns may be seen within the fact table.
This may be one or more columns that are referred to as degenerative
dimensions. A degenerative dimension is information about a business entity
that is stored within the fact table itself rather than held within a separate
dimension table. Using the example of the CustomerDim, TimeDim and RegionDim,
these three dimension tables provide context information about customers, time
and regions. A central fact table holds foreign key references to the dimension
tables and includes measures such as SalesUnits and OrderQuantity. We can
include a SalesOrderNumber for each record of a sale. However, rather than
creating a separate dimension table that would require a join to the dimension
table just to return one piece of contextual information, it is more
efficient to store this piece of information within the fact table itself.
Fact tables are created like any table that is
created in SQL Server. You also define the primary and foreign key
relationships within the database engine. Therefore, you are limited to the
8-KB limit on the amount of information that is stored in a row within a table
0 comments:
Post a Comment