Imagine that we are developing a DW solution for a business that has the following concepts:
A person creates a contact;
A contact leads to a process;
A contact can be linked to many processes and vice versa;
The process can originate one or more participations;
A participation can have one or more compensations associated.
With the previous points we know that we have a many to many relationship between contacts and processes and pretty much everything else that involves processes or contacts. Now think that we need to elaborate reports where the dimensions linked to compensations need to filter the measures of contacts.
A model design starts to appear in front of you…we need a bridge table to join everything. The simple solution would be creating a fact table with all the ID´s from all the dimensions so all the tables have a direct relationship between them. There are several problems with this solution:
The bridge table will be responsible for much of the space used in the DB since it has every id and combination between all the contacts, processes, participations and compensations;
The ETL design will be complex to maintain;
The ETL process will have a big bottleneck loading the Fact table;
If the business evolved to multiple people per contacts, the bridge table will be even bigger.
Today what you will learn is a different approach to this model design so that we avoid the negative aspects mentioned above. For our new design we need to accomplish the following objectives:
Create a bridge table (aka factless table);
Join each fact table through the bridge table;
Join all fact tables with all the dimensions through;
Create a dummy member when there is no relation between contacts, processes, participations and compensations (very important).
Below is an image of how the bridge table would look without the dummy member.
Now the image with the dummy member (Not Applicable).
We need to create the dummy member because it is necessary to accommodate every single relationship between the 4 fact tables, in the bridge table, when using SSAS. When we are establishing a many to many relationship in the dimension usage, if we don’t have the (N/A) member, the join will filter all the non-related members automatically. So we add the N/A to all the 4 fact tables in order to propagate them to the bridge table.
The final result will be this:
The final result is the following:
All the ids and measures in their own fact table;
The ETL solution is easier to maintain and to make understandable to new project partners;
The bridge table isn’t responsible for 40-50% of the database allocation;
The bridge table doesn’t get “out control” with this approach.
Now that we have the DW nicely designed let’s configure the relationship type in SSAS dimension usage.
Below is an image of how our solution should look in de SSAS dimension usage tab:
We have the dimensions horizontally and the measure groups vertically. Notice the ALL Measures measure group, which is linked by a regular relationship to all the first 4 dimensions. This is important because it is through these connections that we can then establish a many to many relationship with all the other measure groups.
Let’s take a look in detail of how we can achieve this:
Above we join the Process measure group to the Contact dimension through the ALL Measures measure group.
We can do this because of the previous regular relationship we’ve created. The dimensions can now be related to all measure groups using the previous method. Now all we have to do is filter out the N/A member either using cube partitions or creating calculated measures in MDX that exclude it.
By choosing one of the above options we can finally start exploring our data using excel pivot tables.
The table shows Process Count distributed by Contact Delegation. As expected the total sum of process count (287.739) is bigger than the Grand Total (287 .511) because one process may be joined to 2 or more contacts, appearing twice on the Process Count column (one for each contact).
Column Process Count only shows a process distinct count, filtering out any process repetition. Next we have a more detailed explanation of how is SSAS processing the many to many relationships we’ve created.
SSAS finds every ID in de All table (bridge) that has France as their delegation;
SSAS finds every ID in the Process table that joins with the previous ID’s;
SSAS finally counts every distinct Process ID from the previous join result;
Grand Total represents the final distinct count of the set of ID’s obtained.
To sum up all that we have done I finish this first part with the following notes:
The main goal of using Many-to-Many approach is to correlate dimensions and facts;
Create a “bridge” table that has all the relations between the fact tables so we can put in perspective each fact by all the dimensions;
Each fact table needs to have the “Not Applicable” record to avoid disappearing records.
.
.
.
.