27 May 2015

How to develop a Many to Many SSAS In-Memory Database – Tabular Model

Let me start by saying that the following article will help you learn and comprehend the task of building a many to many tabular model. This article is the second post following “How to develop a Many to Many SSAS Multidimensional Database” which it is a must read before this one.

So our objective here is to implement a Business Intelligence Model based on the same DW solution has in the first blog post, but using a new technology.

Let’s start by opening Visual Studio 2012 and then create a new Analysis Services Tabular Project.

Now we need to import data from a database that in this example is going to be a Microsoft SQL Server Database type.

I will import the following tables into my project:

All_Measures; ( Bridge table )

Contact; ( Dimension )

Process; ( Dimension )

Participation; ( Dimension )

Compensation; ( Dimension )

Contact_Measures; ( Fact table )

Contact_Creation_User; ( Dimension )

Contact_Creation_Date; ( Dimension )

Process Measures; ( Fact table )

Process_Creation_User; ( Dimension )

Process_Creation_Date; ( Dimension )

Participation_Measures; ( Fact table )

Participation_Creation_User; ( Dimension )

Participation_Creation_Date; ( Dimension )

Compensation_Measures; ( Fact table )

Compensation_Creation_User; ( Dimension )

Compensation_Creation_Date; ( Dimension )

Let’s continue with establishing the necessary connections in order to achieve our goal. I am going to begin by joining the Bridge table to the Contact dimension table, Process dimension table, Participation dimension table and to the Compensation dimension table. We now have a diagram that should look like this:

Adding the remaining tables we end up joining the Fact Table with the Bridge Table through a Dimension Table that in the picture below is the Compensation table.

After we have the Compensation_Measures Table linked with the ALL_Measures Bridge Table we can proceed and join each related dimension with its Factual like it is shown in the picture above. Now it is time for the fun part… DAX (Data Analysis Expressions).

Once our tabular model is ready we will be able to explore it using Excel pivot tables and we must now learn how to create, in DAX, the same calculations we have in MDX. To begin we need to understand how context works. There are three ways of filtering information:

  • Query Context
    • Used when we slice our measures by some dimension;
  • Filter Context
    • Defines the rows selected in a calculation;
    • Acts like a normal filter when creating a measure;
  • Row Context
    • Presents information from that row only;
    • Needs the use of certain functions to get outside data from the row in particular;
    • Can be used with iterative calculations to obtain nested row context;
    • Used when creating Calculated Columns;

Now that we have a general idea of how important context is in DAX let’s start with a simple analysis: I want to create a Calculated Field, Aux_Nr_Contacts, representing the total of Contacts in my data. Open the Contact_Measures table and below the actual table data select a cell. I will write the DAX formula, on the selected cell, like this: Aux_Nr_Contacts:= COUNTA(Contact_Measures[Id_Contacts]) Let me explain the previous formula:

  • “Aux_Nr_Contacts:” The name of the calculated field;
  • “=COUNTA( )” Function that counts the number of values in a column;
  • “Contact_Measures[Id_Contacts]” The field I want to count the values of.

The next step is to filter out the Not Applicable row of this measure by adding and changing the following code:

Aux_Nr_Contacts:=COUNTAX(                 

FILTER(             Contact_Measures;         

Contact_Measures[Contact_Code]<>”N/A”                  );                

 Contact_Measures[Id_Contact]                 )

Let me explain the previous formula

  • “Aux_Nr_Contacts:” The name of the calculated field;
  • “=COUNTAX( )” Function that counts the number of values for each row according to a given expression;
  • “FILTER()” Fuction that returns a Table filtered by an expression;
  • “Contact_Measures;” Table I want to filter;
  • “Contact_Measures[Contact_Code]<>”N/A”” Filter expression I want to use;
  • “Contact_Measures[Id_Contacts]” The field I want to count the values of, from the FILTER() result table;

With this newly created Measure we can already make some analysis by clicking the top left corner Excel Icon.

The picture below gives us the distribution of created contacts by year since 2008:

But if we use the Compensation_Creation_Date to check the distribution by Compensation Creation Year for all the contacts I get the following result: So we clearly have a problem and we first need to understand why it is happening before we can address it. Back to the model diagram (image below), in green we have the connections that are possible to use when filtering the Measure we created. Notice the red X mark that represents the connection that isn’t working right now for our analysis. The Measure we have created can be filtered by any field from the Contact table, Contact_Creation_User table and also the Contact_Creation_Date table.

The problem starts when we try and filter by, for example, a field from the Participation table. Why is that? In the tabular model, relations are propagated from the one side to the many side (1-N) thus we can’t use filter fields from tables that are, from the Measure perspective, on the N side. In the image below it’s shown, represented by the yellow line, the connections that needed to be made in order to filter our Measure by a field from Participation table.

It seems that we have reached a dead end because, from the All_Measures to the Contact Table there is a N-1 connection to be made. Well…Remember the emphasis I gave to context? Let’s bring it back into play. Relations are usually propagated from 1-N because normally we are using the row context, so it is better said that row context filters only propagate from the one side to the many side. What about the filter context?

Filter context can be propagated from N-1 and this is the trick that will enable us to use any field from any table to filter our initial query result. How can I turn a row context into a filter context? We can use the Calculate() function to do just that. We encapsulate our already created Measure in a Calculate function and the hard work is done.

We will be able to use all the fields from tables that are joined 1-N with the All_Measures. The Tables that are positive for that premise are the Process, Participation and Compensation.

Our new formula:

Aux_Nr_Contacts:=CALCULATE(                 

COUNTAX(                     

FILTER(                         

Contact_Measures;                         

Contact_Measures[Contact_Code]<>”N/A”                          );                 

Contact_Measures[Id_Contact]                      );                 

SUMMARIZE(                     

ALL_Measures;Contact[Id_Contact]                  )                 )

Let me explain the previous formula

  • “Aux_Nr_Contacts:” The name of the calculated field;
  • “CALCULATE()”Function that will turn row context into filter context over a given table;
  • “=COUNTAX( )” Function that counts the number of values for each row according to a given expression;
  • “FILTER()” Fuction that returns a Table filtered by an expression;
    • “Contact_Measures;” Table I want to filter;
    • “Contact_Measures[Contact_Code]<>”N/A”” Filter expression I want to use;
  • “Contact_Measures[Id_Contacts]” The field I want to count the values of, from the FILTER() result table;
  • “SUMMARIZE()” Function that works like a Group By returning a table used by the CALCULATE function to leap the filter context over;
    • “ALL_Measures;” The table to apply the group by;
    • “Contact[Id_Contact]” The name of the table and column I want to group by;

The new measure includes the SUMMARIZE function that works wonders in terms of performance and replaces the following code:

FILTER(

ALL_Measures;

CALCULATE(

COUNTA(Contact[Id_Contact])>0

)

)

Now that we have our new Measure let’s take a look at which new dimension I am able to use:

The previous image demonstrates that now we are able to explore our data using any dimension table linked directly with the ALL_Measures table. Ok… so now I can use the Participation, Compensation or the Process tables, what about Compensation_Creation_Date?

The above image shows that we still have a problem if we want to use the Compensation_Creation_Date Dimention. I am going to show the path of the filter to our Measure using the model diagram and the Participation_Measures related tables as an example.

So there is still a connection that is giving us troubles. The strategy to overcome this red X mark is the same as before. We need to pass the row context into a filter context in order to bring our selection through the Partcipation table.

Here is the Formula:

Aux_Nr_Contacts:=CALCULATE(

CALCULATE (

COUNTAX(

FILTER(Contact_Measures;

Contact_Measures[Contact_Code]<>”N/A”);

Contact_Measures[Contact_Code]

);

SUMMARIZE(

ALL_Measures;Contact[Id_Contact]

)

);

SUMMARIZE(

Participation_Measures;Participation[Id_Participation]

)

)

By adding the second SUMMARIZE over the Participation_Measures we are enabling the exploration of our data by the Dimension tables that are linked to this table such as Participation_Creation_Date and Participation_Creation_User. Now we have the following picture:

And this is corroborated by and Excel pivot table:

So now we know how to enable our Aux_Nr_Contacts measure to be affected by any table in our model, simply by adding more functions to our existing formula.

The final formula looks like this:

Aux_Nr_Contacts:=CALCULATE(             

CALCULATE (

CALCULATE (

CALCULATE (

COUNTAX(

                                    FILTER(Contact_Measures;                                            Contact_Measures[Contact_Code]<>”N/A”

);

                                     Contact_Measures[Contact_Code]                                             );                             SUMMARIZE(                                  ALL_Measures;Contact[Id_Contact]                                  )

);

                             SUMMARIZE(                                  Process_Measures;Process[Id_Process]                          )

);

                     SUMMARIZE(                             Participation_Measures;Participation[Id_Participation]                 )

);

             SUMMARIZE(                  Compensation_Measures;Compensation[Id_Compensation]              )              )

To finish this tutorial let me just make a summary of all the steps we have taken to reach this point.

  1. We began by importing tables to our model from a relational database.
  2. We then proceeded to establish the right connections between tables.
  3. We continued by creating a measure, using DAX, to explore our model in excel.
  4. We learnt about how important context is in DAX.
  5. We created a more complex formula with Calculate and Summarize functions.
  6. We finished creating a Measure that can be filtered by any table in our model.

This is the end of the two series post of how to develop a Many-To-Many solution using two different technologies that have unique pros and cons and should be applied each scenario accordingly.