30 June 2015

Use PowerQuery to Monitor IBM Cognos Transformer Cube Partition Sizes

Issue

If your organization uses IBM Cognos Transformer cubes to present data, you may have seen this error in the past:

(TR1901) PDS-PPE-0197 The local data cache size exceeded the allowed maximum of 2147418112 bytes.

This happens because the size of the cube reached the 2GB limit. As stated in the "Practical Guide for Transformer in Production" (see references at the end), it is becoming more and more common for companies dealing with enormous amounts of data to be forced to provide end users with larger dimensional views of their data. In doing so, it is possible to exceed the 2GB limit imposed on single-file cubes.

The latest Transformer versions (6.6 or above) allow a cube to be split into multiple files (one .mdc and multiple .mdp files), allowing it to grow beyond 2GB. The tool chooses the number of partitions to create based on the "multifilecubethreshold" (MFT) property.

As mentioned in the "Practical Guide for Transformer in Production", Transformer determines the number of output files needed by taking the number of data records in the cube, dividing it by the threshold, and rounding up.
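
As a worked illustration, the calculation can be expressed in Power Query's M formula language (the record count and MFT below are made-up example numbers):

// Example only: how the number of output files is derived
let
    Records = 2500000,                            // data records in the cube
    MFT = 1000000,                                // multifilecubethreshold
    OutputFiles = Number.RoundUp(Records / MFT)   // = 3 .mdp files
in
    OutputFiles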

Unfortunately, IBM Cognos does not provide an automatic tool to manage the size of these .mdp files: when one cube partition reaches the 2GB limit mentioned above, the build fails. There are workarounds, such as setting a lower MFT value to leave headroom as the data grows, but this may lead to longer cube build times. We need a solution that avoids failures and also improves cube build performance.

Solution

Algorithm

There are some rules we can apply so that the cube partitions don't reach this limit. One possible solution is to check the size of the generated .mdp files and see whether the biggest one is near the 2GB limit. The algorithm below defines a possible workflow to decide if we need to change the MFT:

This algorithm defines an interval between 1GB and 1.7GB within which we don't need to make any change to the MFT value. Any size outside this interval requires an adjustment to the MFT value. A lower limit is defined as well, because if the biggest .mdp file is smaller than 1GB, we may have room to improve build performance.
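
As a minimal sketch, the decision rule can be written as a simple conditional in Power Query M (MaxSizeGB stands for the size of a cube's biggest .mdp file, in gigabytes):

// Sketch of the decision rule; thresholds come from the interval above
AlertFor = (MaxSizeGB as number) as text =>
    if MaxSizeGB > 1.7 then "Decrease MFT"
    else if MaxSizeGB < 1.0 then "Increase MFT"
    else "No change needed"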

Implementation

One fast and easy way to implement this solution is to use Microsoft PowerQuery and its functionality.

Microsoft PowerQuery is a free Microsoft Excel add-in that allows us, among other features, to define queries that extract data from databases, folders and/or other systems. To download the add-in, go to this website:

http://www.microsoft.com/en-us/download/details.aspx?id=39379&CorrelationId=ea276871-76e2-4df2-ba64-a52b1b0b6843

You need at least Microsoft Excel 2010 Professional Plus, and Internet Explorer needs to be at least version 9.

The idea of the explanation below is not to give a full description of the steps needed to implement this solution, but to give some guidelines for it.

If you open Excel after installing the add-in, you will see a new tab (Power Query).

Choosing the From Folder option will open a new window where we can define the source folder:

If we have more than one cube spread across multiple folders, we can choose the top folder and then apply filters to the retrieved data, as this will return all the files in the top folder and its subfolders (the file names and folders have been omitted).
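
As a sketch, the folder source boils down to a single M call ("D:\Cubes" is a placeholder for your own top folder):

// Returns every file in the top folder and all its subfolders
let
    Source = Folder.Files("D:\Cubes")
in
    Source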

With this information, we need to do some transformations:

We need to remove all files and folders that we don't need and keep only the files with the .mdp extension. If you look at the right pane, you will see that most of the steps serve that purpose. In this case, there were a lot of backup folders that needed to be removed.
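
The filtering can be sketched with a single step like the one below (the "Backup" folder name is just an assumption for this example):

// Keep only .mdp files and drop anything under a backup folder
FilteredFiles = Table.SelectRows(Source, each
    [Extension] = ".mdp" and not Text.Contains([Folder Path], "Backup"))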

Once the list is cleaned, we will need the file size of each .mdp. For that, it's necessary to expand the Attributes column and select the Size field. Note that the file size is shown in bytes.

Because the size of the files is in bytes, we will need to create a new calculated column to convert it to gigabytes. This will make our rules a lot easier to apply.

Formula

(([File_Size_Bytes]/1024)/1024/1024)
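
Expressed as M steps, these two transformations could look like the sketch below (it assumes the expanded column keeps its default name, Size; in the formula above it has been renamed to File_Size_Bytes):

// Expand the Size field from the Attributes record column,
// then add a calculated column converting bytes to gigabytes
WithSize = Table.ExpandRecordColumn(FilteredFiles, "Attributes", {"Size"}),
SizeGB = Table.AddColumn(WithSize, "File_Size_GB",
    each [Size] / 1024 / 1024 / 1024, type number)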

The final step is the calculation of the maximum .mdp file size per cube model. We need to use the "Group By" option and aggregate the data by folder and cube model, applying the Max aggregation to the file size column.

Notes

It can be interesting to count the number of .mdp files, just for reference. In that case we need an additional aggregation, using Count.
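
Both aggregations fit in one grouping step, sketched below in M (it assumes one folder per cube model and the column names from the earlier sketches):

// Group by folder, keeping the biggest partition size and the file count
Grouped = Table.Group(SizeGB, {"Folder Path"},
    {{"Max_MDP_Size_GB", each List.Max([File_Size_GB]), type number},
     {"MDP_Count", each Table.RowCount(_), type number}})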

We would have an output like the one below (the file names and folders have been omitted):

With this, we can close and load the data into an Excel sheet.
The information will show up in the Excel sheet (the file names and folders have been omitted):

We can now add two additional columns at the end of the list, which will hold the calculations and alerts defined in the algorithm.

If the biggest .mdp is bigger than 1.7GB, reduce the cube's MFT value according to the rules below:

Decrease the MFT value in 50,000 steps

The minimum possible value for the MFT is 100,000

If the biggest .mdp is smaller than 1GB, increase the MFT value according to the rules below:

Increase the MFT value in 100,000 steps

The maximum possible MFT value is 3,000,000
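
These adjustment rules can be sketched as an M function (CurrentMFT and MaxSizeGB are assumed inputs: the cube's current threshold and its biggest .mdp size in gigabytes):

// Sketch of the MFT adjustment rules above
NewMFT = (CurrentMFT as number, MaxSizeGB as number) as number =>
    if MaxSizeGB > 1.7 then
        List.Max({CurrentMFT - 50000, 100000})      // decrease, floor at 100,000
    else if MaxSizeGB < 1.0 then
        List.Min({CurrentMFT + 100000, 3000000})    // increase, cap at 3,000,000
    else
        CurrentMFT                                  // inside the safe interval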

Final Comments

This is a quick and easy way to build a monitoring system that can alert us to MFT values that may have to be changed. It can potentially avoid failures in the overnight cube builds, if the value is too high, or optimize the cube build time, when the value is too low.

It's important to mention that once we open the above Excel file, the connections will be refreshed. And because we defined the upper threshold at 1.7GB, we have some time to change the MFT value before the cube fails.

References

"Practical Guide for Transformed in Production"
Microsoft Excel PowerQuery Add-in


     Pedro Nunes
        Manager
Solutions Development