Long-Running “Read Data Source” – Facts
This cube started on its usual schedule (around 7 AM) and stopped writing logs when it reached the “data read” stage, even though the thread remained active. After requesting three “kills” and restarts over two days, we decided to let it run. The day after, I found the log below:
Isolating the Issue
The usual suspects were checked: no performance issues were found on the database, no problem on the Cognos server (other cube builds were not affected), and no optimization we threw at it seemed to help (after tuning everything that could be tuned, the end result was still an 18-hour read). Once more, the data itself seemed to be playing a big part in the performance. By cutting the data in halves (the binary search approach already covered) and progressively reducing the test set, at a certain point we were finally able to isolate something meaningful:
Slice 1 of Data: Open data source: 21 min; Read data source: 1 h 11 m; 4.5M rows loaded
Slice 2 of Data: Open data source: 13 min; Read data source: 2 h 30 m; 1.89M rows loaded
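For repeatability, the halving approach described above can be sketched as a simple binary search over the source rows. The `build_and_time` callable here is a hypothetical helper (it would run a test cube build and return the read time); the names are illustrative, not part of any Cognos tooling:

```python
# Sketch of the binary-search isolation used above (names are illustrative).
# We repeatedly halve the data set and keep the half that still shows the
# pathological read time, until the slice is small enough to inspect by hand.

def isolate_slow_slice(rows, build_and_time, threshold_s, min_rows=100_000):
    """Return a small slice of `rows` that still reproduces the slow read."""
    slice_ = rows
    while len(slice_) > min_rows:
        mid = len(slice_) // 2
        first, second = slice_[:mid], slice_[mid:]
        # Keep whichever half still breaches the time threshold.
        if build_and_time(first) > threshold_s:
            slice_ = first
        elif build_and_time(second) > threshold_s:
            slice_ = second
        else:
            break  # Neither half alone reproduces it; stop and inspect both.
    return slice_

# Toy demo: mark "slow" rows with 1 and pretend each one costs 10 seconds.
slow = [0] * 900 + [1] * 100
culprit = isolate_slow_slice(slow, lambda s: sum(s) * 10,
                             threshold_s=50, min_rows=50)
# `culprit` is a small slice that still contains the problematic rows.
```

Each full build took hours in our case, so narrowing down with a handful of halvings is far cheaper than testing arbitrary subsets.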
This proved that slice 2 had something special about it that created this behaviour. After some more investigation of the data itself, the issue was finally pinned down: a new market had been brought into the regional platform, and new data was loaded for it. Because this market did not have customer information, the load forced a -1/Not Valid onto the customer ID of every sale. As a consequence, slice 2 had a very large amount of data assigned to the infamous “-1/Not Valid” that was covered in Part II. But why would this create a problem on the fact table load? Did this mean that a dimension value was affecting the load time of a fact table?
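A quick way to confirm this kind of skew is to measure what share of the fact rows maps to the -1/Not Valid customer. The field names below are assumptions for illustration, not the real schema:

```python
# Illustrative skew check (field names are assumptions, not the real schema):
# measure how many fact rows map to the "-1/Not Valid" customer.
from collections import Counter

fact_rows = [
    {"customer_id": -1,   "sales": 10},
    {"customer_id": -1,   "sales": 5},
    {"customer_id": 1001, "sales": 20},
    {"customer_id": -1,   "sales": 7},
    {"customer_id": 1002, "sales": 15},
]

counts = Counter(row["customer_id"] for row in fact_rows)
share_not_valid = counts[-1] / len(fact_rows)
print(f"{share_not_valid:.0%} of fact rows map to customer -1/Not Valid")
```

In our real data the same check (run as a GROUP BY on the source table) made slice 2's concentration on -1 stand out immediately.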
The Non-Unique Category Code Effect
Firstly, it is important to understand the Category Code Uniqueness rule and its behaviour. Imagine you have the same rows we covered in Part II:
Country        | Region       | City
United Kingdom | -1/Not Valid | -1/Not Valid
United Kingdom | England      | -1/Not Valid
United Kingdom | England      | London
United Kingdom | Scotland     | -1/Not Valid
The Cognos category uniqueness rule says that a category code must not exist twice in a dimension. When a duplicate arrives on the same level of the same hierarchy, Cognos interprets it as an error (if uniqueness is on but move is disabled), as a move (if both uniqueness and move are on), or simply ignores it (if no uniqueness is required). However, since no two categories may have the same code, how will Cognos behave when receiving the same code on different levels of the hierarchy?
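One way to picture the behaviour is the simplified model below. This is not Cognos internals, just a sketch of the rule: a code may exist only once in the dimension, so when the same code arrives on a different level, a surrogate “~n” code has to be generated for it:

```python
# Simplified model of the category-code uniqueness rule (NOT Cognos internals):
# a code may exist only once in the dimension, so when the same code arrives
# on a different level, a surrogate "~n" code is generated for it.

def assign_category_code(code, level, seen, next_tilde):
    """Return (stored_code, next_tilde) for an incoming (code, level) pair."""
    if code not in seen:
        seen[code] = level            # first time we see this code
        return code, next_tilde
    if seen[code] == level:
        return code, next_tilde       # same level: duplicate/move, not a new category
    surrogate = f"~{next_tilde}"      # different level: generate a ~ category
    return surrogate, next_tilde + 1

seen, next_tilde = {}, 1
for code, level in [("-1/Not Valid", "Region"),
                    ("England", "Region"),
                    ("-1/Not Valid", "City"),   # same code, different level
                    ("London", "City")]:
    stored, next_tilde = assign_category_code(code, level, seen, next_tilde)
    print(f"{level}: {code} -> {stored}")
```

Running this, the City-level “-1/Not Valid” is the only row that cannot keep its own code and comes back as “~1”, which is exactly the pattern seen in the dimension after the build.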
The consequence is a set of ~ categories[1] (categories whose codes are rewritten as ~###), and in our specific situation the number of duplicates was considerable, for the reasons already covered. The final result of all the above: associating fact data with ~ categories takes more time than assigning facts to regular categories. Also, the bigger the maximum ~### value, the worse the observed performance. Please note this is only relevant for scope categories (the categories to which the data is assigned, at the fact's granular level).

[1] Find more about category code uniqueness requirements here - https://www.ibm.com/support/knowledgecenter/SSEP7J_8.4.0/com.ibm.swg.im.cognos.ug_cogtr.8.4.0.doc/ug_cogtr_id1554cogtr_member_unique_names.html - and here - https://www.ibm.com/support/knowledgecenter/SSEP7J_8.4.0/com.ibm.swg.im.cognos.ug_cogtr.8.4.0.doc/ug_cogtr_id1600bp_uniqueness.html
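Since the fix is ultimately a data fix, a pre-load check along these lines can flag the codes that will collide across levels before the build runs. The rows and level names are illustrative, taken from the example table above:

```python
# Illustrative pre-load check: flag category codes that appear on more than
# one level of the hierarchy, i.e. the codes that will become ~ categories.
from collections import defaultdict

hierarchy_rows = [
    ("United Kingdom", "-1/Not Valid", "-1/Not Valid"),
    ("United Kingdom", "England",      "-1/Not Valid"),
    ("United Kingdom", "England",      "London"),
    ("United Kingdom", "Scotland",     "-1/Not Valid"),
]

levels_by_code = defaultdict(set)
for country, region, city in hierarchy_rows:
    levels_by_code[country].add("Country")
    levels_by_code[region].add("Region")
    levels_by_code[city].add("City")

colliding = {code: lvls for code, lvls in levels_by_code.items()
             if len(lvls) > 1}
print(colliding)  # "-1/Not Valid" appears on both Region and City
```

Making the not-valid codes level-specific in the source (e.g. one placeholder per level) removes the collisions, and with them the ~ categories that slowed the fact load down.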