I’ll try to explain one possible approach to a common problem we may encounter when maintaining data compression in a Data Warehouse environment. Data compression provides multiple benefits, saving disk space and improving performance in many situations. However, as it may require more CPU resources, it’s important to decide whether (and how) to apply compression for each individual database object. As there are already plenty of posts on this subject, I will focus on a specific detail.
In a typical BI architecture, we have Staging and DW databases (MS SQL Server 2012) for which we need to develop a data compression approach. For the sake of simplicity, after a series of tests, we decided to apply ROW compression to the Staging database and PAGE compression to the DW database. Since we had a considerably large volume of data to compress (and keep compressed) (~1TB), we wrote a piece of code to execute via a Stored Procedure, and perhaps use in a scheduled maintenance job.
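For context, applying a compression setting to an individual table is a single DDL statement; the table names below are illustrative:

```sql
-- PAGE compression for a DW fact table (illustrative name)
ALTER TABLE dbo.FactInternetSales
REBUILD WITH (DATA_COMPRESSION = PAGE);

-- ROW compression for a Staging table (illustrative name)
ALTER TABLE stg.Customer
REBUILD WITH (DATA_COMPRESSION = ROW);
```

The challenge is doing this systematically for hundreds of objects, which is what the procedure below automates.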
It accepts the following input parameters:
- Compression Type
- NONE, PAGE, ROW
- Defines the minimum number of rows a certain table must have in order to be a compression target candidate.
- Defines the minimum size a table must have (in MB) in order to be a compression target candidate.
- Offers the possibility to print the statements instead of actually applying them.
- Enables the filter by SCHEMA.
- Defines the maximum duration of the execution. Useful when compressing a very large database for the first time, since it enables you to divide the process into several batches.
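The parameter list above maps to a signature along these lines. Note that the parameter names here are hypothetical, chosen only to match the descriptions; the original procedure may name them differently:

```sql
-- Hypothetical signature matching the parameters described above
CREATE PROCEDURE dbo.usp_ApplyDataCompression
    @CompressionType NVARCHAR(10) = N'PAGE',  -- NONE, PAGE or ROW
    @MinRowCount     BIGINT       = 0,        -- minimum rows to be a candidate
    @MinSizeMB       INT          = 0,        -- minimum size (MB) to be a candidate
    @PrintOnly       BIT          = 0,        -- 1 = print statements, don't execute
    @SchemaName      SYSNAME      = NULL,     -- optional schema filter
    @MaxDurationMin  INT          = NULL      -- stop issuing statements after this time
AS
-- body omitted
```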
Here’s an example of the output on the “Adventure Works DW 2012”:
The output window shows the progress of the execution, the executed statement and the previous compression type (“Before was NONE”). Additionally, the procedure skips already compressed objects. Thus, executing it again just outputs the following:
So far so good. However, after a few days we noticed the compression type of some indexes being reset to NONE. This was caused by several “Rebuild Index” SQL Tasks in the daily ETL processes. These ran the typical statement:
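The original statement is not reproduced here, but a typical rebuild task issues something like the following (table and index names are illustrative), with no DATA_COMPRESSION option specified:

```sql
-- Typical ETL maintenance rebuild, with no compression option
ALTER INDEX ALL ON dbo.FactInternetSales REBUILD;
```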
No problem with this statement… unless it is applied to a disabled index. As stated in the Microsoft documentation:
> REBUILD [ WITH ( \<rebuild_index_option\> [ ,...n ] ) ]
> Specifies the index will be rebuilt using the same columns, index type, uniqueness attribute, and sort order. This clause is equivalent to DBCC DBREINDEX. REBUILD enables a disabled index. Rebuilding a clustered index does not rebuild associated nonclustered indexes unless the keyword ALL is specified. If index options are not specified, the existing index option values stored in sys.indexes are applied. For any index option whose value is not stored in sys.indexes, the default indicated in the argument definition of the option applies.
So, from this point we had two options: either fix over 100 SSIS packages (adding the compression type option to the SQL Tasks), or find a way to execute a rebuild statement that “guesses” the compression type of the underlying table. Since the first option carries the risk of having to do it all over again if someone decides to change the compression type of the tables, we went with the second.
Here’s a possible solution to rebuild an index using the underlying table’s compression type.
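A minimal sketch of such a procedure follows. The procedure and parameter names are assumptions; the idea is to read the table’s current compression from `sys.partitions` (the heap or clustered index, `index_id` 0 or 1) and reuse it in the rebuild. For simplicity this sketch assumes a non-partitioned table, so it reads a single partition:

```sql
-- Sketch (assumed names): rebuild an index inheriting the data
-- compression setting of its underlying table.
CREATE PROCEDURE dbo.usp_RebuildIndexWithTableCompression
    @SchemaName SYSNAME,
    @TableName  SYSNAME,
    @IndexName  SYSNAME
AS
BEGIN
    DECLARE @Compression NVARCHAR(60), @Sql NVARCHAR(MAX);

    -- Compression of the table's heap or clustered index (index_id 0 or 1);
    -- TOP (1) is a simplification for non-partitioned tables.
    SELECT TOP (1) @Compression = p.data_compression_desc
    FROM sys.partitions AS p
    INNER JOIN sys.tables  AS t ON t.object_id = p.object_id
    INNER JOIN sys.schemas AS s ON s.schema_id = t.schema_id
    WHERE s.name = @SchemaName
      AND t.name = @TableName
      AND p.index_id IN (0, 1);

    -- Rebuild the index, explicitly carrying over the table's compression
    SET @Sql = N'ALTER INDEX ' + QUOTENAME(@IndexName)
             + N' ON ' + QUOTENAME(@SchemaName) + N'.' + QUOTENAME(@TableName)
             + N' REBUILD WITH (DATA_COMPRESSION = ' + @Compression + N');';

    EXEC sys.sp_executesql @Sql;
END;
```

This way the rebuild always specifies DATA_COMPRESSION explicitly, so the index can never silently fall back to NONE, even when it was disabled.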
Thus, by replacing the SQL Task statements in all of the existing SSIS packages with a call to this Stored Procedure, we managed to keep all the (nonclustered) indexes compressed the right way.