21 February 2019

Loading different partitions in the same table simultaneously

When loading data into a table, and if you want to have several processes loading the table simultaneously, you can launch the same process with different input variables. For this, you can have two scenarios, partitioned and unpartitioned tables.

  • If the table you are loading is not partitioned you will not have any issues since you are loading / updating / deleting different records and even when a process retains a lock on the table, another process will wait and continue when the lock is released. Since there are not any schema changes, the process continues without fault.
  • If the table you are loading is partitioned and you are using partition switch to load it, the first process to start the switch will alter the table schema. Even when this first process ends and releases all locks on the table, if another process which was waiting for the lock release tries to start the partition switch the schema of the table was already altered and an error is returned due to the schema change.

In the case you are working with the last case explained above, you can implement it in different ways. You have a process that loads a partitioned table which receives as an input a variable that identifies different partitions.

  • One way to do this is to run the processes for each partition (or set of partitions) sequentially which might cause a huge delay in the ETL process (which most commonly runs daily overnight and needs to be ready and available on regular business hours).
  • Another way is to use system tables to detect if the table in question is locked and set up a waiting strategy. This way one process will only start switching partition when another switch ends.

Implementing the last option, this can be accomplished by adding the following script on the process where the partition switch occurs, before starting the switch:

On the switch script before starting the partition switch process, using a while loop looking at sys.dm_tran_locks we can check if the final table is locked, and counting the number of locks on the table, the switch only starts when the count is zero. While it does not, it waits for 1ms before trying again.

As an example, all needed objects for partitioning were created in a dummy database. Tables used for switching in and out are created at runtime based on the final table structure.

Using the logic above, a first process starts switching partitions on the table which causes a table lock. The other processes will only try to start the switch after the lock for the first process is released.

The main difference here is the timing. Without the forced wait, a second process will read the table schema before the first lock is released, causing the error. With the forced wait, the second process will only read the table schema after the first lock is released and can complete the switch successfully.

 

Scripts used to create the dummy database and all needed objects

BI4ALL_DW_CREATE_DATABASE

BI4ALL_DW_ADD_FILEGROUPS

BI4ALL_DW_CREATE_SCHEMAS

CREATE admin.PARTITION_MAPPING

BI4ALL_DW_CREATE_USP_CREATE_PARTITION_TABLES

BI4ALL_DW_CREATE_USP_SWITCH_PARTITION

BI4ALL_DW_CREATE_PF_FCT_PARTITIONED

BI4ALL_DW_CREATE_PS_FCT_PARTITIONED

BI4ALL_DW_CREATE_TABLE_FCT_PARTITIONED

INSERT DUMMY DATA P1

INSERT DUMMY DATA P2

INSERT DUMMY DATA P3

 

Ana Santana
Consultant
Blog