Recently we were given the task to optimize an ETL batch load in order to reduce its execution time and fulfil the ETL load window.
First of all, we analysed package execution logs in order to identify the ones with longer execution times and therefore, better candidates to be optimized.After choosing our target, a package that loads a fact table and that was taking more than expected considering the volume of loaded data, we started debugging to identify the main bottleneck - source, transformation or destination - and easily spotted that it was the source - it was taking 85% of total package execution time.We started analysing the query and, despite of its complexity with plenty of subqueries, joins, apply operators, it was easy to identify one possible reason for lack of performance - the query was using a scalar function.
Despite of enabling code encapsulation and easy reuse (common generic good practices in programming languages), scalar functions generally result in very poor performance and the main reasons are:
Weak plan optimization once that for Plan Optimizer, a scalar function is like a black box, so it can use some wrong assumptions (bad cost estimate, normally very low) which leads to bad execution plans;
Scalar functions are executed in a separate context, so the overhead of calling and execute the scalar, leveraged by the fact that this is done once per row, dramatically increases execution time;
Using scalar function disable any kind of parallelism for the entire execution plan.
In order to maintain the advantages of code encapsulation (reinforced by the fact that this scalar is used in other contexts), and increase the performance, we implemented a table valued function with the same logic/results as the original scalar function to substitute it. There are two type of table valued functions:
Multi-Statement (MTVF): first declare a table variable, then define a BEGIN/END block where the function logic populates the table variable and finally return that variable;
Inline (ITVF): returns the result of a SELECT statement, with no code blocks/variables.
Besides the syntax differences, these two types of table valued functions have meaningful performance differences, where IVTF have advantage - the main reasons are that MTVF is also a kind of black box to Plan Optimizer leading to bad plans, and also partially inhibit query parallelism.
Therefore, we chose to implement the ITVF in order to leverage query optimization. After implementing the ITVF, we started evaluating and comparing both queries in order to realize if we had any performance enhancement, and, first of all, we looked to query execution plans.
Analysing the query execution plans and the relative cost to the batch of each, the first insight we got out of it was that the ITVF would deteriorate the query more than the scalar function - 26% scalar, 74% ITVF.
But as referred before, the SQL Plan Optimizer can make wrong cost assumptions when dealing with scalar functions so we decided to further evaluate to confirm this insight and decided to analyse query execution time using time statistics (an additional note: due to the same reason of scalar being like a black box, IO statistics are not reliable, once that SQL doesn't know which tables the function accesses and can lead to misleading conclusions). Additionally, we also could check the parallelism operator appeared in the ITVF execution plan (query 2) - contrary to the scalar, ITVF allows the use of parallelism for the entire execution plan. In cases with a large set of rows, and where there is plenty of process capacity (CPU cores) available to be used, this can be a truly game changer.
Relatively to execution times, statistics time has two parts:
CPU time – time used by CPU resources to complete a task;
Elapsed time – total time took by a task from the start to its end, including costs like SSMS rendering, network overhead, I/O operations.
Once elapsed time includes costs not directly related to query performance, it needs to be used carefully in performance analyses in order to avoid misleading conclusions – it can be different for same query on same server during different execution times because it depends on other resources availability. To reduce rendering costs, you can use in SSMS, "Results to Text" and then choose "Discard results after query executes".
However, in order to enable comparing query performance when parallelism is used it is necessary to evaluate both statistics time metrics. If there is more than one processor executing the query in parallel, despite it can take more total CPU time, the elapsed time can be shorter.In this case, due to the scalar logic complexity, transform it in an ITVF was quite complicated, leading to a complex ITVF, and together with the parallelism management overhead, the CPU time increased by 4x.
However, due to parallelism the elapsed time had reduced 50% - with the replacement of the scalar function by an ITVF function, we reduced 50% of the query source which was responsible for 85% of package execution time, and therefore, we reduced nearly 40% of package execution time.
The main conclusion is, whenever performance is important and you are dealing with a scalar or a MTVF function, you shall consider to replace it for an ITVF function. Always remember, best practices are recommendations, you must always adapt them to what you observe and fit it to your own situation.