Nowadays, we only need to give computers guidelines and let them do the best job possible.
In the case of ETL processes within a data warehouse, each process has its own schedule and execution order, so we just have to set these parameters and let a more comprehensive process do the work.
The implementation we made for this new Orchestrator takes all of these premises into account. We simply configure what to run and how to run it, that is, the dependencies the processes have on each other, and the machine manages the whole execution. Using process precedence, it knows what can be executed at any moment, and by checking the latest execution times of the processes it adapts when they run, so you get maximum machine performance.
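To illustrate the precedence logic described above, here is a minimal sketch in Python. The actual implementation is a DTSx package, so the function name `run_chain` and the wave-by-wave execution below are simplifications of ours, not the real design:

```python
from concurrent.futures import ThreadPoolExecutor

def run_chain(processes, deps, run, max_workers=4):
    """Run every process as soon as all of its dependencies have finished.

    processes: list of process names
    deps: dict mapping a name to the set of names that must finish first
    run: callable that executes one process

    This sketch runs in "waves": each wave contains every process whose
    dependencies are already done, executed in parallel.
    """
    done = set()
    pending = set(processes)
    with ThreadPoolExecutor(max_workers=max_workers) as pool:
        while pending:
            # Ready = every dependency has already completed
            ready = [p for p in pending if deps.get(p, set()) <= done]
            if not ready:
                raise RuntimeError("Cyclic or unsatisfiable dependencies")
            futures = {pool.submit(run, p): p for p in ready}
            for future, p in futures.items():
                future.result()  # wait for this wave, propagate errors
                done.add(p)
            pending -= set(ready)
    return done
```

A real orchestrator would launch each process the instant its last dependency finishes rather than waiting for a whole wave, but the precedence rule is the same.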
We managed to do all of this with nothing more than a single process developed in DTSx.
Some challenges were overcome with the use of a few settings. Here are some examples:
- Passing variables to the process;
- Indicating what to do in case of error (we can set it to return success even if there is an error);
- Creating the concept of a parallelized sequence, which allows a series of processes to run in sequence, but in parallel with other processes.
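To make the parallelized-sequence idea concrete, here is a small Python sketch with hypothetical names (the real Orchestrator implements this inside the DTSx package). Each inner list is a sequence whose steps run in order, while the sequences themselves run in parallel; the `continue_on_error` flag mimics the "return success even on error" setting:

```python
from concurrent.futures import ThreadPoolExecutor

def run_parallelized_sequences(sequences, run, continue_on_error=False):
    """Run each inner list in order; the lists themselves run in parallel.

    sequences: list of lists of process names (hypothetical structure)
    run: callable that executes one process
    continue_on_error: if True, a failing step is treated as success and
    the sequence carries on, mirroring the error-handling setting above.
    """
    def run_sequence(seq):
        for process in seq:
            try:
                run(process)
            except Exception:
                if not continue_on_error:
                    raise  # abort this sequence; the others keep running

    with ThreadPoolExecutor(max_workers=len(sequences)) as pool:
        futures = [pool.submit(run_sequence, seq) for seq in sequences]
        for future in futures:
            future.result()  # propagate failures to the caller
```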
The process has an intelligence engine that optimizes the execution of the daily chains based on their execution history. It tries to maximize the parallelism of each run so that all processes finish in the shortest possible time.
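One way to use the execution history, sketched below in Python under our own assumptions (the original text does not describe the exact heuristic), is the longest-processing-time rule: among the processes that are ready to run, start the historically slowest first so they do not become a long tail at the end of the run:

```python
def prioritize_by_history(processes, history):
    """Sort runnable processes so the historically longest start first.

    processes: list of process names that are currently ready
    history: dict mapping a name to a list of past durations in seconds
    Processes with no recorded history get priority 0 and run last.
    This is the classic LPT (longest-processing-time) scheduling rule,
    an assumption on our part rather than the Orchestrator's exact logic.
    """
    def average_duration(p):
        runs = history.get(p, [])
        return sum(runs) / len(runs) if runs else 0.0

    return sorted(processes, key=average_duration, reverse=True)
```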
There is also a feature that adjusts the number of threads used for the parallel execution of processes. As server resources (memory, for example) run low, the Orchestrator limits the number of processes running in parallel so that the server is not overloaded.
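A minimal sketch of such a throttle is shown below; the thresholds are illustrative values we chose ourselves (`mb_per_process` is an assumption, not a measured figure from the Orchestrator):

```python
def allowed_parallelism(free_memory_mb, max_threads=8,
                        mb_per_process=512, floor=1):
    """Cap the number of parallel processes by the memory still available.

    free_memory_mb: memory currently free on the server
    max_threads: hard upper bound on parallel processes
    mb_per_process: assumed footprint of one ETL process (illustrative)
    floor: never go below this, so the chain always makes progress
    """
    by_memory = free_memory_mb // mb_per_process
    return max(floor, min(max_threads, by_memory))
```

A real orchestrator would re-evaluate this limit periodically and also factor in CPU load, not just memory.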
Below we can see the evolution of a chain's runtimes. The chain has more than 600 ETL processes, with extractions from several sources, transformations, and various types of information loads.
Note the clear decrease starting with the first execution under the Orchestrator. Once there is execution history, the process optimizes the execution of the chains and parallelizes the processes more efficiently, making the whole run faster.
We continue to improve the process to make it increasingly efficient. The next step will be to identify the longest paths so that they are the first to be executed.
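Finding those longest paths amounts to a standard critical-path computation over the dependency graph. The sketch below is our own illustration in Python, not the Orchestrator's code; scheduling the processes with the largest value first is the planned heuristic:

```python
from functools import lru_cache

def longest_path_lengths(deps, duration):
    """Length of the longest downstream chain starting at each process.

    deps: dict mapping a name to the set of its prerequisite names
    duration: dict mapping a name to its expected duration
    Returns a dict mapping each process to the total duration of the
    longest chain it begins (the critical-path heuristic).
    """
    # Build the forward graph: who depends on me?
    children = {}
    for p, prerequisites in deps.items():
        for r in prerequisites:
            children.setdefault(r, set()).add(p)

    @lru_cache(maxsize=None)
    def length(p):
        downstream = children.get(p, set())
        tail = max((length(c) for c in downstream), default=0)
        return duration[p] + tail

    return {p: length(p) for p in duration}
```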