Extracting data into a centralized repository should be performed with as little impact on the source systems as possible. In some cases, developers are given a time frame in which to extract all the data they need directly from the source systems. In other cases, a dedicated team is responsible for developing these extraction processes. The output is normally delivered as comma-separated value (CSV) files, which are later consumed by other processes.
In a recent project I worked on, apart from developing the extraction processes from the source system, the responsible team also built an Intermediate Database (IDB) into which the output of the extraction was loaded. While it is common practice to use an Event-Wait task, which waits for a file and kick-starts a worklet/session when the file arrives, in the scenario presented here the session is triggered by a change in the IDB instead. The example below shows the approach taken to solve this problem.
Mapping design phase
Before going further into the workflow, we need to design a mapping that queries the IDB for changes. The mapping consists of a Source (in this case, a table accessed through a SQL Server database connection), a Target (a flat file), and a Filter transformation.
The IDB has a Process table, which will be our source. The table has several columns, but we are only interested in RUN and CHANGE_DATE, so the query reads just those two.
The target file is only populated from the output of this query if the CHANGE_DATE column (aliased Col2) holds the same date as the date of the extraction and RUN (aliased Col1) equals 0, meaning that the process has finished and the data is ready to be extracted.
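Taken together, the source query and the two conditions above are roughly equivalent to the following SQL, shown here in a shell variable for reference. This is a sketch: the Process table and the RUN and CHANGE_DATE columns come from the description above, while the day-level comparison using CAST and GETDATE() is an assumed SQL Server formulation, not the actual mapping logic.

```shell
# Rough SQL equivalent of the monitoring mapping (sketch, not the actual
# source qualifier override). CAST(... AS DATE) truncates the timestamp so
# the comparison happens at day granularity; this syntax is an assumption.
MONITOR_QUERY="
SELECT RUN AS Col1, CHANGE_DATE AS Col2
FROM Process
WHERE RUN = 0
  AND CAST(CHANGE_DATE AS DATE) = CAST(GETDATE() AS DATE)
"
printf '%s\n' "$MONITOR_QUERY"
```

In the actual mapping, the query only reads the two columns and the conditions live in the Filter transformation; the SQL above just collapses both steps into one statement for readability.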
For that, we add a Filter transformation to the mapping, as shown in the picture below.
The purpose of this transformation is to write rows to the output file only when both conditions on Col1 and Col2 evaluate to true; otherwise, the file stays empty.
Workflow design phase
In Workflow Manager, besides all the workflows previously designed to extract the data from the IDB, two more workflows need to be developed: the Monitoring workflow and the Trigger workflow. The picture below shows the Monitoring workflow, which is scheduled to run every minute.
The first session in this workflow runs the mapping designed earlier. On the link that follows it, an expression verifies whether any rows were loaded into the target file. The next tasks only execute if this expression evaluates to true; otherwise, the workflow keeps running on its schedule until a change is made in the IDB.
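The "any rows loaded" check can also be reasoned about from the file side: the target file is non-empty exactly when the Filter transformation let at least one row through. A minimal shell sketch of that check (the function name is ours, for illustration only):

```shell
# Sketch: returns success (exit status 0) when the given target file exists
# and is non-empty, i.e. when at least one row passed the Filter.
file_has_rows() {
  # the -s test is true for an existing file with size greater than zero
  [ -s "$1" ]
}
```

Inside Workflow Manager itself, the equivalent check is an expression on the link, typically comparing the session's target success-rows statistic against zero rather than inspecting the file from a shell.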
The extraction itself is divided into multiple workflows. To execute them, we used Command tasks that run command lines to start the other workflows.
The command used is:

pmcmd scheduleworkflow -service informatica-integration-Service -d domain-name -u user-name -p password -f folder-name -w workflow-name
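A minimal sketch of how such a Command task line could be assembled in a small shell helper. Everything here uses the placeholder values from the command above (service, domain, credentials, folder), and build_schedule_cmd is our own hypothetical name, not part of pmcmd.

```shell
# Builds the pmcmd command line used by the Command task (sketch only).
# All flag values are the placeholders quoted in the text; substitute the
# real service, domain, credentials, and folder for your environment.
build_schedule_cmd() {
  # $1 = name of the workflow to schedule
  printf '%s' "pmcmd scheduleworkflow -service informatica-integration-Service -d domain-name -u user-name -p password -f folder-name -w $1"
}
```

Note that hard-coding a password in a script or Command task is best avoided; pmcmd can read the password from an environment variable instead.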
The last task in this workflow is a Control task, which is used to stop the workflow; otherwise, after this execution, it would continue to run every minute.
Since the Monitoring workflow was stopped and is now left in an unscheduled state, it needs to be re-scheduled to run the next morning. For that, we designed the Trigger workflow, whose purpose is to re-schedule the Monitoring workflow.