1 September 2017

Using Powercenter to load data from files using a control table

Let us imagine that a business requirement asks that data from a certain number of files be loaded at specific dates defined by a business calendar. Although loading the data at specific dates can be done manually, this can be time consuming, especially if other conditions must be met for the data to be loaded, like a certain number of files must be present, certain tasks must be repeated more than once or an email must be sent if a file is missing and/or incorrect.

The purpose of this tutorial is to provide a possible solution to automate the procedure as much as possible using Powercenter and Oracle/PLSQL, considering of course, that there are many of ways of achieving a similar solution using different methods and technologies. The tutorial also provides a simple explanation for each Powercenter task that is used in the process, therefore, giving those with small knowledge of Powercenter hands-on examples.

In this example, the files are placed manually in a directory by cycles and each cycle has a specific date in a total of 13 cycles for a business year. In order to have an overview and control of these cycles and their respective dates, the following table was created and is loaded each business year manually:

 

The table provides information about the cycles but more specifically, it uses a flag called CURRENT_CYCLE_FLG that is used by Powercenter to determine the current Cycle and its respective date. The flag is updated by Powercenter every time the cycle is completed successfully.

Using Powercenter, we created a mapping that is going to load the data from the files to the DWH using the table described above to “map” the current cycle.

The first step of the mapping uses a session to make a date validation:

Sessions are the “extension” of the mappings developed in the Designer. Therefore, it shows all the options available for the source and destination of our data. Inside the session, we used the following query to determine if the current date matches a date interval (as per business requirement) in our date table:

If the query returns results than Powercenter continues, if not than the procedure stops:

It is possible to define expressions in the flow between tasks. These expressions can determine the path of the flow between tasks in the mapping. In this case, the path of the flow is determined by the following expression:

$s_CHECK_CYCLE_DATE.TgtSuccessRows = 1

Next we used Powercenter’s Worklet to do variety of actions regarding each individual file (as per business requirement).

The Worklet task is basically a mapping inside another mapping and it is quite useful for the intended purpose because it allows the storage of a variety of actions, each regarding a file in this case, that transforms each of those actions into one single action. If we were to place everything into one mapping, it would be extremely confusing and hard to maintain in the long run.

 

 

Each individual Worklet does the following actions:

It starts by checking if the file exits in a specified directory trough a command task:

The command task is a very useful tool if one intends to use Shell Scripting to solve business problems.

In this case we created a shell script, that checks if a file exists in a certain directory with a certain name and if it does, the filename is stored in a text file.

filename=”`basename /apps/dw/bin/PATH/INPUT/FILENAME_*.txt`”

#filename=”${path##*/}”

if [ -s “/apps/dw/bin/PATH/INPUT/$filename” ]; then

 echo $filename > /apps/dw/bin/INPUT/FILENAME.txt

fi

Powercenter then advances if the command task succeeds trough the following expression:

$CHECK_FILE.Status=succeeded

For logging purposes of each individual file, a table was created to store rows of files that were found in the directory.

In order to load this table with the filename the following mapping was created:

 

The mapping takes the value from the text file created from the Shell Script as a source and places the filename in an Oracle table along with an entry date for that file.

Please note that this could have also been achieved by using the properties of the Source and choosing to Add Currently Processed Flat File Name Port. In turn, an expression could be used in an Expression Transformation task to change the name of the file as pleased and send it to a table.

Next we used another session that uses an indirect Source filetype and uses the text file previously created.

 

This way, it allows for Powercenter to look in the indicated Source directory for a file with the name indicated in the text file, instead of manually inserting the path and name of the file to process.

In a broad sense, Powercenter does the following to process the files:

  1. Checks if the file exits
  2. Sends the name of the file that it found to a text file
  3. Uses the value inside the text file to define the source and load the data from that same file

If the file or the process have any issues, the procedure stops and sends an email to inform of the current problem. If the file and process run without any issues, the data is loaded to the tables and the text file is deleted. In order to achieve these actions, the expressions used, were the following:

 

If all goes well, Powercenter finishes by updating our dates table through a Session.

This session contains a “dummy” mapping so it can execute a Stored Procedure task

This stored procedure starts by updating the new cycle first, by using a row number to find the next cycle through the old cycle:

 

And finally updates the old cycle by using the max of the row number and subtracting one value in the current cycle.

 

Conclusion

Powercenter can do a variety of tasks that can solve day to day business problems. The given example merely does an introduction to a wide variety of possible solutions, but it does display the advantage of using a “trigger” mechanism, so that a similar process might be developed to adjust to other realities and other business needs but always allowing for an automated process, freeing up the burden of having to do these activities manually.

 

 

 

 

       João Farinha
         Consultant
Blog