13 March 2018

How to build a light Report Subscription Platform using SSIS? – Business Intelligence

SSIS tool… SQL Server Integration Services… is an excellent tool to (1) extract data from disparate sources client, (2) transform the data into the business standards business team wants and then (3) load that same data into production area so that the business users can access it from several ways like analytical reports, dashboards, etc.

However, does the SSIS work end here?

 

No, it does not.

The data, now, is available for reporting consume and we have several users that want to execute those same reports or dashboards. But if we have all users executing at the same time those same reports this can lead to performance problems due to this same heavy concurrency. And more: we want to record somehow the request they made, with the input data they define for that same execution and have also some kind of control here, when executing these same requests.

So how can this be done with SSIS tool?

“Easy”: With the help of SSIS, we can build a RSP “Report Subscription Platform”!

 

 

However, we can be more ambitious…

The primary objective is to build this RSP Platform, yes, but we can extend further its functionality and allow this platform built into SSIS to execute more actions than simple executing reports.

Why not?

Let us build a platform where requests are being executed asynchronously!

 

So, this platform involves executing requests – created by users (for example: “I want to generate a Sales Report, so I create a Request, fill it with useful Input Data and send it to a specialized Queue to be executed by a specialized Service”) in a asynchronously way: the client doesn’t have to wait for its execution, he can look at the current state for its request and he can still execute other tasks besides this one.

Therefore, in short, it is a System/Platform where:

1-requests are created by the users (for example, the generation of a sales report)
2-the requests are joined together in a Queue (FIFO with priorities assigned)
3-the requests are picked up and executed by the service in an asynchronously way
4-the user knows the request state in all execution phases and accesses the output generated in the end somehow

 

The Request System – Let us use the Microsoft BI Stack where SSIS is included

 

Now, and first, let us define some concepts for this platform:

Request – a request is created by the users and it is inserted in a Queue (see definition below). That same request must be executed by the service in an asynchronously way.

Queue – container for the user requests.

Scheduled Request – a request is executed by the system following a schedule defined by the user, when he created the request. This schedule is always created by Request.

Listener – the system component that listens for new requests in the Queue: when there is a new request to be executed, it retrieves it from the Queue and passes it to the Dispatcher component (see below).

Dispatcher – the system component that executes the requests passed by the Listener component.

Schedule Dispatcher – the system component that executes the scheduled requests.

Plugin/Provider – a piece of software – SSIS package – that is customized for executing a type of request and is configured in this same platform. This means that this platform is extensible.

 

Ok, now, to organize our architecture and implementation let us define the Functional and Non-Functional Requisites:

From the Client side:

FR01 – the system must implement a web service for the client applications that permits:

i) The viewing of all the requests or filtered by minimum date, created by a user. The information to return must involve the main aspects of a request like the current state, creation date, the status flag execution, etc.
ii) The creation of requests in the system to be executed asynchronously.
iii) The creation of scheduled requests.
iv) The cancelling, when possible, of requests inserted in the system.

Note: the system must attend only the information associated to the calling user: this means that user A cannot see information associated to user B.

From the Server side:

FR02 – the requests must have as main information associated:

- Priority (the requests with higher priority are attended first)
- Creation Date (in the queue)
- State (it is important so it can be possible to know in any instant what is going on with the request)
- Execution flag (success or not)
- Input data for the operation to be executed
- Who created (the user)

FR03 – the system must permit the execution of any kind of a request, it should be built in a generic way (extensibility). For example, it can process requests for executing data reports and it can process requests for processing OLAP cubes. It should work with a “plug in methodology”: if there is a new type of request, so it is necessary to build a plug-in (or Provider, the meaning is the same) for it and configure it in the system, with minimal actions required.

FR04 – in the context of functional requisite FR03, the system must have a plug in for reporting execution with printing: it should be used the SQL Reporting infrastructure to execute and render those same data reports. This plug in must interact with that same reporting infrastructure.
FR05 – the system must permit the scheduling of operations. However, it must start always with a request for scheduling. The scheduling can be for one time only, daily, weekly or monthly, once time or from x to x minutes.
FR06 – the system must permit retries after an error. These values must be configured.
FR07 – the system must control the number of request executions and it should be configured.

In addition, as Nonfunctional Requisites, we can define:

NFR01 – the system must permit load balancing: two or more services to process the requests in the shared queue.
NFR02 – the system must be robust: if one server node fails, the other can attend the execution.
NFR03 – the system must have a pressure control system: if there is too much pressure (cpu execution) in one node, it should control it reducing, for example, the number of requests to be executed.
NFR04 – the system must execute the requests with minimum delay associated, so that the performance can be accepted (depending on the current pressure conditions and available dispatchers).

Now, let us represent and define a possible architecture.

 

How to design/prepare the architecture?

The five components, represented from 1 to 5, in the Architecture image, are:

1 – Request Manager System Databases
2 – Microsoft SQL Server Integration Services
3 – Microsoft SQL Server Reporting Services
4 – File System (local or shared like a file server)
5 – Printing System

The first two points are the most important:

1 – Request Manager System Databases
2 – Microsoft SQL Server Integration Services

Together, they represent the Request Manager System Core: the heart of the system!

The others (from three to five) represent a practical example of using this system.

1 - Request Manager System Databases

 

 

We can model the Request object in this way (data table Queue - RequestManagerQueue):

 

(From top to bottom)

- Key (identifies the request in the queue; each user creates a request and it is returned to him its Key; this key is used to read the current state)
- CreateDate (request was created when?)
- ModifiedDate (request was modified when?)
- CreatedBy (who created the request?)
- RequestManagerStateID (request current state: it is very important to know in which state the execution is)
- StateObs (observations related to the current state)
- SucessFlag (the request was executed with success or error?)
- RetriesCount (in case of an error, a request can still be executed until it reaches success or not)
- Priority (the priority of the request in the queue – the requests with higher priority are executed first!)
- Retriable (the request, in case of error, can be retried? By retrying, we mean resetting its state to the first initial one…)
- OutputChannelType (identifies which Provider is going to execute this type of request; represents the Plugin that is responsible to execute this type of request; we can have several (custom) Plugins configured and installed)
- OutputChannelDetail (the input data to pass to the Provider; there is no standard format to structure this data, it can be represented as we want, free form text: only the Provider associated to it can interpret and decode this input data)
- ServiceExecutionID, ServiceInfo (Reserved – identifies the service)
- AppExecutionContext (it is like a cache for the request, to save state, during its execution, very useful for the system Plugins to save control state during the execution of the request)
- ControlHeartBeat (Reserved – indicates if the execution of this request is still going on or not; useful for unexpected crashes – Queue Management)
- ForScheduling (indicates if the request is scheduled or not, flag)
- ForSchedulingCommand (the scheduling information)
- CancelFlag (indicates if the request must be cancelled, flag)

 

The request priorities are:

Table – RequestManagerPriority

RequestManagerPriorityID – key

Description – the description related

Values:

1-Low
2-Normal
3-Hight
4-VIP

The possible states a request can have:

Table – RequestManagerState

 

RequestManagerStateID – state key
NameState – default representation of name for the state
Description – default description for the state
Flag_Listener_Can_Catch – identifies if this state can be pooled from the queue (as initial states…)
Flag_Dispatcher_Is_Busy – identifies if this state represents a “running state”
Flag_Scheduler_Dispatcher_Is_Busy – identifies if this state represents a “running state” but for scheduling requests only
NameStateForClient – user-friendly name (for final users)
DescriptionForClient – user-friendly description (for final users)

The default states, reserved ones, always present, are:

1 – In the Queue
2 – Ready for Dispatching
3 – Dispatched: Initiating workflow WK
99 – Executed
100 - Scheduling Request
101 - Scheduled. Waiting SQL Agent call
102 - Schedule waked and is running
999 - Request was cancelled.

 

We want to make this system generic so that any kind of request can be executed and not only requests for executing reports or dashboards.

This platform can execute any type of request, assuming that the right Provider is configured and installed to be used by this system.

So, to configure these Plugins:

Table – RequestManagerOutputChannelType

RequestManagerOutputChannelTypeID – key

SSIS_PackageProvider – Plugin name configured and installed in the system (must be the exact name of the SQL Server Integration Package SSIS that handles this type of request)

Description – description related

The default one (reserved) is 9999, which is an internal “ping alive”.

In this example, the Provider with key set to 1, is associated to the Component with name “RequestManager_PlugInProvider_SSRS”. It is responsible to generate reports, in some rendering mode (excel, pdf, etc.).

This name must be correct so the request can be passed to it. If it does not exist, it will throw an error!

Other important objects as well:

Request Profiles

Each request is associated to a Profile.

Table – RequestManagerProfileName

RequestManagerProfileNameID – key
RequestManagerProfileName – the profile name
SetPriorityTo – default value for setting the request priority
SetRetriableTo – default value for setting the request retriable flag
SetOutputChannelTypeTo - default value for setting the request Provider (see the section Plugins, above)

In this example, we have a Profile with name “Reporting Operational” related to the execution of data reports, associated to the PlugIn/Provider “RequestManager_PlugInProvider_SSRS” (key 1)

This means that when a request with this same profile is created, the following is done:

- The request priority is set to 1
- The request retrial flag is set to 1
- The “Provider Name” is set to “RequestManager_PlugInProvider_SSRS”

A request must always have a Profile associated!

 

We can set also the Security for the Profile:

Table – RequestManagerProfileNamePermissions

RequestManagerProfileNameID – key
User – user that can create requests in the system
Description – description

In this example, the only users that can create requests for Profile “RequestManager_PlugInProvider_SSRS” are “bi4all\pmesquita” and “tapnet\90012077”.

And what about logging?

 

Table – RequestManagerLog

The log is inserted in this table.

Configuration Values

Table – SSIS Configurations

 

All the configuration values are registered in this table, the system and Plugins values.

Others must be registered in the future right here.

 

2 – Microsoft SQL Server Integration Services

 

The following system components

a) Listener
b) Dispatcher
c) Dispatcher Plugin/Provider
d) Schedule Dispatcher

 

All the components are implemented via technology SQL Server Integration Services.

a) Listener (RequestManager_Listener.dtsx)

This component, the main action that it does, is to pool the Queue, where the requests are being recorded, read them in a special order and pass them to the Dispatcher component so that they can be executed.

The other action that it does – in parallel with pooling the queue for requests – is to do some management tasks like:

- Escalating the priority for requests that are in the queue for a long time
- Recovering to final state “crashed requests” (with unexpected errors)
- Archiving requests in a historic table
- Controlling the pressure in the system (for example, as an immediate reaction, he can reduce the available number of dispatchers so the pressure can be released; in an implementation phase)
- etc.

However, how are they implemented in SSIS?

Figure – The Listener Component “RequestManager_Listener.dtsx”

Subcomponent marked 1

This is the real Listener: it listens, it pools the Queue for requests that are ready for execution.

Workflow:

1-reads the Requests using the Stored Procedure named “RequestManager_GetRequestsFromQueueForDispatch” (this important SP has all the logic needed to read the requests by a special order)
2-launches the Dispatcher component asynchronously for each request
3-sleeps n seconds before next pool cycle (to avoid CPU pressure)

Subcomponent marked 2 (Queue Manager)

This is the subcomponent that manages the Queue.

Workflow:

1-Executes the Stored Procedure named “RequestManager_ManageQueue”
2-(still in implementation phase) Calculates the current pressure in the system
3-sleeps n seconds before the next cycle (to avoid CPU pressure)

Important note: The component is prepared for unexpected errors: if one occurs, then, the Listener enters in a “retrying from error” state until it can again execute its normal actions. This is implemented in this same SSIS package.

 

b) Dispatcher

This component receives, from the Listener component, a request to execute.

Figure – The Dispatcher component “RequestManager_Dispatcher.dtsx”

Subcomponent marked 1

For the current request to execute, it gets the Provider/Plugin name (SSIS package name) using the field “Channel Output Type” and calls that component passing the ID for the request.

This call is synchronous: when the control is returned it means that the request was executed by the Provider.

Important note: it is prepared for unexpected errors: if there is one, the state is marked to error and finishes execution.

 

 

Subcomponent marked 2

Runs in parallel with the subcomponent marked 1.

It is only responsible for stating that this Dispatcher is still alive and is executing the request.

Algorithm:

It sets, in a loop, the current date system. If there is a crash or for some reason the Dispatcher that is executing this request goes down, this “ping alive” stops. Then, the Queue Manager, in the Listener component, after a while, catches this “crashed request” because there is not any recent “ping” from it and marks it as an error, final state. This is very important so that all Dispatchers can be available for future requests, fulfilling the responsiveness of the system.

 

c) Dispatcher Plugin/Provider

This is the “pluggable” Provider to install and configure in the system.

It is called from here:

 

The SIS package representing it is saved in a special SSIS catalog folder, separated from the Listener and Dispatcher SSIS components.

Important notes:

- We must be sure that the Provider is “bug free” so that we can assure quality in the system

The Request must be executed ASAP so that the Dispatcher can be free for all requests; if there is a delayed action to execute, maybe it is more intelligent (suggestion!) to create a “state machine”, playing with the Queue. In this way, the other requests, which are waiting in the queue, can be executed.

 

Ok, the server side is defined.

And what about the client side?

How to consume this service?

We must define a Web Service, with the interface below, for example:

Figure – Web Service

This represents the client channel for interacting with this service. It is a WebService (SOAP).

As we can see above, there are four webmethods defined in this webservice:

- CancelRequest (used for cancelling requests created before)
- GetRequestsByUser (used for listing all the requests created by a user)
- InsertRequest (used to insert/create requests in the queue)
- InsertScheduledRequest (used to insert/create scheduled requests in the queue)

This Web Service must be secure.

Authentication:

The web service can be configured (web.config, the web service configuration file) in one of these three security modes:

 

From top:

- Gets the current (Windows) calling username
- The user is passed directly as a parameter
- The user is obtained via a security token; this security token must be associated to a Single Sign On method (to implement).

Authorization:

As we know already, there is always a profile for the current user. That profile defines the requests that he can create.

Integrity:

SSL for web service protection.

That is it!

Blog