24 April 2018

SSIS Excel Parser – Business Intelligence

What is SSIS Excel Parser and what is its use in Business Intelligence?

SSIS Excel Parser Business Intelligence

 

This is a component implemented into SSIS (SQL Server Integration Services) tool and its main objective is extracting data from excel files, from any cell, into a database, in a more normalized way, which later can be extracted using Views.

The idea here is to implement a “Reading Template” (Model) where we inform the component, which excel sheets to handle, and inside each one, which cells should be handled (given by an absolute position: line and column number).

Then, for each of these Template/Models, we can generate Instances of those same Excel Files, with the data extracted into normalized tables.

SSIS Excel Parser Business Intelligence

 

For example, a template file can be modelled like this:

SSIS Excel Parser Business Intelligence

 

By order:

Template – Header

SSIS Excel Parser Business Intelligence

 

TemplateId – primary key for the excel template
Category – category (grouping)
Name – title
ExcelFormatFile – wild-card file format which identifies the excel file to model
Description – description

Sheets4Template – Detail about the sheets to model

SSIS Excel Parser Business Intelligence

 

TemplateSheetId – primary key for the sheet file
FK_Template – foreign key to the excel template
ExcelSheetName – sheet name (as it is written in the source excel file to parse)
SetOrder – excel sheet’s order
Description – description

Cont4Sheets – Detail about the content, the cells to be read by the component; each cell is formed by a line and a column number

SSIS Excel Parser Business Intelligence

 

FK_TemplateSheet – foreign key to excel sheet
DataRegionName – permits grouping of cells to define a data region; it‘s optional
FieldName – field name
Location_RowNr – cell line number
Location_ColumnNr – cell column number
ExpectedType – expected type (helps in error parsing for field values types)
Description – Description

Template Instances

Let us model the SQL tables:

SSIS Excel Parser Business Intelligence

 

By order:

- Instance key
- Foreign key to Template
- Instance name (for example, it would be the excel file name; we must remember, by now, that several instances share the same template model)
- ReferenceDate (data reference date)
- ExcelFileName (excel file name)
- InsertionDate (data insertion date)
- Is_LastEntry (permits historic extractions; when TRUE, it marks the last record extracted)

SSIS Excel Parser Business Intelligence

 

By order:

- Cell primary key instance content
- Foreign key to Instance
- Template field
- Value (string format)

To model the template:

SSIS Excel Parser Business Intelligence

 

The following View was created to help the Reading of a Model/Template.

SSIS Excel Parser Business Intelligence

 

Then, the instances are saved into these tables:

SSIS Excel Parser Business Intelligence

 

The following View was created to help the Reading of Data Instances:

SSIS Excel Parser Business Intelligence

 

Ok, let us try it with an example!

SSIS Excel Parser Business Intelligence

 

 

First, the Model

Let us use the excel data file below:

Note: the file name has the year (data reference date) and the instance name (key); in this case, 2014 will be the year and 018 represents the Instance name.

SSIS Excel Parser Business Intelligence

 

To help integrate the data into the configuration database, we use an excel file as the one below:

This auxiliary file generates TSQL commands for inserting data into the tables defined above.

SSIS Excel Parser Business Intelligence

 

Then we must create the SSIS package (SQL Integration Services)

We must implement by each Model/Template a SSIS Package (this can be optimized in the future).

In our example, we could build the following package:

SSIS Excel Parser Business Intelligence

 

SSIS Excel Parser Business Intelligence

 

We use a script task written in C# to read the excel data file into the database.

The package has an internal variable, which defines the Model/Template to handle (key):

SSIS Excel Parser Business Intelligence

 

And what if a new file must be modelled? What should we do?

It is easy:

1- Identify in that new excel file:

a. The sheets to model
b. And inside each sheet, which cells to import

2- Use the auxiliary excel file to define this new model and import into the database using the TSQL insert commands

3- Create the new package, which will handle this new Model/Template (copy paste the old one and change the internal variable. Remember: this can be optimized in the future in a single package, if needed)

That’s it.

SSIS Excel Parser Business Intelligence

 

Blog