What is SSIS Excel Parser and what is its use in Business Intelligence?
This is a component implemented into SSIS (SQL Server Integration Services) tool and its main objective is extracting data from excel files, from any cell, into a database, in a more normalized way, which later can be extracted using Views.
The idea here is to implement a “Reading Template” (Model) where we inform the component, which excel sheets to handle, and inside each one, which cells should be handled (given by an absolute position: line and column number).
Then, for each of these Template/Models, we can generate Instances of those same Excel Files, with the data extracted into normalized tables.
For example, a template file can be modelled like this:
Template – Header
TemplateId – primary key for the excel template
Category – category (grouping)
Name – title
ExcelFormatFile – wild-card file format which identifies the excel file to model
Description – description
Sheets4Template – Detail about the sheets to model
TemplateSheetId – primary key for the sheet file
FK_Template – foreign key to the excel template
ExcelSheetName – sheet name (as it is written in the source excel file to parse)
SetOrder – excel sheet’s order
Description – description
Cont4Sheets – Detail about the content, the cells to be read by the component; each cell is formed by a line and a column number
FK_TemplateSheet – foreign key to excel sheet
DataRegionName – permits grouping of cells to define a data region; it‘s optional
FieldName – field name
Location_RowNr – cell line number
Location_ColumnNr – cell column number
ExpectedType – expected type (helps in error parsing for field values types)
Description – Description
Let us model the SQL tables:
- Instance key
- Foreign key to Template
- Instance name (for example, it would be the excel file name; we must remember, by now, that several instances share the same template model)
- ReferenceDate (data reference date)
- ExcelFileName (excel file name)
- InsertionDate (data insertion date)
- Is_LastEntry (permits historic extractions; when TRUE, it marks the last record extracted)
- Cell primary key instance content
- Foreign key to Instance
- Template field
- Value (string format)
To model the template:
The following View was created to help the Reading of a Model/Template.
Then, the instances are saved into these tables:
The following View was created to help the Reading of Data Instances:
Ok, let us try it with an example!
First, the Model
Let us use the excel data file below:
Note: the file name has the year (data reference date) and the instance name (key); in this case, 2014 will be the year and 018 represents the Instance name.
To help integrate the data into the configuration database, we use an excel file as the one below:
This auxiliary file generates TSQL commands for inserting data into the tables defined above.
Then we must create the SSIS package (SQL Integration Services)
We must implement by each Model/Template a SSIS Package (this can be optimized in the future).
In our example, we could build the following package:
We use a script task written in C# to read the excel data file into the database.
The package has an internal variable, which defines the Model/Template to handle (key):
And what if a new file must be modelled? What should we do?
It is easy:
1- Identify in that new excel file:
a. The sheets to model
b. And inside each sheet, which cells to import
2- Use the auxiliary excel file to define this new model and import into the database using the TSQL insert commands
3- Create the new package, which will handle this new Model/Template (copy paste the old one and change the internal variable. Remember: this can be optimized in the future in a single package, if needed)