14 May 2018

Talend Open Studio for MDM – Installation and Use Case

1.1. INTRODUCTION

Most companies that work with (Big) Data face a real problem when it comes to the different sources and types of information they must ingest into their systems. One of the main concerns is unstructured data, normally provided in worksheets by Business Users or by relational source systems without proper data management rules, among others. Managing this master data in Excel is almost impracticable. Although this issue also arises in a traditional BI model, this article looks at it from a Big Data point of view, using Talend Open Studio for MDM. The greatest change that MDM brings to the Big Data ecosystem is the possibility of integrating the outputs of MDM tables into HDFS, so that rows can be deleted and updated easily and cleanly. The other big opportunity is that it becomes possible to keep track of the changes users make to the MDM tables.

Much of the data mentioned above is classified as master data. It can refer to multiple core business entities such as Customers, Suppliers, Employees, Products and Assets.

Master Data Management systems were created to help companies manage and consolidate the type of information described above. In general, they should meet some key requirements such as:

• Definition and maintenance of metadata for master data entities in a repository
• Acquisition, cleaning, de-duplication and integration of master data into a central master data store
• Provision of a common set of shared master data services that applications, processes and portals can invoke to access and maintain master data entities, i.e. system of entry (SOE) MDM services
• Management of master data hierarchies, including a history of hierarchy changes and hierarchy versions
• Management of the synchronization of master data changes to all operational and analytical systems that use complete sets or subsets of this data
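
As a minimal illustration of the second requirement (cleaning and removing duplicates before loading a central store), the Python sketch below normalises a few fields and keeps the first record per key. The supplier rows, the field names and the matching rule are assumptions made for this example only; real MDM tools apply far richer matching.

```python
from typing import Dict, Iterable, List


def normalise(value: str) -> str:
    """Trim, lower-case and collapse inner whitespace so trivial variants compare equal."""
    return " ".join(value.split()).lower()


def deduplicate(records: Iterable[Dict[str, str]], key_fields: List[str]) -> List[Dict[str, str]]:
    """Keep the first record seen for each normalised key; later duplicates are dropped."""
    seen = set()
    unique = []
    for record in records:
        key = tuple(normalise(record.get(field, "")) for field in key_fields)
        if key not in seen:
            seen.add(key)
            unique.append(record)
    return unique


# Hypothetical supplier rows as they might arrive from two source systems.
suppliers = [
    {"name": "Acme Ltd", "country": "PT"},
    {"name": "  acme   ltd ", "country": "pt"},
    {"name": "Globex", "country": "ES"},
]
master = deduplicate(suppliers, ["name", "country"])
print(len(master))
```

The first two rows differ only in case and spacing, so they collapse into a single master record.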

The MDM systems described in this article are being embraced by more and more Organizations to control their master data and improve business performance. These companies realize that, without an MDM solution, their master information is more likely to be duplicated and fragmented across multiple operational systems. This makes it difficult to understand which data is the source of truth and whether, and how, the data is synchronized across systems.

Important vendors such as DataFlux, IBM, Talend, Informatica and Sypherlink are betting on this type of tool. Some of the tools currently available on the market are:

• Hyperion MDM
• IBM WebSphere Product Center and Customer Center
• Kalido 8M
• Oracle Customer and PIM data hubs and Sunopsis AIP
• SAP NetWeaver MDM
• Talend Open Studio for MDM / Talend MDM Platform

In the rest of this article, we focus on the installation and present a real Use Case for master data management using Talend Open Studio for MDM and the Talend MDM Web User Interface.

1.2. TALEND OPEN STUDIO FOR MDM (INSTALLATION)

Installation information for these tools is currently scattered across the internet rather than gathered in one place, which is what makes this walkthrough useful.

Talend has two different MDM tools available:

1) Talend Open Studio for MDM – free and open source tool developed by Talend with a lot of interesting features such as:

• Design and productivity tools: Eclipse-based developer tooling and job designer, export and execute standalone jobs in runtime environments, embedded data validations, and business rules, automatic data integration with MDM models;
• MDM Web Application: Master data repository, fully functional MDM environment, complete Web UI for master data management, model-driven user interface;
• Connectors: Cloud – Amazon Web Services (AWS), Microsoft Azure, Google Cloud Platform; RDBMS – Oracle, Teradata, Microsoft SQL Server; SaaS – Marketo, Salesforce, NetSuite; Packaged Apps – SAP, Microsoft Dynamics, Sugar CRM; Technologies – Dropbox, Box, SMTP, FTP/SFTP, LDAP; Web services: SOAP, REST/HTTP;
• Components: Control and orchestrate data flows and data integration with master jobs; basic matching and grouping of entity; map, aggregate, sort, enrich and merge data;

2) Talend Master Data Management Platform – available under a subscription licence. It has all the tools available in Talend Open Studio for MDM plus a few more:

• Data quality and Governance: data profiling and analytics with graphical charts and drill-down data, automate data quality error resolution and enforce rules, data masking;
• Data preparation and Stewardship: import, export and combine data from Excel or CSV file, export to Tableau, self-service on-demand access to sanctioned datasets;
• Master data management: visual modelling and import/export of data models; integrated workflows for data stewardship and governance; MDM query language to consume REST data access; master data full-text search and ad-hoc queries, impact analysis, audit trail and dependency enforcement; MDM activity monitoring dashboard; management of multiple and recursive hierarchies; role-based security and Active Directory integration;
• Advanced Data Profiling: fraud pattern detection using Benford's Law, column set analysis, advanced matching analysis, time column correlation analysis.

In this article, we explain the installation of Talend Open Studio for MDM and the Talend MDM Server.

Both are available on the Talend website:

https://www.talend.com/products/mdm/mdm-open-studio/

1.2.1. STEPS TO INSTALL TALEND OPEN STUDIO FOR MDM

The download has two files. You will need to unpack the ZIP file to a specific location on your PC or server:

The downloaded version is TOS_MDM_Studio 6.4.1, which we unzipped into our main C: drive.

1.2.2. STEPS TO INSTALL TALEND MDM SERVER

1) When you run the exe file, a Java Platform warning is displayed. Allow it (Talend is Java-based, and Java is needed to run the Tomcat application server).

2) Click OK to select the installation language.

3) Click Next to start the Talend MDM Server 6.4.1 installation.

4) Click Next to accept the terms of the license agreement.

5) Click Next after you have read the information regarding the Java and MIT licences.

6) Select the packs to install.

This step is important because, here, you decide if you want to install both Talend MDM application and Apache Tomcat Server. If you already have Apache Tomcat installed on your server/machine, you will not need to install it again. In this case, since we installed Talend MDM on our local machine, we installed the Apache Tomcat for MDM Server as well.

7) Select the installation path for the MDM Server.

8) Define the port for the MDM Server service.

9) Select the database type (H2 Embedded is the only option available).

10) Define the username and password to access the database.

Example: (password:talend)

11) Define the database index directory.

12) Finish the installation by confirming the installation packs and path.

Important: for the MDM Server to work, the JAVA_HOME environment variable must point to the correct location of the Java Runtime Environment installation.
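
A quick way to confirm this before starting the server is sketched below in Python. It only checks that JAVA_HOME is set and that a java launcher exists under its bin folder; the function name find_java is ours, and the check does not verify the Java version that Talend expects.

```python
import os
from pathlib import Path
from typing import Mapping, Optional


def find_java(env: Mapping[str, str] = os.environ) -> Optional[Path]:
    """Return the java launcher found under JAVA_HOME, or None if there is none."""
    java_home = env.get("JAVA_HOME")
    if not java_home:
        return None
    bin_dir = Path(java_home) / "bin"
    # On Windows the launcher is java.exe; on Linux/macOS it is plain java.
    for name in ("java.exe", "java"):
        candidate = bin_dir / name
        if candidate.is_file():
            return candidate
    return None


if __name__ == "__main__":
    java = find_java()
    if java is None:
        print("JAVA_HOME is missing or does not contain bin/java")
    else:
        print(f"JAVA_HOME looks valid: {java}")
```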

To start the MDM server, right-click the catalina.bat file and run it with elevated privileges.

Once the server has started successfully, a "Server startup" message is displayed in the console.
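
Besides reading the console, you can check from a script that the server is accepting connections. The Python sketch below simply tries to open a TCP connection; the port value 8180 is a placeholder and an assumption of this example, so replace it with the port you defined during the installation.

```python
import socket


def is_listening(host: str, port: int, timeout: float = 2.0) -> bool:
    """Return True if a TCP connection to host:port can be opened within the timeout."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        return False


if __name__ == "__main__":
    # ASSUMPTION: 8180 is a placeholder; use the port chosen in the installer.
    port = 8180
    state = "up" if is_listening("localhost", port) else "not reachable"
    print(f"MDM Server on localhost:{port} is {state}")
```

A successful connection only shows that something is listening on that port; it does not prove that the MDM web application itself has finished deploying.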

TALEND OPEN STUDIO FOR MDM – USE CASE

In the following Use Case, we show an example of ingesting a manually maintained table via an Excel file.

First, it is important to define the key terms used in Talend Open Studio for MDM and the Talend MDM Web User Interface. We focus on the most important ones, which are described below:

1) Data Container: holds data of one or several business entities. Data containers are typically used to separate master data domains.
2) Data Model: defines the attributes, user access rights and relationships of entities mastered by the MDM Hub. The data model is the central component of Talend MDM and maps to one or more entities that can be explicitly defined.

a. Entity: describes the actual data, its nature, its structure and its relationships. A data model can have multiple entities.
b. Record: an instance of data defined by a data model in the MDM Hub. For example, two records that are considered similar, or a close match, may be merged.

3) View: a complete or subset view of a record. A complete view shows all elements or columns in an entity, while a subset view shows some of the elements or columns of an entity. A view may restrict access to attributes of a record depending on who or what is asking for the data.

For this Use Case, we defined a Data Container, a Data Model and a View, each named after the ingestion table, jde812_m_route_code.

1) Data Container

2) Data Model

Inside the Data Model, we defined an Entity called jde812_m_route_code with several Business Elements aligned with the data ingested from the Excel file, including:

a) jde812_m_route_code_id (key)
b) cod_source_type
c) load_dttm
d) update_dttm
e) load_user
f) update_user
g) cod_load_type
h) cod_sector
i) cod_distribution
j) cod_route
k) cod_urgent_type
l) cod_entry_cut_off
m) cod_ship_cut_off
n) cod_delivery_cut_off
o) cod_delivery_cut_off_days
p) data_effective
q) data_end
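
To make the shape of a record more concrete, the Python sketch below serialises one record of this Entity as XML, with the Entity name as the root element and one child per Business Element listed above. The XML layout, the helper name to_xml and the sample values are assumptions made for illustration; the element types defined in the real model are not shown in this article.

```python
import xml.etree.ElementTree as ET
from typing import Dict

ENTITY = "jde812_m_route_code"

# Business Elements in the order listed above; the first one is the key.
FIELDS = [
    "jde812_m_route_code_id", "cod_source_type", "load_dttm", "update_dttm",
    "load_user", "update_user", "cod_load_type", "cod_sector",
    "cod_distribution", "cod_route", "cod_urgent_type", "cod_entry_cut_off",
    "cod_ship_cut_off", "cod_delivery_cut_off", "cod_delivery_cut_off_days",
    "data_effective", "data_end",
]


def to_xml(values: Dict[str, str]) -> str:
    """Serialise one record as <entity><element>value</element>...</entity>."""
    unknown = set(values) - set(FIELDS)
    if unknown:
        raise ValueError(f"Unknown Business Elements: {sorted(unknown)}")
    if not values.get(FIELDS[0]):
        raise ValueError(f"Key element {FIELDS[0]} is required")
    root = ET.Element(ENTITY)
    # Emit elements in model order, skipping those without a value.
    for name in FIELDS:
        if name in values:
            ET.SubElement(root, name).text = values[name]
    return ET.tostring(root, encoding="unicode")


# Illustrative values only; they are not taken from the real data set.
print(to_xml({"jde812_m_route_code_id": "1", "cod_route": "R01"}))
```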

3) View

As previously mentioned, when you define a View you can select which Business Elements will be visible in the Talend MDM Web User Interface. For this Use Case, we kept all the Business Elements visible.
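
Conceptually, a subset View is a projection of a record onto the Business Elements it exposes. The Python sketch below shows that idea on a plain dictionary; apply_view and the sample record are hypothetical and are not part of the Talend API.

```python
from typing import Dict, List


def apply_view(record: Dict[str, str], viewable: List[str]) -> Dict[str, str]:
    """Return only the elements the view exposes, in the view's order."""
    return {name: record[name] for name in viewable if name in record}


# Illustrative record and view definition.
record = {"jde812_m_route_code_id": "1", "cod_route": "R01", "load_user": "etl"}
subset_view = ["jde812_m_route_code_id", "cod_route"]
print(apply_view(record, subset_view))
```

A complete View, like the one used in this Use Case, would simply list every Business Element of the Entity.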

DEPLOYING OBJECTS TO THE MDM SERVER

After creating the objects in Studio, we need to deploy them to the Talend MDM Server. The steps needed to publish Studio objects to the MDM Server are listed below.

1) Set up the connection from Studio to the server.

This is done in the Server Explorer tab at the bottom of the Studio interface.

2) Publish the objects into Talend MDM Server:


TALEND MDM WEB USER INTERFACE

After publishing the objects created in Talend Studio, we can import the model in the Talend MDM Web User Interface.

First, we log into the Web User Interface:

The three most important views on the Web User Interface are:

1) Welcome: this is the default page when you enter the Web UI

Important: on the right side of this view, we can already see the Data Container and Data Model uploaded to the server.

2) Master Data Browser: on this view we can see, delete and update all the records that belong to each Entity. We are also able to import and export records for the selected view (explained below).

Note 1 – The image above is blurred since it’s based on real data

IMPORT AN EXCEL FILE INTO TALEND MDM WEB USER INTERFACE

Once the objects have been created in Talend Studio, importing an Excel file is very simple:

a) In the Master Data Browser, select Import.

b) Browse to the file to import.

c) Click Submit after selecting the file.

A success message is displayed.

Data is now available on the Web User Interface, and end users can create new records, update or delete existing ones:

Note 2 – The image above is blurred since it’s based on real data
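
Before importing, it can help to check that the columns of the file match the Business Elements of the Entity. The Python sketch below assumes the worksheet has been saved as CSV and that its first row carries the element names; both are assumptions of this example, and the expected list is shortened for readability.

```python
import csv
import io
from typing import List, Tuple

# Shortened list of expected Business Elements, for the example only.
EXPECTED = ["jde812_m_route_code_id", "cod_source_type", "cod_route"]


def check_header(csv_text: str, expected: List[str]) -> Tuple[List[str], List[str]]:
    """Return (missing, unexpected) column names found in the first row."""
    reader = csv.reader(io.StringIO(csv_text))
    header = [cell.strip() for cell in next(reader, [])]
    missing = [name for name in expected if name not in header]
    unexpected = [name for name in header if name not in expected]
    return missing, unexpected


# Illustrative file content: one expected column is absent, one extra is present.
sample = "jde812_m_route_code_id,cod_route,comment\n1,R01,test\n"
print(check_header(sample, EXPECTED))
```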

3) Journal: on this view, we can track all the changes applied to a specific Data Model and/or Entity, filtered by date, operation type, source or key:

Note 3 – The image above is blurred since it’s based on real data
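
The kind of filtering the Journal offers can be pictured with the Python sketch below, which filters an in-memory list of change entries by entity, operation type, source, key and date range. The JournalEntry structure and its field names are hypothetical and do not reflect the actual Talend journal schema.

```python
from dataclasses import dataclass
from datetime import datetime
from typing import Iterable, List, Optional


@dataclass(frozen=True)
class JournalEntry:
    """Hypothetical shape of one change record; not the actual Talend schema."""
    timestamp: datetime
    entity: str
    key: str
    operation: str  # e.g. "CREATE", "UPDATE", "DELETE"
    source: str


def filter_entries(
    entries: Iterable[JournalEntry],
    *,
    entity: Optional[str] = None,
    operation: Optional[str] = None,
    source: Optional[str] = None,
    key: Optional[str] = None,
    since: Optional[datetime] = None,
    until: Optional[datetime] = None,
) -> List[JournalEntry]:
    """Keep the entries that satisfy every criterion that was supplied."""
    result = []
    for entry in entries:
        if entity is not None and entry.entity != entity:
            continue
        if operation is not None and entry.operation != operation:
            continue
        if source is not None and entry.source != source:
            continue
        if key is not None and entry.key != key:
            continue
        if since is not None and entry.timestamp < since:
            continue
        if until is not None and entry.timestamp > until:
            continue
        result.append(entry)
    return result


# Illustrative entries only.
journal = [
    JournalEntry(datetime(2018, 5, 1, 9, 0), "jde812_m_route_code", "1", "CREATE", "import"),
    JournalEntry(datetime(2018, 5, 2, 10, 30), "jde812_m_route_code", "1", "UPDATE", "web-ui"),
    JournalEntry(datetime(2018, 5, 3, 8, 15), "jde812_m_route_code", "2", "DELETE", "web-ui"),
]
updates = filter_entries(journal, operation="UPDATE", since=datetime(2018, 5, 2))
print(len(updates))
```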

POTENTIAL / ADVANTAGES OF USING TALEND MDM

(Master) Data is one of the pillars Organizations stand on to achieve success. Exploring tools that give them more control over their data is, or should be, among the priorities of every company. Our experience, working on a project with one of the major pharmaceutical companies in the world, is that the better we control the dispersed information that feeds management reports and dashboards, the better our chances of obtaining sound insights from that information. Talend MDM is one tool available on the market that provides this type of control, in several ways:

1) Security: centralizing the data with Talend MDM tools gives us one central repository of controlled data;
2) Change logs: the Talend MDM Web User Interface has a journal that records every change in data;
3) Maintenance: the Talend MDM Web User Interface lets users maintain data (create, update and delete) directly in the web interface, which is a more controlled environment for applying data changes;
4) Import/Export: the Talend MDM Web User Interface has several connectors that allow information to be imported from and exported to multiple applications.
