28 February 2018

Using Data Quality processes to solve problems in source systems

1.1. WHAT IS DATA QUALITY?

For an organization, data is only relevant if it truly represents reality and is understood within its specific business context. This makes it extremely important for Information Technology (IT) departments to pay close attention to the relevance and quality of their data.

A company can evaluate its data along two dimensions: how well it represents the business reality and how well it represents the business processes. On that basis, it can determine that a specific record does not represent reality, is misspelled or incorrectly formatted, or that the data volume is inconsistent with the expected result (for instance, because of duplicated or missing values).
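The checks described above can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the sample records, the field names (`id`, `name`, `postal_code`) and the helper functions are hypothetical and are not taken from any specific data quality tool.

```python
from collections import Counter

# Hypothetical sample records; the field names are assumptions for illustration.
records = [
    {"id": 1, "name": "Alice Smith", "postal_code": "1000-001"},
    {"id": 2, "name": "Bob Jones", "postal_code": None},
    {"id": 1, "name": "Alice Smith", "postal_code": "1000-001"},
]


def find_duplicates(rows, key):
    """Return the key values that appear more than once."""
    counts = Counter(row[key] for row in rows)
    return sorted(value for value, n in counts.items() if n > 1)


def find_missing(rows, field):
    """Return the ids of rows whose field is empty or absent."""
    return [row["id"] for row in rows if not row.get(field)]


def volume_matches(rows, expected_count):
    """Check whether the number of rows matches the expected volume."""
    return len(rows) == expected_count


duplicate_ids = find_duplicates(records, "id")
missing_postal_codes = find_missing(records, "postal_code")
volume_ok = volume_matches(records, expected_count=2)
```

Each function covers one of the symptoms named in the text: duplicated values, missing values and a volume that does not match the expected result.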

1.2. PROBLEMS THAT CREATE THE NEED FOR DATA QUALITY – WHAT IS HIGH QUALITY DATA? WHAT ARE THE MOST COMMON PROBLEMS AND HOW DO THEY ARISE?

High quality data reliably represents reality and has been properly cleaned: it has no duplicated records, is correctly formatted and spelled, and adds value in the specific business context in which it is used.

Nowadays, companies deal with ever-increasing amounts of data. For example, clients and employees enter data into multiple operational or CRM systems, often from several files that already contain problems and without any care or preparation, which gives rise to inconsistencies, duplicates and poor quality in the collected data.

1.3. WHERE DOES DATA QUALITY FIT AND WHAT DOES IT DO? HOW CAN WE OBTAIN HIGH QUALITY DATA? WHAT SHOULD WE DO?

Every organization that runs a business, data provider companies included, has problems related to the quality of its data. Data quality is not about the storage of possibly incorrect data; it is about cleaning an organization's data by implementing processes that enforce correct formatting and spelling and thereby raise the quality of that data.

The goal of data quality processes may be to supply correctly formatted and cleaned data to a specific analytics system (or to other systems). For instance, an organization may want to format and correct the order data supplied in a specific source file before the cargo is delivered, in order to avoid mistakes along the value chain. A different scenario is data preparation prior to loading an analytics system, in order to achieve more realistic, precise and reliable analyses.

Where a data policy already exists in the IT infrastructure, it must be revised and adapted whenever a new system or source of information is added. If the systems communicate with each other, data errors may propagate and corrupt information in pre-existing systems. It is therefore not enough to implement data quality processes once: data quality is a continuous task that should be considered in every data entry or transformation process of an organization.

Several applications from different vendors are available on the market to help implement data quality in an organization, for example Alteryx, Datawatch and Talend Data Preparation. These tools complement the data transformation processes with more comprehensive analysis, pattern searches and data quality evaluation methods.
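To make the order file scenario more concrete, the following is a minimal sketch of a cleaning step, assuming each order is a dictionary with the hypothetical fields `order_id`, `customer` and `delivery_date`, and assuming the source file mixes a few known date layouts. It is not how any particular data quality tool works; it only illustrates the kind of formatting rules involved.

```python
from datetime import datetime

# Date layouts we assume might appear in the source file (an assumption).
KNOWN_DATE_FORMATS = ("%Y-%m-%d", "%d/%m/%Y", "%d-%m-%Y")


def normalize_date(text):
    """Return the date as YYYY-MM-DD, or None if no known layout matches."""
    for fmt in KNOWN_DATE_FORMATS:
        try:
            return datetime.strptime(text.strip(), fmt).date().isoformat()
        except ValueError:
            continue
    return None


def clean_orders(rows):
    """Apply formatting rules and drop orders whose id was already seen."""
    cleaned, seen = [], set()
    for row in rows:
        order = {
            # Trim and upper-case the identifier so variants compare equal.
            "order_id": row["order_id"].strip().upper(),
            # Collapse repeated whitespace and use consistent capitalization.
            "customer": " ".join(row["customer"].split()).title(),
            "delivery_date": normalize_date(row["delivery_date"]),
        }
        if order["order_id"] in seen:
            continue
        seen.add(order["order_id"])
        cleaned.append(order)
    return cleaned


raw_orders = [
    {"order_id": " ab12 ", "customer": "  john   DOE ", "delivery_date": "28/02/2018"},
    {"order_id": "AB12", "customer": "John Doe", "delivery_date": "2018-02-28"},
    {"order_id": "cd34", "customer": "jane roe", "delivery_date": "soon"},
]
clean = clean_orders(raw_orders)
```

An order whose date cannot be parsed keeps `None` in `delivery_date`, so it can be reviewed before delivery instead of being silently guessed.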

Figure 1 – Data quality process

1.4. HOW TO ADD MORE VALUE? CAN DATA QUALITY BE COMPLEMENTED BY OTHER INFORMATION TREATMENT SKILLS?

Data quality can be combined with other information treatment disciplines, since it concerns the preparation of data for decision support and does not prevent other transformations or data processing from being applied. The complete data preparation process can be designed according to the business needs.

> Master Data Management

When talking about data quality, the need often arises to include data validation in order to ensure the quality of the data, which makes it necessary to turn to Master Data Management (MDM) solutions.

Data quality tools are very powerful for data transformation and data transport, while MDM addresses the data consistency and synchronization problems that often arise in a data preparation process.

Unlike data quality tools, MDM tools can also rearrange process hierarchies. These process dependencies must be modelled in MDM and evaluated against business needs, in order to enrich the data preparation process and to supply quality data to the decision making process (the data steward role).

 

Figure 2 – Example of data preparation process including Data Quality and MDM

As the example in Figure 2 shows, once the data has been cleaned by applying data quality rules, MDM tools can also be included in the data preparation process to verify the veracity of records. They maintain a real, physical data set that is fed by the surrounding systems and supports data updates.
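The verification step can be pictured as a comparison between incoming records and a master data set. The sketch below is a simplified illustration, not the behaviour of any real MDM product: the master records, the `customer_id` key and the `reconcile` function are all hypothetical.

```python
# Hypothetical master ("golden") records keyed by customer id.
master = {
    "C001": {"name": "Acme Lda", "country": "PT"},
    "C002": {"name": "Globex SA", "country": "ES"},
}


def reconcile(master_data, incoming):
    """Split incoming records into verified, conflicting and unknown ids."""
    result = {"verified": [], "conflicts": [], "unknown": []}
    for record in incoming:
        customer_id = record["customer_id"]
        golden = master_data.get(customer_id)
        if golden is None:
            # Not present in the master data set: needs review before use.
            result["unknown"].append(customer_id)
        elif all(golden.get(k) == v for k, v in record.items() if k != "customer_id"):
            result["verified"].append(customer_id)
        else:
            # Present, but at least one attribute disagrees with the master.
            result["conflicts"].append(customer_id)
    return result


incoming_records = [
    {"customer_id": "C001", "name": "Acme Lda", "country": "PT"},
    {"customer_id": "C002", "name": "Globex SA", "country": "PT"},
    {"customer_id": "C003", "name": "Initech", "country": "FR"},
]
report = reconcile(master, incoming_records)
```

In this sketch, conflicting and unknown records are only reported. Deciding whether the master or the source system is right is left to a person, which matches the data steward role mentioned earlier.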

> Big Data

Since Big Data is growing rapidly in the IT world and Data Quality is a data preparation concept, we evaluated its potential in this context.

Data Quality tools can be applied to huge volumes of data, even when it comes from different sources, but some considerations must be taken into account for a successful data preparation process, since this kind of data structure demands extensive data quality management, with defined quality characteristics and quality indicators.

It is nevertheless possible to create dynamic evaluation processes for data quality control in this type of data infrastructure. Given the variety of source systems, users are not necessarily the data producers, which makes data quality harder to measure. In such infrastructures, a hierarchical pattern of data quality should be followed from the user perspective, making users the main control point for information quality and involving them fully in the implementation of the data treatment process.

 

Figure 3 – Example of a hierarchical pattern of data quality from the user's perspective
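As an illustration of a quality indicator that can be evaluated per source system, the sketch below computes completeness, meaning the share of required fields that are actually filled in. The choice of indicator, the source names and the field names are assumptions made for this example; real environments would define their own indicators and run them at a much larger scale.

```python
def completeness(rows, required_fields):
    """Share of required field values that are filled in, between 0 and 1."""
    total = len(rows) * len(required_fields)
    if total == 0:
        return 0.0
    filled = sum(
        1
        for row in rows
        for field in required_fields
        if row.get(field) not in (None, "")
    )
    return filled / total


def indicators_by_source(rows_by_source, required_fields):
    """Compute the completeness indicator separately for each source system."""
    return {
        source: round(completeness(rows, required_fields), 2)
        for source, rows in rows_by_source.items()
    }


# Hypothetical records grouped by the system that produced them.
sources = {
    "crm": [
        {"name": "A", "email": "a@example.org"},
        {"name": "B", "email": ""},
    ],
    "web_form": [
        {"name": None, "email": None},
    ],
}
scores = indicators_by_source(sources, ["name", "email"])
```

Reporting the indicator per source makes it easier to see which system is introducing the gaps, even when the users who consume the data did not produce it.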

1.5. CONCLUSION: WHAT DOES DATA QUALITY ENABLE? WHAT BENEFITS DO WE GET FROM USING DATA QUALITY PROCESSES?

High quality data can easily be processed and analyzed, resulting in insights that help organizations thrive. It is essential to Business Intelligence and other data analysis efforts, and it increases the operational efficiency of processes within an organization.

Therefore, to ensure high quality information, a company should implement its own data quality processes, complemented by related data preparation disciplines such as MDM, in order to obtain more precise and realistic analyses that make the decision making process easier and better supported.

It is thus fundamental that data preparation processes take the organization's business knowledge into account. This means establishing a single, unified view of the business and its clients across all company departments, with data treatment processes that respond both to the needs of each department and to the global vision of the business. As an example, when a Marketing department develops a campaign for a new product based on sales statistics of pre-existing products, the Sales/Customer Support department should be able to contribute an analysis of the clients' characteristics and specificities. It should also present the company's client clusters, enriching the already available information with its own view of the client and the business.

In conclusion, as information plays an increasingly important part in businesses and organizations, it is essential to ensure the best possible data quality, representing reality as closely and correctly as possible in order to maximize the value added to the decision making process.
