28 November 2018

The power of Advanced Analysis

Information as support for the decision-making process

Using information (data) to support decisions is as natural to human beings as breathing. We use information (data) in most decisions of our lives: to cross the street, we check whether the pedestrian light is green; to set the speed of our car, we consider the speed limits shown on traffic signs; to decide which subway line to take, we consult the subway map; and we could go on with many more everyday examples like these.

If we look at the universe of organizations (companies, public institutions, associations or even governments), data/information is the lifeblood that keeps them alive and cohesive. Even the most basic and simple processes that keep an organization alive and consolidated depend on data/information.

The story goes that Genghis Khan was the greatest ruler of his time, lord of an empire that stretched from Asia to Eastern Europe. To keep an empire of such magnitude united, with its many ethnicities, religions, peoples and languages, a powerful army alone was not enough. The unity of the empire depended heavily on the ability to transmit information/data (e.g. messages, documents) quickly over long distances. Genghis Khan had a vast communication network (Yam), a courier service based on a well-organized network of stations scattered throughout the territory of the empire. This network allowed information to spread across the whole empire at a speed never seen before. Holding an empire of this size together would not have been possible without information/data.

Manage what you can measure

What data about an organization is really relevant to its management, life cycle and cohesion?

In the universe of management, it is often said that you can only manage or evaluate what you can measure. In the context of an organization, the data that allows it to be monitored (measured) and managed takes its most elementary form in KPIs and Metrics. These KPIs and Metrics are supplemented with context information (dimensions and analysis perspectives such as time, cost, products, warehouses, etc.) to allow better understanding and the gathering of insights. They are grouped and analysed in dashboards. The analogy with an airplane dashboard is commonly used: a Dashboard lets you measure the state of the organization and the direction in which it is traveling, just as the instruments do for an airplane.
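As a minimal sketch of the idea above, the following Python snippet computes one metric (total sales) and breaks it down by a context dimension. The records, field names and the `total_by_dimension` helper are hypothetical and chosen only for illustration.

```python
from collections import defaultdict

# Hypothetical sales records: each row carries a measure (amount)
# plus context dimensions (warehouse, product).
sales = [
    {"warehouse": "North", "product": "A", "amount": 120.0},
    {"warehouse": "North", "product": "B", "amount": 80.0},
    {"warehouse": "South", "product": "A", "amount": 200.0},
]

def total_by_dimension(rows, dimension, measure="amount"):
    """Sum a measure for each distinct value of the chosen dimension."""
    totals = defaultdict(float)
    for row in rows:
        totals[row[dimension]] += row[measure]
    return dict(totals)

by_warehouse = total_by_dimension(sales, "warehouse")
by_product = total_by_dimension(sales, "product")
print(by_warehouse)
print(by_product)
```

The same metric read through two different dimensions answers two different questions: where the sales happen and what is being sold.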

Traditionally, in the Business Intelligence universe, these indicators (KPIs and metrics) are analysed using historical data, which allows both a historical analysis of the organization and an analysis of its current state. Although historical data is also the raw material of advanced analysis, traditional analysis focuses on the empirical analysis of historical (past) data.

The industrialization of data analysis techniques

In the last decade, technological and scientific transformation has allowed the industrialization of advanced data mining and data processing techniques that previously existed only in the academic world (e.g. Deep Learning, Support Vector Machines, GLM, Random Forests, among others).

The evolution of processor capacity, the increase in data storage and advances in data mining algorithms have put these advanced techniques within reach of ordinary users, often requiring nothing more than a simple personal computer. Nowadays a smartphone has more processing power and storage capacity than many of the servers used for Data Mining in the 90s (just consider how much 1 GB of disk cost in the early 90s).

The techniques now known as Data Science are the ones that, over the last decade, led to a new revolution in the way data is analysed.

It is often said that data is the oil of the future, because the systematic application of these techniques makes it possible to detect insights, patterns or deeper relationships in information, to make predictions or to generate recommendations, among others. Ultimately, they uncover hidden knowledge in the data that would be impossible to detect using an empirical process and that can be precious to an organization, generating new business opportunities, cost reductions, better knowledge of clients, etc.

These advanced techniques make it possible to industrialize and automate the process of data analysis and knowledge discovery within organizations, which was traditionally done empirically and manually (the well-known ad-hoc analysis is the best example of this).

Another fundamental element observed in recent years, and which is also a sign of change, is the growth of a scientific culture in the data analysis processes of organizations. The use of statistics as an initial tool in exploratory data analysis is already common.

All of this provides the context for the changes observed in recent years, which led to the materialization of the concept of Advanced Analytics.

So what distinguishes a traditional analysis from an advanced analysis?

Traditional Analysis

These analyses are based on traditional BI tools that analyse historical data. It is a type of analysis of the past, based on the empirical principle that by analysing and understanding the past we can make better decisions for the future. It assumes that patterns observed in the past will repeat themselves in the future (which is not necessarily true). These analysis processes are usually manual and, in their most advanced form, based on ad-hoc analysis and exploration of data.

Traditional Analyses can be grouped into the following variants:

  • Reports and Dashboards – Analysis of the past. Perception of what happened in the past or what is happening now in the organization
  • Ad-Hoc Analysis/Multidimensional Exploration – Exploratory analysis of empirical data, detailing information/indicators already available in reports/dashboards. Detailed perception (drill down, drill through) that contextualizes the indicators under analysis (where it happens, when it happens)
  • Alerts – Automatic notifications of the occurrence of certain events (e.g. KPI values outside acceptable limits, stock-outs, etc.)
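The alert variant in the list above can be sketched in a few lines of Python: each KPI has an acceptable range, and a notification is produced when a value falls outside it. The KPI names, the limits and the `check_alerts` helper are assumptions made for illustration, not part of any specific BI tool.

```python
# Hypothetical acceptable ranges for two KPIs (lower bound, upper bound).
LIMITS = {
    "stock_level": (50, 500),
    "return_rate": (0.0, 0.05),
}

def check_alerts(kpis, limits=LIMITS):
    """Return one message for each KPI whose value is outside its range."""
    alerts = []
    for name, value in kpis.items():
        low, high = limits[name]
        if not low <= value <= high:
            alerts.append(f"{name} = {value} is outside [{low}, {high}]")
    return alerts

# Hypothetical current readings: stock is below its lower limit.
current = {"stock_level": 32, "return_rate": 0.03}
alerts = check_alerts(current)
for message in alerts:
    print(message)
```

In practice the messages would be sent by e-mail or shown in a dashboard rather than printed.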

Advanced Analyses

Advanced analyses are based on a more scientific approach: statistics-based techniques (e.g. hypothesis testing), data mining algorithms, optimization algorithms, and scenario analysis. The analysis of historical data is carried out massively and automatically by a machine (by executing an algorithm, rather than by manually inspecting the data as in traditional analyses). Detecting patterns in historical data becomes an automatic, quick and inexpensive task, and scientific processes (e.g. hypothesis testing) are used to assess how likely a pattern is to repeat itself in the future (or whether a pattern observed in a sample of a population is reflected in the whole population).
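To make the idea of hypothesis testing more concrete, here is a minimal sketch of one possible test, a permutation test on the difference between two group means. The choice of test, the sales figures and the `permutation_test` function are assumptions made for illustration; other tests would serve equally well.

```python
import random

def mean(values):
    return sum(values) / len(values)

def permutation_test(group_a, group_b, n_permutations=10_000, seed=0):
    """Estimate a two-sided p-value for the difference in means."""
    rng = random.Random(seed)  # fixed seed so the run is reproducible
    observed = abs(mean(group_a) - mean(group_b))
    pooled = list(group_a) + list(group_b)
    n_a = len(group_a)
    extreme = 0
    for _ in range(n_permutations):
        # Shuffle the labels: under the null hypothesis they do not matter.
        rng.shuffle(pooled)
        diff = abs(mean(pooled[:n_a]) - mean(pooled[n_a:]))
        if diff >= observed:
            extreme += 1
    return extreme / n_permutations

# Hypothetical daily sales with and without a promotion.
with_promo = [105, 98, 110, 120, 115, 108, 112]
without_promo = [95, 90, 100, 92, 97, 94, 99]
p_value = permutation_test(with_promo, without_promo)
print(f"estimated p-value: {p_value:.4f}")
```

A small p-value suggests that the difference observed in the sample is unlikely to be due to chance alone, which is the kind of validation described above.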

Advanced Analyses include, for example, the following techniques:

  • Exploratory Analysis and Statistical Inference
  • Forecasting and analysis of scenarios
  • Data, text and Web Mining
  • Semantic Analysis
  • Sentiment Analysis
  • Advanced visualization (visual data mining)
  • Analysis of Social Networks (analysis of networks / graphs)
  • Predictive analysis

The innovation in these techniques lies in the possibility of creating analytical models that, in some way, make it possible to “predict” or anticipate the future, as opposed to more traditional analyses that “look” at the past.
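As a very small illustration of a model that anticipates the future from historical data, the sketch below fits a straight line to past values by least squares and extrapolates it one period ahead. The figures and the function names are hypothetical, and a linear trend is only one of the simplest possible forecasting models.

```python
def fit_linear_trend(values):
    """Least-squares line through (0, values[0]), (1, values[1]), ...

    Needs at least two values.
    """
    n = len(values)
    x_mean = (n - 1) / 2
    y_mean = sum(values) / n
    sxy = sum((x - x_mean) * (y - y_mean) for x, y in enumerate(values))
    sxx = sum((x - x_mean) ** 2 for x in range(n))
    slope = sxy / sxx
    intercept = y_mean - slope * x_mean
    return slope, intercept

def forecast(values, steps_ahead=1):
    """Extrapolate the fitted line steps_ahead periods past the last value."""
    slope, intercept = fit_linear_trend(values)
    return intercept + slope * (len(values) - 1 + steps_ahead)

monthly_sales = [100, 110, 120, 130, 140]  # hypothetical figures
next_month = forecast(monthly_sales)
print(next_month)
```

The forecast is only as good as the assumption that the past trend continues, which is exactly why such models are validated statistically before being trusted.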

We cannot think of Advanced Analysis as something independent of, or opposed to, traditional analysis. The two complement each other. For example, cluster analysis (data mining) can lead to the creation of a customer segmentation, which is then presented in the traditional Client Analysis Dashboard. Much of the hidden knowledge that these techniques discover in data leads to the creation of new indicators (KPIs) that are included and monitored in the traditional reports and dashboards of an organization. We must see these techniques as automatic processes of knowledge discovery.
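The customer segmentation example can be sketched with a tiny one-dimensional k-means that groups customers by annual spend. The spend figures and the `kmeans_1d` function are hypothetical, and a real project would normally use several variables and an established library rather than this simplified version.

```python
def kmeans_1d(values, k, iterations=20):
    """Very small k-means for one-dimensional data (requires k >= 2)."""
    ordered = sorted(values)
    # Start with centroids spread evenly across the sorted values.
    centroids = [ordered[i * (len(ordered) - 1) // (k - 1)] for i in range(k)]
    for _ in range(iterations):
        clusters = [[] for _ in range(k)]
        for v in values:
            # Assign each value to its nearest centroid.
            nearest = min(range(k), key=lambda i: abs(v - centroids[i]))
            clusters[nearest].append(v)
        # Move each centroid to the mean of its cluster (keep it if empty).
        centroids = [
            sum(c) / len(c) if c else centroids[i]
            for i, c in enumerate(clusters)
        ]
    return centroids, clusters

# Hypothetical annual spend per customer.
annual_spend = [120, 150, 130, 900, 950, 1000, 4000, 4200]
centroids, clusters = kmeans_1d(annual_spend, k=3)
for centroid, members in zip(centroids, clusters):
    print(f"segment around {centroid:.0f}: {sorted(members)}")
```

Each resulting segment could then become a new dimension or indicator in a traditional dashboard.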

David Ferreira