12 January 2018

Why Data Mining (Microsoft Analysis Services) is still a valid solution for Business Predictive Analysis – 10 years later

Introduction

In the past 20 to 30 years, Business Intelligence solutions have evolved to support technological changes and business needs, concentrating efforts on performance and usability. On the other hand, concepts and methodologies did not change much until very recently, with the booming interest in “Big Data”.

Microsoft released the Azure Machine Learning (ML) cloud platform in 2014, and interest in it has grown considerably over the last couple of years. Microsoft also integrated “R” into its latest SQL Server release (2016); R is a programming language for statistical computing, derived from the S language created at Bell Labs back in 1976.

Data Mining (Analysis Services) was first released with SQL Server 2005, and even though it is a very different solution from Azure ML, the two have a lot in common, because both apply widely used statistical algorithms to solve business problems and needs.

If we look at the time-frame of these releases, they seem quite spread out. Data Mining (Analysis Services) appears to have been left behind (no changes since SQL Server 2008), perhaps because it was born too soon: it was released at a time when most companies investing in Business Intelligence were still worried about, and investing in, Data Quality and their Data Warehouses to support descriptive analysis.

One of the main differences between Azure ML and Data Mining (Analysis Services) is that the latter does not run in a cloud environment, so it might be the only acceptable solution for businesses that do not want to send their data outside their network. And even though “R” has been integrated into SQL Server 2016, there is still a steep learning curve before this statistical language can be applied correctly.

Business Case

A business XYZ makes product-selling contracts with other entities/customers (B2B). These contracts have time windows for product liftings, which complicates logistics optimization.

Example: XYZ makes a contract with a customer to sell 1,000 units, which can be lifted from XYZ’s warehouse from the 1st of the month until the 31st. The customer can decide to pick up the contracted amount Totally or Partially, and at any point between the window start and end dates. XYZ’s profit margin is affected by having the warehouse full or empty at a given time, so a better understanding of customer behaviour is crucial.

Several algorithms were applied to a cleaned contracts dataset coming from XYZ’s Data Warehouse; the model was created and tested in Visual Studio / SQL Server Data Tools before being deployed to Analysis Services.
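
To make the workflow concrete, here is a minimal DMX sketch of how such a mining structure might be defined. The structure name, columns and holdout size below (ContractLiftings, LiftingType, etc.) are hypothetical placeholders rather than the actual objects in XYZ’s solution:

    -- Hypothetical mining structure for the cleaned contracts dataset,
    -- keeping 30% of the cases aside for testing the models later.
    CREATE MINING STRUCTURE [ContractLiftings]
    (
        ContractID        LONG   KEY,
        CustomerName      TEXT   DISCRETE,
        ProductCode       TEXT   DISCRETE,
        ContractQuantity  DOUBLE CONTINUOUS,
        LiftingType       TEXT   DISCRETE    -- 'Total' or 'Partial'
    )
    WITH HOLDOUT (30 PERCENT)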

Algorithms can be chosen by Type (Classification, Regression, Segmentation, Association, Sequence analysis) or by Task, which in this case was “Predicting a discrete attribute”.

Analysis Services lets you cross-validate and test algorithms to optimize models, so for this type of task the following algorithms were used: Decision Tree, Clustering and Neural Network.
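
As a rough DMX sketch (reusing the hypothetical ContractLiftings structure and column names from above), each algorithm is attached as a separate mining model and the structure is then trained from the Data Warehouse; the data source and view names are placeholders as well:

    -- Add one mining model per algorithm; LiftingType is the discrete
    -- attribute being predicted.
    ALTER MINING STRUCTURE [ContractLiftings]
    ADD MINING MODEL [Liftings_DT]
    (
        ContractID,
        CustomerName,
        ProductCode,
        ContractQuantity,
        LiftingType PREDICT
    )
    USING Microsoft_Decision_Trees

    -- Repeat with USING Microsoft_Clustering and USING Microsoft_Neural_Network
    -- to create [Liftings_CL] and [Liftings_NN] on the same columns.

    -- Process (train) the structure and all of its models from the Data Warehouse.
    INSERT INTO MINING STRUCTURE [ContractLiftings]
        (ContractID, CustomerName, ProductCode, ContractQuantity, LiftingType)
    OPENQUERY([XYZ DW],
        'SELECT ContractID, CustomerName, ProductCode, ContractQuantity, LiftingType
         FROM dbo.FactContractLiftings')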

Cross-validation gave both the Decision Tree and the Neural Network higher scores for predicting “Total” or “Partial” liftings for a specific Entity/Customer.
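
The comparison itself can be run as a DMX stored procedure call; a minimal sketch, still using the hypothetical structure and model names from above, looks like this:

    -- 5-fold cross-validation of the candidate models on the 'Total' state of
    -- the predictable LiftingType attribute (0 means use all available cases).
    CALL SystemGetCrossValidationResults(
        [ContractLiftings],
        [Liftings_DT], [Liftings_CL], [Liftings_NN],
        5,
        0,
        'LiftingType',
        'Total'
    )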

Analysis Services models can then be explored through Excel, giving analysts further insights.

By looking at the decision tree, for example, the analyst will learn that a specific Entity/Customer usually (91.13% of the time) lifts a specific product X Totally rather than Partially. These insights help predict customer behaviour for future contracts and support logistics optimization.
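
The node-level statistics behind figures such as the 91.13% above can also be read directly from the trained model’s content with a DMX query; a small sketch, again with the hypothetical model name:

    -- Browse the decision tree nodes, their split descriptions and case counts;
    -- NODE_DISTRIBUTION holds the Total/Partial probabilities within each node.
    SELECT FLATTENED
        NODE_CAPTION,
        NODE_DESCRIPTION,
        NODE_SUPPORT,
        NODE_DISTRIBUTION
    FROM [Liftings_DT].CONTENT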

Analysis Services also allows direct querying of the mining model, and those outputs can then be joined with other datasets in XYZ’s Data Warehouse. In this case, the existing Contracts reporting in the Data Warehouse would gain an extra predicted column with “Total” or “Partial” values, along with their predicted probability, before the actual liftings happen for a recent contract made with that customer.
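
A hedged sketch of such a query is a DMX prediction join between the trained model and the open contracts in the Data Warehouse; all object names below remain hypothetical placeholders:

    -- Score recent/open contracts with a predicted lifting type and its
    -- probability, ready to be joined back to the Contracts reporting tables.
    SELECT
        t.ContractID,
        Predict([LiftingType])            AS PredictedLiftingType,
        PredictProbability([LiftingType]) AS PredictedProbability
    FROM [Liftings_DT]
    PREDICTION JOIN
    OPENQUERY([XYZ DW],
        'SELECT ContractID, CustomerName, ProductCode, ContractQuantity
         FROM dbo.FactContracts
         WHERE LiftingStatus = ''Open''') AS t
    ON  [Liftings_DT].CustomerName     = t.CustomerName
    AND [Liftings_DT].ProductCode      = t.ProductCode
    AND [Liftings_DT].ContractQuantity = t.ContractQuantity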

Conclusion

Data Mining (Analysis Services) hasn’t changed since SQL Server 2008, but that doesn’t make it obsolete. On the contrary, it is still a viable and capable solution for the most common predictive analyses a business needs. Being part of Microsoft’s Business Intelligence stack also makes integration easier for businesses that already use this set of tools for their data exploration, and it is a no-brainer choice for “anti-cloud” businesses.

   André Correia
Senior Consultant