23 June 2021

Modern Analytics with AWS

Computing and database technologies gave enterprises the means to store, operationalize and analyse data to gain insight into their business, but at the cost of heavy investment in both hardware and dedicated IT teams.

As the business changes, so must IT resources. On premises this is a slow and risky process that requires answering one question: what infrastructure must we provision today to meet our needs tomorrow? The wrong answer leads to a shortage of resources on the one hand, or wasted investment in idle resources on the other.

From small companies to global enterprises, AWS gives you the tools to create custom-fit environments to host your platform and workloads and meet your business demands. As those demands change, so can the infrastructure that hosts them, ensuring a platform that is both highly available and cost-efficient.

AWS has many services that help customers ingest, store and gain insight from data. What follows is a framework architecture that addresses most customers' analytics needs.

FIG 1. AWS Modular Analytics Framework

 

This architecture allows the development of a robust analytics ecosystem that can offer different levels of data integration, from a full Data Lake to a high-performance Data Warehouse.

A governance layer can hold the metadata for the entire data lake and set cell-level restrictions on data access for different user profiles in the Consumption Layer.

 

DEALING WITH DATA GROWTH AND THE INCREASING RISK OF HUMAN ERROR

Thanks to its modular approach, this framework can be adapted to meet the requirements of each use case, such as the one described next.

Mota-Engil is a leader in Portugal with a consolidated position among the 25 largest European construction groups. It operates in three distinct geographical areas (Europe, Africa and Latin America), with activities in Engineering and Construction, Waste Management, Energy, Multiservices, Transport Concessions, Mining and Logistics.

As an international reference in the sectors where it operates, its focus on continuous innovation is evident, and the same philosophy extends to its internal operations, which aim for more automation, reliability and availability in their analytics workloads. What follows is one instance of that modernization.

Local branches and markets at Mota-Engil’s construction sites around the world enrich operational data sets using MS Excel files, which are later merged centrally for analysis in dashboards.

Initially this solution offered a quick and user-friendly way to store and share information. However, over the years, as the data grew, so did the manual effort and the risk of human error in merging and maintaining the consolidated data.

Each market would send data sets with upwards of 600 columns that needed to be appended and transformed, making the process memory intensive and eventually impossible due to MS Excel limitations.

To modernize this process, the goal was to create a scalable, serverless pipeline that would ingest these Excel files, normalize the datasets and append them to the existing data in a central data lake. Ultimately, business views created over the data lake would be accessible through MS Power BI and MS Excel.
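As a rough illustration of the normalize-and-append step, here is a minimal Python sketch. It is not the code used in the project: the function names, the snake_case naming rule and the added `market` column are assumptions made for the example, and rows are represented as plain dictionaries rather than real Excel sheets.

```python
import re


def normalize_header(name):
    """Reduce a free-form Excel column header to a snake_case identifier."""
    # Assumption: headers are ASCII; accented characters would need extra handling.
    cleaned = re.sub(r"[^0-9a-z]+", "_", name.strip().lower())
    return cleaned.strip("_")


def append_rows(existing, new_rows, market):
    """Normalize the headers of each new row, tag it with its market and append it."""
    for row in new_rows:
        normalized = {normalize_header(key): value for key, value in row.items()}
        normalized["market"] = market  # hypothetical column identifying the source
        existing.append(normalized)
    return existing
```

Normalizing headers before appending means files from different markets land in the same columns even when their authors typed the headers slightly differently.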

Starting from the framework presented above (figure 1), we used its modular approach to pick, deploy and adapt the needed components, as shown next (figure 2).

FIG 2. Deployed Framework Components 

 

TRANSFORMING DATA AND AUTOMATING MANUAL PROCESSES

The team previously responsible for the manual effort of joining all the existing MS Excel files now simply uploads them into an S3 bucket. This automatically triggers an AWS Lambda function that adds a request to the processing queue in Amazon SQS, which in turn starts an execution of the orchestrator in AWS Step Functions.

The AWS Step Functions execution (figure 3) orchestrates a sequence of AWS Glue jobs that clean and transform the data coming from the newly ingested file, storing it in higher layers of the Data Lake according to its level of refinement, and then uses Amazon SNS to notify the users of the result of the process.

FIG 3. AWS Step Functions Workflow example
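The upload-to-queue step could look roughly like the Python sketch below. It is an illustration under stated assumptions, not the deployed function: the `QUEUE_URL` environment variable, the message shape and the file-extension filter are hypothetical. The S3 event layout (`Records[].s3.bucket.name`, `Records[].s3.object.key`) and the boto3 `send_message` call are standard.

```python
import json
import os
from urllib.parse import unquote_plus


def build_requests(event):
    """Turn an S3 event notification into one processing request per Excel file."""
    requests = []
    for record in event.get("Records", []):
        bucket = record["s3"]["bucket"]["name"]
        # Object keys arrive URL-encoded in S3 event notifications.
        key = unquote_plus(record["s3"]["object"]["key"])
        if not key.lower().endswith((".xlsx", ".xls")):
            continue  # assumption: only Excel uploads are queued
        requests.append({"bucket": bucket, "key": key})
    return requests


def handler(event, context, send=None):
    """Lambda entry point; `send` is injectable so the logic can run without AWS."""
    if send is None:
        import boto3  # available in the Lambda Python runtime

        sqs = boto3.client("sqs")
        queue_url = os.environ["QUEUE_URL"]  # hypothetical variable name

        def send(body):
            sqs.send_message(QueueUrl=queue_url, MessageBody=body)

    requests = build_requests(event)
    for request in requests:
        send(json.dumps(request))
    return {"queued": len(requests)}
```

Keeping the event parsing in a separate function makes it possible to check the request format locally, without an AWS account.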
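A workflow of this shape can be expressed in Amazon States Language. The Python sketch below builds such a definition as a dictionary. It is an assumption-laden outline rather than the actual state machine: the job names, the `--bucket`/`--key` job arguments, the notification state names and the message texts are all hypothetical. The `glue:startJobRun.sync` and `sns:publish` resource ARNs are the standard Step Functions service integrations.

```python
import json


def build_definition(job_names, topic_arn):
    """Chain Glue jobs in order, then publish the outcome to an SNS topic."""
    if not job_names:
        raise ValueError("at least one Glue job is required")
    states = {}
    for index, job in enumerate(job_names):
        is_last = index + 1 == len(job_names)
        states[job] = {
            "Type": "Task",
            # The .sync suffix makes the state wait for the Glue job to finish.
            "Resource": "arn:aws:states:::glue:startJobRun.sync",
            "Parameters": {
                "JobName": job,
                "Arguments": {"--bucket.$": "$.bucket", "--key.$": "$.key"},
            },
            # Discard the job result so the original input reaches the next state.
            "ResultPath": None,
            "Catch": [
                {"ErrorEquals": ["States.ALL"], "ResultPath": "$.error", "Next": "NotifyFailure"}
            ],
            "Next": "NotifySuccess" if is_last else job_names[index + 1],
        }
    for name, text in (("NotifySuccess", "Processed {}"), ("NotifyFailure", "Failed to process {}")):
        states[name] = {
            "Type": "Task",
            "Resource": "arn:aws:states:::sns:publish",
            "Parameters": {
                "TopicArn": topic_arn,
                "Message.$": "States.Format('" + text + "', $.key)",
            },
            "End": True,
        }
    return {"StartAt": job_names[0], "States": states}
```

Serializing the result with `json.dumps(build_definition([...], topic_arn))` gives the JSON text that a state machine definition expects.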

 

ACHIEVING A SCALABLE AND MODULAR SOLUTION WHILE REDUCING HUMAN MAINTENANCE

Amazon Athena views reflect the important business analyses, and the data lake is partitioned so that each newly uploaded file requires the least possible transformation effort, keeping the views as up to date as possible.

Power BI now connects to the data lake through Amazon Athena, with access to the raw data, the standardized data and the business views.

This solution provided Mota-Engil with a completely serverless, event-driven and highly scalable process that requires no maintenance, a pay-per-use model with no fixed monthly costs and, most importantly, thanks to its modular approach, the possibility of later additions to this ecosystem on their journey of modernization and innovation.
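One common way to get this behaviour is Hive-style partitioning of the S3 keys, which Athena understands. The sketch below shows how a partition prefix could be derived for an incoming file. The layer name and the choice of `market`, `year` and `month` as partition columns are assumptions for illustration, not the actual layout of this data lake.

```python
from datetime import date


def partition_prefix(layer, market, day):
    """Build a Hive-style S3 prefix so a new file only touches its own partition."""
    # Hypothetical partition columns: market, year, month.
    return f"{layer}/market={market}/year={day.year:04d}/month={day.month:02d}/"
```

With a layout like this, a new upload from one market adds or replaces a single partition, and queries that filter on the partition columns read only the matching prefixes.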

       Hugo Lopes
Specialist Consultant