3 July 2019

Power BI Dataflows

What is it?

As data volumes grow, so does the challenge of keeping data well-formed throughout the ETL process. The whole process is critical, since the data must be in the best possible condition to be queried, analysed and reported on. Many things can go wrong along the way: data compliance can be lost, and costs can rise as new data sources are added or existing data connections are amended.

In response to these problems, Microsoft Power BI includes a data preparation tool, Dataflows, which mitigates many of the problems that come with diverse data sources, as the next section shows.

 

What is it for?

Dataflows let organizations ingest data from different sources and make the whole modelling process easier through automated orchestration of transformations. Beyond ETL, the tool is also used for self-service data warehousing, for scheduled automatic refreshes, and for ingesting data from cloud-based sources such as Dynamics 365, Salesforce, Azure SQL Database, Excel and SharePoint, as well as from on-premises sources, using gateways to import data and stay compatible with older technologies such as SSAS cubes.

All modelling is done through the Power Query UI; underneath, M is the programming language in which every entity is defined. In practice, an entity is a table with an associated formula, and the dataflow is what takes that table from the data source to Azure Data Lake Storage Gen2 (ADLS Gen2) after a series of orchestration steps. With a Premium subscription, entities can also reference one another, within the same workspace or across workspaces.
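The idea of an entity as "a table with an associated formula" can be sketched in a few lines. This is only a conceptual analogy in Python, not M and not a Power BI API: the names `Entity`, `run_dataflow`, `clean_customers` and the sample rows are all hypothetical, and a plain dictionary stands in for ADLS Gen2.

```python
from dataclasses import dataclass
from typing import Callable, Dict, List

Row = Dict[str, object]
Table = List[Row]


@dataclass
class Entity:
    """A named table plus the formula (query) that produces it."""
    name: str
    formula: Callable[[Table], Table]


def run_dataflow(source: Table, entities: List[Entity]) -> Dict[str, Table]:
    """Apply each entity's formula to the source rows and collect the results.

    The returned dict is a stand-in for the storage layer (ADLS Gen2 in the article).
    """
    storage: Dict[str, Table] = {}
    for entity in entities:
        storage[entity.name] = entity.formula(source)
    return storage


# Hypothetical source rows, as they might arrive from a CRM export.
raw_customers: Table = [
    {"id": 1, "name": " Ana ", "country": "PT"},
    {"id": 2, "name": "Rui", "country": None},
]


def clean_customers(rows: Table) -> Table:
    """Trim whitespace from names and drop rows that have no country."""
    return [
        {**row, "name": str(row["name"]).strip()}
        for row in rows
        if row["country"] is not None
    ]


stored = run_dataflow(raw_customers, [Entity("Customers", clean_customers)])
print(stored["Customers"])
```

In Power BI itself the formula would be the M query generated by the Power Query UI, and the orchestration and storage are handled by the service.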

The big advantage of Dataflows is having a single organizational data source where data is prepared once and then reused in the organization's different analytical apps. When entities are linked between dataflows, entities that have already been ingested, cleansed and transformed by other dataflows can be reused without storing the data again, which makes resource management more efficient and avoids data duplication.

 

Power BI Pro vs. Premium?

In addition to everything available in Power BI Pro, Power BI Premium offers incremental updates and computed entities. Incremental updates make the ETL process much more efficient, especially for large data volumes, because only the differences are brought in rather than all the data again. Computed entities reduce the load of orchestrating multiple data preparation processes: such an entity refers to another entity, possibly inside another dataflow, which makes it possible to establish relationships between dataflows. The table below compares the Power BI Pro and Premium subscriptions.
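To make "just the differences" concrete, here is a minimal sketch of an incremental update based on a high-water mark. It is one common way to implement the idea and not necessarily how Power BI does it internally; the function name `incremental_refresh`, the column names `id` and `modified_at`, and the sample rows are all assumptions made for illustration.

```python
from datetime import datetime
from typing import Dict, List

Row = Dict[str, object]


def incremental_refresh(stored: List[Row], source: List[Row],
                        key: str = "id",
                        modified: str = "modified_at") -> List[Row]:
    """Merge only the source rows changed after the newest row already stored."""
    # High-water mark: latest modification time seen in the previous refresh.
    watermark = max((row[modified] for row in stored), default=datetime.min)
    changed = [row for row in source if row[modified] > watermark]

    # Upsert: changed rows replace stored rows that share the same key.
    merged = {row[key]: row for row in stored}
    merged.update({row[key]: row for row in changed})
    return list(merged.values())


# Hypothetical data: row 2 was updated and row 3 is new since the last refresh.
previous = [
    {"id": 1, "amount": 10, "modified_at": datetime(2019, 7, 1)},
    {"id": 2, "amount": 20, "modified_at": datetime(2019, 7, 2)},
]
source_rows = [
    {"id": 1, "amount": 10, "modified_at": datetime(2019, 7, 1)},
    {"id": 2, "amount": 25, "modified_at": datetime(2019, 7, 3)},
    {"id": 3, "amount": 30, "modified_at": datetime(2019, 7, 3)},
]

refreshed = incremental_refresh(previous, source_rows)
print([(row["id"], row["amount"]) for row in refreshed])
```

Only two of the three source rows are processed here; with millions of rows and a small daily change set, that difference is where the efficiency gain comes from.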

 

Dataflow Capability | Pro | Premium
Connectivity | All connectors to all sources | All connectors to all sources
Storage | 10 GB per user | 100 TB for P1 or greater nodes
Data ingestion | Serial ingestion of entities, making data refresh longer | Parallel ingestion of entities
Incremental updates | Not available | Available
References to entities in the same workspace | Not available | Available, allowing the creation of complex data prep processes using multiple dataflows
References to entities across workspaces | Not available | Available, allowing full data consistency across the whole data estate
Calculation engine | Not available, since entities cannot refer to other entities, computed entities cannot be created | Available, allowing computed entities for complex data prep projects with multiple cleansing and enrichment steps
Refresh rates | Up to 8 times a day | Up to 48 times a day
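Two rows of the comparison, data ingestion and refresh rates, lend themselves to a quick back-of-the-envelope calculation. The sketch below is illustrative only: the per-entity ingestion times are invented, the parallel case assumes enough capacity to ingest every entity at once, and the refresh interval assumes refreshes are spread evenly over the day.

```python
from typing import Dict

# Hypothetical per-entity ingestion times, in minutes.
ingestion_minutes: Dict[str, float] = {
    "Customers": 4.0,
    "Orders": 12.0,
    "Products": 2.0,
}


def serial_refresh_time(durations: Dict[str, float]) -> float:
    """Serial ingestion: entities load one after another, so times add up."""
    return sum(durations.values())


def parallel_refresh_time(durations: Dict[str, float]) -> float:
    """Parallel ingestion (assuming no capacity limit): the slowest entity dominates."""
    return max(durations.values(), default=0.0)


def refresh_interval_hours(refreshes_per_day: int) -> float:
    """Shortest interval between refreshes if they are spread evenly over 24 hours."""
    return 24 / refreshes_per_day


print("Serial (minutes):", serial_refresh_time(ingestion_minutes))
print("Parallel (minutes):", parallel_refresh_time(ingestion_minutes))
print("Interval at 8 refreshes a day (hours):", refresh_interval_hours(8))
print("Interval at 48 refreshes a day (hours):", refresh_interval_hours(48))
```

With these made-up numbers a serial refresh takes the sum of all ingestion times while a parallel one takes only as long as the slowest entity, and 8 versus 48 refreshes a day corresponds to data that is at most 3 hours versus 30 minutes old.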

 

Thanks to parallel data ingestion, greater storage capacity in ADLS (Gen2), and linked and computed entities, Power BI Premium is a much better option for corporate use. When the data volume does not justify a Premium subscription, Power BI Pro is the more suitable choice for the whole ETL process: storage is smaller and allocated per user, but all data source connectors remain available and the refresh rate is sufficient for smaller data volumes.

 

Requirements
  • Power BI Pro or Power BI Premium subscription

 

    Paulo Alpoim BI4ALL
        João Feneja         
Associate Consultant