Introduction to Azure Data Factory V2

May 17, 2018

It was only a matter of time until the complete Microsoft on-premises Business Intelligence stack moved to Azure as a PaaS offering. With the introduction of Azure Data Factory (ADF) back in 2015 and the recent launch of ADF V2, which is currently in public preview, it is now possible to orchestrate complex hybrid ETL/ELT operations in the cloud.

This also makes it possible to run a complete Business Intelligence platform as a service in Azure without having to provision virtual machines. In a nutshell, ADF plays the role in the cloud that SQL Server Integration Services (SSIS) plays on-premises. ADF V2 is a new and improved version of V1 that supports more data sources and offers greater flexibility. This makes it a great tool for hybrid data integration solutions, an area Microsoft prides itself on given that it offers both on-premises and cloud computing platforms.

Azure Data Factory V2 has even made it easy to lift and shift your existing SSIS packages to the cloud by providing an integration runtime that can be created and linked to an Azure SQL Database hosting the SSISDB. This is the target we are using to deploy our on-premises packages.
There is a great demo video here which explains this in more detail. The immediate benefits of moving existing SSIS packages to ADF are to:

Reduce operational costs
Increase availability
Increase scalability

Another great feature of ADF V2 is the ability to create data pipelines directly from the Azure portal using the included visual development tool. Once you create your ADF V2 instance, you can use the Author & Monitor button to launch the development environment.
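Behind the visual tool, a pipeline is stored as a JSON document. To give a feel for what the designer produces, here is a minimal sketch in Python that builds a pipeline definition with a single Copy activity. The pipeline, activity and dataset names are hypothetical placeholders, and the source and sink types (BlobSource, SqlSink) are just one plausible pairing, not something prescribed by this post.

```python
import json


def build_copy_pipeline(name, source_dataset, sink_dataset):
    """Return a dict shaped like an ADF V2 pipeline with one Copy activity.

    All names passed in are illustrative; the datasets they refer to
    would have to exist in the data factory already.
    """
    return {
        "name": name,
        "properties": {
            "activities": [
                {
                    "name": "CopyBlobToSql",  # hypothetical activity name
                    "type": "Copy",
                    "inputs": [
                        {"referenceName": source_dataset, "type": "DatasetReference"}
                    ],
                    "outputs": [
                        {"referenceName": sink_dataset, "type": "DatasetReference"}
                    ],
                    "typeProperties": {
                        # Assumed source/sink pairing for illustration only
                        "source": {"type": "BlobSource"},
                        "sink": {"type": "SqlSink"},
                    },
                }
            ]
        },
    }


if __name__ == "__main__":
    pipeline = build_copy_pipeline(
        "DemoPipeline", "InputBlobDataset", "OutputSqlDataset"
    )
    # Print the JSON so it can be compared with the code view in the designer
    print(json.dumps(pipeline, indent=2))
```

Comparing this output with the code view of a pipeline built in the designer is a quick way to see how the drag-and-drop surface maps onto the underlying definition.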

In addition, in ADF V2, all your data factory pipelines, datasets, connections and triggers can be source controlled with VSTS Git integration directly from the web interface. They can also be exported as ARM templates for easy replication across multiple environments.

This enables the DevOps processes of Continuous Integration (CI) and Continuous Delivery (CD), which have historically been hard to achieve in data integration projects. This article here explains this.
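As a rough illustration of replicating a factory across environments, the sketch below generates one ARM parameter file per environment in Python. It assumes the exported template exposes a factoryName parameter; the base name contoso-adf and the environment names are hypothetical, and a real export will usually carry more parameters (connection strings, for example) than shown here.

```python
import json

# Schema URL used by ARM deployment parameter files
PARAMETERS_SCHEMA = (
    "https://schema.management.azure.com/schemas/2015-01-01/"
    "deploymentParameters.json#"
)


def build_parameters(environment, base_name="contoso-adf"):
    """Return an ARM parameter document for one environment.

    Assumes the template has a 'factoryName' parameter; base_name is
    a made-up placeholder.
    """
    return {
        "$schema": PARAMETERS_SCHEMA,
        "contentVersion": "1.0.0.0",
        "parameters": {
            "factoryName": {"value": f"{base_name}-{environment}"},
        },
    }


if __name__ == "__main__":
    # One parameter document per target environment (names are illustrative)
    for env in ("dev", "test", "prod"):
        print(json.dumps(build_parameters(env), indent=2))
```

A release pipeline could then deploy the same template with a different parameter file at each stage, so the definitions stay identical and only the environment-specific values change.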

Another key feature of ADF V2 is its integration with Azure Databricks, the advanced analytics platform developed for Azure and based on Apache Spark. This makes running large data operations very fast, thanks to the in-memory compute power and scale of Spark.

Look forward to more posts in this area with demos on some of these features.

Author:
Stefan Outschoorn
Consultant-Project Delivery
