Why I started with Codeless-ETL

I had been procrastinating since the start of the year about presenting at a meetup group. After I started organising and hosting the Brisbane AI user group, my intention to present grew, and one day, after talking to my fellow organisers, I decided to book a date for my presentation. This post details my presentation and what I would like to do in the future.

Why a topic on Codeless-ETL

The topic I chose for my presentation was Codeless ETL using Mapping Data Flows. I wanted to revisit a subject I had discussed with colleagues when working with SQL Server Integration Services (SSIS). Within Azure Data Factory, the choice for transformations comes down to Mapping Data Flows or Azure Databricks.

My recommendation for a greenfields data warehouse project is to use Mapping Data Flows.

Introduction to Azure Data Factory 

Azure Data Factory (ADF) is a fully managed, serverless data integration service that allows a business to integrate all of its data. It is a cloud-based ETL (Extract, Transform and Load) tool provided by Microsoft, and it fills a role similar to SSIS in Microsoft SQL Server.

To test out the functionality provided by Azure Data Factory, you can create these resources yourself or create a free Microsoft Azure account today.
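If you want to create a Data Factory instance yourself before running the labs, a minimal sketch with the Azure CLI looks like the following. The resource group, factory name and region are placeholders, not the names used in the demo; the sketch assumes you have the Azure CLI installed and have already run `az login`.

```shell
# Placeholders -- substitute your own names and preferred region
LOCATION="australiaeast"
RG="rg-codeless-etl-demo"

# Create a resource group to hold the demo resources
az group create --name "$RG" --location "$LOCATION"

# The az datafactory commands ship as an Azure CLI extension
az extension add --name datafactory

# Create an empty Data Factory instance
az datafactory create \
  --resource-group "$RG" \
  --factory-name "adf-codeless-etl-demo" \
  --location "$LOCATION"
```

Deleting the resource group afterwards (`az group delete --name "$RG"`) removes everything created above in one step.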

Introduction to Mapping Data Flows in Azure Data Factory


The purpose of Mapping Data Flows is to provide data transformations in Azure Data Factory. They offer a codeless alternative for data engineers to design and develop data transformations. Mapping Data Flows execute on scaled-out Apache Spark clusters and are operationalised using Azure Data Factory's existing scheduling and monitoring capabilities. Azure Data Factory handles all the code translation, path optimisation and execution of the data flow jobs.

How to run the Codeless-ETL demo in your subscription

The source code for the demo is available in the Codeless-ETL repository on GitHub. This repository contains a few elements to be aware of:

  • Azure ARM templates to build your environment
  • Lab 1 – Builds a simple dataflow to copy data between data source and destination
  • Lab 2 – A more complex scenario that builds a solution to process slowly changing records and write them to a destination

For this demo, my data source is the Adventure Works sample database, hosted on Azure SQL Database.

I have also provisioned a couple of tables with the same structure as Adventure Works in an Azure SQL Database as my destination.

 

Azure services provisioned for the demos

The following resources will be deployed in your subscription:

Name             | Type                         | Pricing Tier | Pricing Info
cdlessadf-suffix | Azure Data Factory           |              | https://azure.microsoft.com/en-us/pricing/details/data-factory/
cdlesssuffix     | Azure Data Lake Storage Gen2 |              | https://azure.microsoft.com/en-us/pricing/details/storage/data-lake/
sql-suffix       | SQL Database                 | Standard S0  | https://azure.microsoft.com/en-au/pricing/details/sql-database/single/
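Assuming the repository's ARM templates follow the usual deployment pattern, provisioning the resources above into your own resource group might look like the sketch below. The template and parameter names are hypothetical placeholders; check the repository for the actual file names.

```shell
# Placeholders -- substitute your own resource group name and suffix
RG="rg-codeless-etl-demo"
SUFFIX="demo01"   # hypothetical suffix appended to the resource names

# Create the resource group that will hold the lab resources
az group create --name "$RG" --location "australiaeast"

# Deploy the lab's ARM template into the resource group
az deployment group create \
  --resource-group "$RG" \
  --template-file azuredeploy.json \
  --parameters suffix="$SUFFIX"
```

Deploying at resource-group scope keeps cleanup simple: deleting the group removes every resource the template created.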


IMPORTANT: When you deploy the lab resources in your own subscription, you are responsible for the charges related to the use of the services provisioned. If you don't want any extra charges associated with the lab resources, you should delete the lab resource group and all resources in it.

Conclusion

This presentation was a success: I presented a topic to the broader community. I also learned several lessons that I need to apply in the future, such as introducing my demos better and finishing my presentation more strongly.

I am looking forward to presenting more in the future and to growing my speaking career towards becoming an international presenter.

Next Steps

The next step for me is to submit my presentation to other user groups and SQLPass.

I will expand the demo and add additional scenarios that deal with the following components:

  • Building Power BI reports and dashboards
  • Creating a pipeline using Azure Databricks
  • Implementing Cognitive Services

I also want to create additional presentations on different topics, from Artificial Intelligence to DevOps in Microsoft Azure, as well as on AWS and the Google Cloud Platform (GCP).

Further reading

You can read some of my other articles on my Home Page.
