Since the start of the year I had been putting off presenting at a meetup group. After I started organising and hosting the Brisbane AI user group, my intention to present grew. One fine day, after talking to my fellow organisers, I decided to book in a date for my presentation. This post details my presentation and what I would like to do in the future.
The topic I chose for my presentation was Codeless ETL using Mapping Data Flows. I wanted to revisit a subject that I had discussed with colleagues when working with SQL Server Integration Services (SSIS). Within Azure Data Factory, the choice is a competition between Mapping Data Flows and Databricks.
My recommendation for a greenfield data warehouse project is to use Mapping Data Flows.
Azure Data Factory (ADF) is a fully managed, serverless integration service that allows a business to integrate all their data. It is a cloud-based ETL (Extract, Transform and Load) tool provided by Microsoft, and it plays a similar role to SSIS, which ships with Microsoft SQL Server.
To test out the functionality provided by Azure Data Factory, you can create these resources yourself in an existing subscription or sign up for a Microsoft Azure free account.
The purpose of Mapping Data Flows is to provide data transformations in Azure Data Factory. They offer a codeless alternative for data engineers to design and develop data transformations. Mapping Data Flows execute on scaled-out Apache Spark clusters. Operationalising these data flows uses the existing scheduling and monitoring capabilities of Azure Data Factory. Azure Data Factory handles all the code translation, path optimisation and execution of the data flow jobs.
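To make the idea of a data flow more concrete, here is a minimal sketch in plain Python of the kind of source, transform and sink chain that you would otherwise draw on the canvas. This is not the Azure Data Factory API and not the Spark code that ADF generates; the function names, the sample rows and the `price_band` column are all hypothetical and exist only for illustration.

```python
from typing import Any, Dict, List

Row = Dict[str, Any]


def read_source() -> List[Row]:
    # Hypothetical sample rows standing in for a source table.
    return [
        {"id": 1, "name": "road bike", "price": 1200.0},
        {"id": 2, "name": "water bottle", "price": 5.0},
        {"id": 3, "name": "helmet", "price": 35.0},
    ]


def derived_column(rows: List[Row]) -> List[Row]:
    # Add a new column computed from an existing one.
    return [
        {**row, "price_band": "high" if row["price"] >= 100 else "low"}
        for row in rows
    ]


def filter_rows(rows: List[Row]) -> List[Row]:
    # Keep only the rows that match a condition.
    return [row for row in rows if row["price_band"] == "high"]


def write_sink(rows: List[Row], destination: List[Row]) -> int:
    # Append the rows to an in-memory destination and report the row count.
    destination.extend(rows)
    return len(rows)


destination_table: List[Row] = []
written = write_sink(
    filter_rows(derived_column(read_source())), destination_table
)
print(f"rows written: {written}")
```

In a Mapping Data Flow each of these steps is a box on the design surface, and Azure Data Factory takes care of translating the design and running it on Spark.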
The source code for the demo is available in the Codeless-ETL repository on GitHub. This repository contains a few elements to be aware of:
For this demo, my data source is Adventure Works, and this database is on Azure SQLDB.
I have also provisioned a couple of tables with the same table structure as Adventure Works on Azure SQLDB as my destination.
The following resources will be deployed in your subscription:
Name | Type | Pricing Tier | Pricing Info |
---|---|---|---|
cdlessadf-suffix | Azure Data Factory | | https://azure.microsoft.com/en-us/pricing/details/data-factory/ |
cdlesssuffix | Azure Data Lake Storage Gen2 | | https://azure.microsoft.com/en-us/pricing/details/storage/data-lake/ |
sql-suffix | SQL Database | Standard S0 | https://azure.microsoft.com/en-au/pricing/details/sql-database/single/ |
IMPORTANT: When you deploy the lab resources in your own subscription, you are responsible for the charges related to the use of the services provisioned. If you don’t want any extra charges associated with the lab resources, delete the lab resource group and all resources in it once you are finished.
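As a rough sketch of that clean-up step, the snippet below builds the Azure CLI command for deleting a resource group. The resource group name `codeless-etl-lab-rg` is a placeholder I made up, so substitute the name you actually deployed to. The snippet only prints the command so that nothing is deleted by accident.

```python
import shlex
from typing import List


def build_delete_command(resource_group: str) -> List[str]:
    # Build the Azure CLI call that removes a resource group and everything in it.
    if not resource_group.strip():
        raise ValueError("resource_group must not be empty")
    return [
        "az", "group", "delete",
        "--name", resource_group,
        "--yes",      # skip the interactive confirmation prompt
        "--no-wait",  # return immediately instead of waiting for completion
    ]


# Hypothetical resource group name; replace it with your own.
command = build_delete_command("codeless-etl-lab-rg")
print(shlex.join(command))
```

You can paste the printed command into a terminal where you are logged in with `az login`, or pass the list to `subprocess.run(command, check=True)` if you prefer to run it from Python.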
This presentation was a success, and it gave me the chance to present a topic to the broader community. I learned several lessons during my presentation that I need to act on in the future. Some of the improvements I need to make are around introducing my demos better and having a more robust finish to my presentation.
I am looking forward to giving more presentations in the future and growing my speaking career to become an international presenter.
The next step for me is to submit my presentation to other user groups and SQLPass.
I will expand the demo and add additional scenarios that deal with the following components:
I also want to create additional presentations on different topics, from Artificial Intelligence to DevOps in Microsoft Azure, as well as on AWS and the Google Cloud Platform (GCP).
You can read some of my other articles on my Home Page