Synapse Analytics (Part 1) – An Introduction to a Powerful Analytics Service

The explosion in the volume and variety of data available to modern businesses presents a new challenge: how to put it to use. Business users expect to track trends and generate reports as quickly and efficiently as possible. There are several solutions on the market, and choosing the right one for your business is always tricky. Azure Synapse Analytics is one solution to this business analytics problem. This article provides an overview of the service and is the first in a series on the topic.

What is Azure Synapse Analytics?

Azure Synapse is a limitless analytics service that combines enterprise data warehousing and big data analytics. It brings together the capabilities a data solution needs, such as data engineering, machine learning and BI, in one service, without creating silos within the business.

Figure 1 – Azure Synapse Analytics Architecture by Microsoft

Components of Azure Synapse Analytics

Azure Synapse Analytics combines multiple underlying services, as shown in Figure 1. These are covered in the following sections.

Synapse SQL

Synapse SQL is a massively parallel processing (MPP) system. It uses a distributed query engine with Transact-SQL (T-SQL), enabling data warehousing and data virtualisation over large datasets. It also supports streaming and machine-learning scenarios. The benefits of using Synapse SQL are:

  • Allows a user to create dedicated SQL pools that reserve processing power for data stored in SQL tables.
  • Provides built-in streaming capabilities for ingesting data from several cloud data sources.
  • Allows a user to run Machine Learning and AI models using SQL statements without the need to create another service. The models can be pre-existing models or built using AutoML.

Apache Spark

Apache Spark is one of the most popular open-source big data platforms, used worldwide for data preparation, ingestion, engineering and machine learning. Azure Synapse integrates deeply with Spark. This offering has the following benefits:

  • A simplified resource model removes the need to manage Spark clusters, saving time for the developers and the administration team.
  • Fast Spark cluster start-up and autoscaling of Spark resources under load reduce the need to keep resources on standby for busy periods.
  • Developers can apply their C# (.NET) expertise when building Spark applications. As a result, they can use their existing skills, reducing the need for training in new technology.

Data Lake

Azure Synapse removes the traditional barrier between SQL and Spark by allowing engineers with SQL and Big Data expertise to work together and create rich ETL pipelines without leaving Azure Synapse Analytics. This removal of the barrier allows for the following:

  • The SQL engine and Spark clusters can explore and analyse files stored on the data lake directly, without the need to load them into a relational store.
  • Fast and scalable data processing between the SQL engine and Spark clusters.

Data Integration

Azure Synapse has built-in data integration with a similar look and feel to Azure Data Factory, allowing developers to create rich ETL pipelines without leaving Azure Synapse Analytics.

Unified Development Experience

Azure Synapse Analytics provides a single integrated development environment (IDE) for enterprises to build data solutions. Stay tuned for an upcoming post regarding the Azure Synapse Analytics workspace.

Azure Synapse Analytics Costs

Before you deploy Azure Synapse Analytics, you need to understand the costs involved to avoid bill shock.

Azure Synapse runs on Azure infrastructure and starts accruing costs when you deploy a new service. The following items affect the price of this service:

  • Data Exploration & Data Warehousing
    • Dedicated SQL Pool – You are charged based on the Data Warehouse Units (DWU) and the hours the service is running
    • Storage – The charge is per TB stored
    • Serverless SQL Pool – The charge is per TB processed
  • Apache Spark Pool – You are charged based on the number of running instances and the hours they are running.
  • Data Integration
    • Orchestration Activity Runs – You are charged based on the number of activity runs
    • Data Movement – For copy activities run on the Azure Integration Runtime, you are charged based on the number of Data Integration Units (DIU) used and the execution duration
    • Data Flows vCore Hours – For data flow execution and debugging, you are charged based on the compute type, the number of vCores and the execution duration.

These charges are added together at the end of each billing cycle, with the invoice showing costs per item.
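
To make the way these items add up more concrete, here is a minimal Python sketch of a cost estimate. It is an illustration only: every rate in it is a made-up placeholder rather than an Azure price, the class and function names (Rates, Usage, estimate_costs, total_cost) are hypothetical, and the billing units (per 100 DWU per hour, per 1,000 activity runs) are assumptions chosen to keep the arithmetic simple. Data flows are reduced to a single vCore rate, standing in for one compute type.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Rates:
    """Hypothetical unit prices. Replace them with figures from the Azure pricing page."""
    dedicated_per_100_dwu_hour: float = 1.5
    storage_per_tb: float = 20.0
    serverless_per_tb_processed: float = 5.0
    spark_per_instance_hour: float = 0.5
    orchestration_per_1000_runs: float = 1.0
    data_movement_per_diu_hour: float = 0.25
    data_flow_per_vcore_hour: float = 0.25  # stands in for a single compute type


@dataclass(frozen=True)
class Usage:
    """Usage for one billing cycle; anything left at zero costs nothing."""
    dwu: int = 0
    dedicated_hours: float = 0.0
    storage_tb: float = 0.0
    serverless_tb_processed: float = 0.0
    spark_instances: int = 0
    spark_hours: float = 0.0
    activity_runs: int = 0
    diu: int = 0
    copy_hours: float = 0.0
    data_flow_vcores: int = 0
    data_flow_hours: float = 0.0


def estimate_costs(usage: Usage, rates: Rates = Rates()) -> dict[str, float]:
    """Return one line item per charge described in the list above."""
    return {
        "Dedicated SQL pool": usage.dwu / 100 * usage.dedicated_hours
        * rates.dedicated_per_100_dwu_hour,
        "Storage": usage.storage_tb * rates.storage_per_tb,
        "Serverless SQL pool": usage.serverless_tb_processed
        * rates.serverless_per_tb_processed,
        "Apache Spark pool": usage.spark_instances * usage.spark_hours
        * rates.spark_per_instance_hour,
        "Orchestration activity runs": usage.activity_runs / 1000
        * rates.orchestration_per_1000_runs,
        "Data movement": usage.diu * usage.copy_hours
        * rates.data_movement_per_diu_hour,
        "Data flows": usage.data_flow_vcores * usage.data_flow_hours
        * rates.data_flow_per_vcore_hour,
    }


def total_cost(items: dict[str, float]) -> float:
    """Add the line items together, as the invoice does at the end of the cycle."""
    return sum(items.values())


if __name__ == "__main__":
    # Example usage with invented numbers.
    usage = Usage(dwu=200, dedicated_hours=10, storage_tb=2,
                  serverless_tb_processed=3, spark_instances=3, spark_hours=4)
    items = estimate_costs(usage)
    for name, cost in items.items():
        print(f"{name}: {cost:.2f}")
    print(f"Total: {total_cost(items):.2f}")
```

Keeping the result as one entry per item mirrors the per-item invoice, which makes it easier to see which component dominates the bill before you deploy anything.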

The costs described here do not include additional services such as Data Lake Storage Gen2.

Also, note that you may still incur costs for services such as Data Lake Storage Gen2 even after you delete Azure Synapse Analytics.

Conclusion

In this post, I introduced the Azure Synapse Analytics service in Microsoft Azure.

This service is a potent tool for organisations looking to build data warehouses from traditional relational systems like CRM, ERP and other Online Transaction Processing (OLTP) systems, and for those looking to enrich that data using big data and live-streaming systems.

I also briefly covered how the different features available in Azure Synapse Analytics are costed.

Next Steps

In an upcoming article, I will create an Azure Synapse Analytics service to get you started on the analytics journey. More information is available on Microsoft Learn.

If you have any feedback, please drop me a message, and I will respond as soon as possible. Also, please look through my other articles here.
