The amount of data that businesses store and have access to has grown steadily since the year 2000. This growth has opened a gap: companies struggle to keep track of their various data sets and to create effective governance strategies. These gaps in knowledge have led to problems for the various groups looking to provide value to the business.
There are several software applications and vendors that allow a business to catalogue its data assets and mitigate this knowledge gap. This catalogue software is typically costly and requires companies to buy licences for their employees. This cost acts as a barrier that keeps SMEs from investing in the technology.
Microsoft has a new service in preview, Azure Purview, that provides unified data governance to help organisations manage their data assets. Purview helps an organisation create an up-to-date map of the business’s data using automated discovery.
To test out the functionality provided by Azure Purview, you can create these resources yourself in your own tenancy, or sign up for a free Microsoft Azure account.
To create an instance of the Azure Purview service, carry out the following steps:
Create a new Azure resource and search for the Azure Purview service as shown.
Selecting the Azure Purview service will bring up the following blade, which allows you to create the service and read the documentation.
Selecting Create will bring up the following blade, where you need to provide the information required to create the service. In this first window you need to provide the resource group that was created earlier and the location for the service. For this demo I am using East US, as this was one of the locations available at the time of writing.
In the Configuration window, select the platform size and the catalog options.
The next option that you need to define is the tags for the service. This information is not mandatory to create the service but might be useful to track costs against a specific use.
Once the tag information has been entered, select Review and create, check the information you provided for the service, and select Create.
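The inputs collected across these blades can be summarised in a small, purely illustrative Python sketch. It does not call Azure. The class `PurviewAccountRequest` and its field names are hypothetical, and the account name is an assumed input that the steps above do not mention explicitly. The sketch only captures which values are mandatory (resource group, location) and which are optional (tags).

```python
from dataclasses import dataclass, field


@dataclass
class PurviewAccountRequest:
    """Hypothetical container for the values entered in the portal wizard."""

    account_name: str    # assumption: the wizard also asks for a name
    resource_group: str  # mandatory, created beforehand
    location: str        # mandatory, e.g. "East US" as used in this demo
    tags: dict = field(default_factory=dict)  # optional, handy for cost tracking

    def missing_fields(self) -> list:
        """Return the names of mandatory fields that are still empty."""
        required = {
            "account_name": self.account_name,
            "resource_group": self.resource_group,
            "location": self.location,
        }
        return [name for name, value in required.items() if not value.strip()]


# All values below are placeholders chosen for illustration.
request = PurviewAccountRequest(
    account_name="purview-demo",
    resource_group="rg-purview-demo",
    location="East US",
    tags={"cost-centre": "data-governance"},
)
print(request.missing_fields())
```

An empty list means every mandatory value has been supplied, which mirrors the check you do by eye on the review page before selecting Create.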
Once the create option has been clicked, the following page (blade) is shown with the status of creating the service.
Once the service has been created, click on Azure Purview Studio.
Once you launch Purview Studio, you will see the following IDE (Integrated Development Environment).
Let’s start by creating a collection. To do this, select the Sources option on the left-hand side of the studio. On the next screen, select New Collection. A collection can be used to group data by business function or environment.
Provide a meaningful collection name and select Create.
After creating a collection, the next step is to create some data sources and ingest their metadata. Microsoft currently provides several metadata connectors, and the list will grow over time.
Before we proceed further, the Azure Purview service requires an Azure credential, and this credential needs access to the relevant data sources. Azure Purview can be configured to use an existing key vault by going to the Management Console in your Purview instance.
Once a credential is created, a data source can be created in the studio. Select the data source as shown.
Provide the connection properties for your data source. In this scenario I am connecting to an Azure SQL DB that is already provisioned and contains the Adventure Works database.
Registering a data source in this fashion will add it to the collection.
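As a mental model of how collections and registered sources relate, here is a minimal Python sketch. It is a conceptual illustration only and not the Purview API: the `Collection` and `DataSource` classes, the `register` method, and the example names are all hypothetical, and the rule that source names must be unique within a collection is an assumption made for the sketch.

```python
from dataclasses import dataclass, field


@dataclass(frozen=True)
class DataSource:
    """Hypothetical stand-in for a registered source."""

    name: str
    kind: str  # e.g. "Azure SQL Database"


@dataclass
class Collection:
    """Hypothetical grouping of sources by business function or environment."""

    name: str
    sources: list = field(default_factory=list)

    def register(self, source: DataSource) -> None:
        # Assumption for this sketch: names are unique within a collection.
        if any(existing.name == source.name for existing in self.sources):
            raise ValueError(f"{source.name!r} is already registered in {self.name!r}")
        self.sources.append(source)


# Placeholder names; the demo registers an Azure SQL DB holding Adventure Works.
production = Collection("Production")
production.register(DataSource("adventureworks-sql", "Azure SQL Database"))
print([source.name for source in production.sources])
```

Grouping sources this way is what lets you later reason about everything that belongs to one environment or business function as a unit.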
Start the scan. Starting the scan will create a job, and once the scan is complete, you can review the metadata at a later date.
I will list the outcome of the scan in a future post.
The Azure Purview service is charged per vCore-hour of scanning your sources, with usage rounded to the closest minute. This price is valid through to the end of February and while the service is in preview. While in preview, scanning of Power BI and SQL Server sources is free. For further information, refer to the Azure Purview Pricing page.
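To make the billing model concrete, here is a rough Python sketch of the arithmetic. It assumes that the scan duration is what gets rounded to the closest minute before being converted to vCore-hours. The function name `scan_cost`, the vCore count, the duration, and the price are all hypothetical values for illustration, not published figures.

```python
def scan_cost(vcores: int, duration_seconds: float, price_per_vcore_hour: float) -> float:
    """Estimate the cost of one scan under the assumptions stated above."""
    if vcores < 0 or duration_seconds < 0 or price_per_vcore_hour < 0:
        raise ValueError("inputs must be non-negative")
    # Round the duration to the closest whole minute (halves round up).
    billed_minutes = int(duration_seconds / 60 + 0.5)
    # Convert minutes to hours, then scale by vCores and the hourly price.
    return vcores * (billed_minutes / 60) * price_per_vcore_hour


# Hypothetical scan: 32 vCores for 754 seconds at 0.60 per vCore-hour.
estimate = scan_cost(vcores=32, duration_seconds=754, price_per_vcore_hour=0.60)
print(f"Estimated scan cost: {estimate:.2f}")
```

Sources that are free to scan during the preview, such as Power BI and SQL Server, would simply be modelled with a price of zero.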
The next step is to explore the service further, build an end-to-end data governance platform, and start implementing it at a customer. I am looking forward to the future growth of the product and to seeing how it will stack up against other products like Informatica and Collibra.