Registering the Database in Azure Synapse Delta Lake

We have seen the overview of the Delta Lake solution architecture, and that operations on the Delta Lake need to be done using Spark. Today we will discuss creating the Spark databases that will serve as the layers in the Delta Lake.

It is always good to segregate the layers to achieve the Delta Lake Medallion architecture. The layers can be built as three separate Spark databases, each representing a different state of the data in the Delta Lake architecture.
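As a rough sketch of this idea, the three layers can be listed once and turned into one database-creation statement each. The names `staging` and `curated` are the ones used in this post; the third name, `presentation`, is only a placeholder assumed for illustration, and so is the helper `layer_statements`.

```python
# "staging" and "curated" appear in this post;
# "presentation" is a hypothetical placeholder for the third layer.
LAYERS = ["staging", "curated", "presentation"]


def layer_statements(layers):
    """Build one idempotent CREATE DATABASE statement per layer."""
    return [f"CREATE DATABASE IF NOT EXISTS {name}" for name in layers]


statements = layer_statements(LAYERS)
for stmt in statements:
    print(stmt)
```

In a Synapse notebook, where the session object `spark` is already available, each statement could then be passed to `spark.sql(stmt)`.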

The Delta Lake in Synapse is created on top of ADLS Gen2.

Creating a Layer/DB for Delta Lake in Spark

There are two ways of creating a database in Spark.

  1. The first option is to write the database creation syntax directly, i.e. "CREATE DATABASE".
  2. You can also create the database by registering it as a schema, i.e. "CREATE SCHEMA".

The key point here is that you cannot create a schema the way you would in a SQL database. If you create a schema, it gets registered as a database in Spark rather than as a schema. The Spark database is what differentiates tables from each other. Let's say you have a table named CUSTOMER in two databases: you tell them apart by qualifying the table name with the database, just as you would with a schema (staging.customer or curated.customer).
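The naming rule above can be sketched in a few lines. The helper `qualified_name` is a made-up name for illustration; it only joins the database and table names the way Spark expects them.

```python
def qualified_name(database, table):
    """Return the table name qualified by its Spark database."""
    return f"{database}.{table}"


staging_customer = qualified_name("staging", "customer")
curated_customer = qualified_name("curated", "customer")

# Same table name, two databases, so two distinct tables.
print(f"SELECT * FROM {staging_customer}")
print(f"SELECT * FROM {curated_customer}")
```

In a notebook, either qualified name could be passed to `spark.table(...)` or used inside a `spark.sql(...)` query, assuming the tables already exist.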

Registering the DB in Spark

  1. Creating the database using a SQL query, here with the schema syntax.
%%pyspark
spark.sql("CREATE SCHEMA IF NOT EXISTS staging")

  2. The schema gets registered as a database in the Delta Lake. The database syntax gives the same result.
%%pyspark
spark.sql("CREATE DATABASE IF NOT EXISTS curated")
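To check that both statements ended up as databases, one option is to list the databases from the Spark catalog afterwards. This is a minimal sketch: `create_and_list` is a hypothetical helper, and it assumes it is handed a Spark session such as the `spark` object a Synapse notebook provides.

```python
def create_and_list(spark, names):
    """Create each database if it is missing, then return all database names."""
    for name in names:
        spark.sql(f"CREATE DATABASE IF NOT EXISTS {name}")
    # listDatabases() returns entries that expose a .name attribute.
    return [db.name for db in spark.catalog.listDatabases()]


# Synapse notebooks predefine `spark`; anywhere else this call is skipped.
if "spark" in globals():
    print(create_and_list(spark, ["staging", "curated"]))
```

The same check can be done in SQL with `spark.sql("SHOW DATABASES").show()`.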

These Spark databases can be seen as Lake Databases (which explains the third point on what a Lake Database is in our blog "What is Lake Database in Synapse Analytics").

The database gets registered under the warehouse folder. The warehouse folder is created when you create a Spark pool in Azure Synapse Analytics. This folder does not get created if no Spark pool is associated with the workspace.

For example, I have a workspace named syn-analytics-delta whose primary storage container is named delta, and I have created a Spark pool, so the folder structure looks like delta>synapse>workspaces>syn-analytics-delta>warehouse. You can now see the databases underneath the warehouse folder.
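The folder layout described above can be written as a small path builder. Both function names are made up for illustration, and the `<database>.db` folder name is an assumption based on Spark's default warehouse naming, so verify it against your own storage account.

```python
def warehouse_path(container, workspace):
    """Folder path of the Spark warehouse inside the primary container."""
    return "/".join([container, "synapse", "workspaces", workspace, "warehouse"])


def database_folder(container, workspace, database):
    """Folder of a single database under the warehouse."""
    # Assumption: Spark's default "<database>.db" folder naming applies here.
    return f"{warehouse_path(container, workspace)}/{database}.db"


print(warehouse_path("delta", "syn-analytics-delta"))
# prints: delta/synapse/workspaces/syn-analytics-delta/warehouse
print(database_folder("delta", "syn-analytics-delta", "staging"))
```

To see the location Spark actually recorded, `spark.sql("DESCRIBE DATABASE EXTENDED staging").show(truncate=False)` can be run in a notebook.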

Here we have seen that a database can be created either by registering it as a schema or by passing the database creation syntax. Sometimes the database does not appear after it has been created until you create an object in it.
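If the database does not show up, creating a first table in it is one way to make it appear. The sketch below builds such a statement for a Delta table; the helper `create_table_statement` and the columns `id` and `name` are assumptions chosen only for illustration.

```python
def create_table_statement(database, table, columns):
    """Build a CREATE TABLE ... USING DELTA statement for a database."""
    column_list = ", ".join(f"{name} {dtype}" for name, dtype in columns)
    return (
        f"CREATE TABLE IF NOT EXISTS {database}.{table} "
        f"({column_list}) USING DELTA"
    )


# Hypothetical columns, not taken from any real CUSTOMER table.
stmt = create_table_statement(
    "staging", "customer", [("id", "INT"), ("name", "STRING")]
)
print(stmt)

# Synapse notebooks predefine `spark`; anywhere else this call is skipped.
if "spark" in globals():
    spark.sql(stmt)
```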

Hope this blog gave you some clarity on Spark database creation for Synapse Delta Lake.

Note: Once the DB is created, you can create a custom schema in it by using Azure Data Studio or SSMS.
