Spark Tables in Delta Lake

We saw how to create a database in the Synapse Spark pool in our previous blog. Today we are going to understand the types of Delta tables in Spark.

There are mainly two types of Spark tables:

  1. Managed (Spark internal table)
  2. Unmanaged (Spark external table)

There is a slight difference in how these tables are created. In a nutshell, a managed table does not need any location to be specified while creating the table, whereas an unmanaged Spark table does need a location specified during table creation.

Let’s check the difference between them.

|  | Spark Managed Table | Spark Unmanaged Table |
| --- | --- | --- |
| Table creation | No need to specify a location | A location must be specified at table creation |
| Management | Spark manages both the table metadata and the data in the file store. | Spark manages only the table metadata; the user manages the data files in the file store. |
| Drop operation | Drops the table metadata as well as the data files of the table. | Drops the table metadata but does not delete the underlying data files. |
| Truncate operation | Supported | Not supported |
| Naming convention | Also known as an internal table | Also known as an external table |
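
To see the drop behavior from the table above in action, here is a purely illustrative sketch. It assumes a managed table landing.customer and a hypothetical unmanaged table landing.customer_ext (created later in this post) already exist, and that you are in a Synapse notebook where the mssparkutils helper is available.

%%pyspark
# Illustration only: assumes a managed table landing.customer and a
# hypothetical unmanaged table landing.customer_ext at ext_path already exist
ext_path = 'abfss://delta@stgdeltademo.dfs.core.windows.net/Delta_tables/'

# Dropping a managed table removes the metadata AND the data files
spark.sql('DROP TABLE landing.customer')

# Dropping an unmanaged table removes only the metadata;
# the files at ext_path are left in place
spark.sql('DROP TABLE landing.customer_ext')

# mssparkutils is the Synapse notebook utility; the data files still list
display(mssparkutils.fs.ls(ext_path))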

URL format to point to an ADLS Gen2 file location: abfss://<container name>@<storage account name>.dfs.core.windows.net/<folder path>/<file name>
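
For example, a small sketch that assembles this URL from its parts (the sample values match the examples used later in this post):

%%pyspark
# Assemble an abfss URL from its parts; the sample values below match
# the examples used later in this post
container = 'delta'
storage_account = 'stgdeltademo'
folder_path = 'TEST/customer.parquet'

adls_url = f'abfss://{container}@{storage_account}.dfs.core.windows.net/{folder_path}'
print(adls_url)
# abfss://delta@stgdeltademo.dfs.core.windows.net/TEST/customer.parquet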

Creating a Spark managed table

I am using the PySpark syntax to create the Delta table. You can use SQL syntax as well to create Delta/Parquet tables.

%%pyspark
# Read the Parquet file in ADLS Gen2 into a DataFrame
df = spark.read.load(
    'abfss://delta@stgdeltademo.dfs.core.windows.net/TEST/customer.parquet',
    format='parquet'
)

# Write the DataFrame as a managed Delta table
df.write.format('delta') \
    .partitionBy('year') \
    .mode('overwrite') \
    .saveAsTable('landing.customer')

If you notice in the above script, we have not specified any path option while writing the table (.saveAsTable('landing.customer')), so Spark manages the data and the table gets registered to the default warehouse location (warehouse/landing.db).
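
Once written, the managed table can be read back by name alone; a quick sketch:

%%pyspark
# Read the managed table back by name; no path is needed
customers = spark.read.table('landing.customer')
customers.show(5)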

You can verify the type of table and other information about it by using the query below.

%%sql
DESCRIBE EXTENDED landing.customer
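
If you prefer to check this from PySpark, here is a small sketch that filters the same metadata down to just the Type and Location rows:

%%pyspark
# DESCRIBE EXTENDED returns rows of (col_name, data_type, comment);
# filter down to just the Type and Location entries
info = spark.sql('DESCRIBE EXTENDED landing.customer')
info.filter(info.col_name.isin('Type', 'Location')).show(truncate=False)
# Type shows MANAGED for this table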

Creating a Spark unmanaged table

Again, I am using the PySpark syntax to create the Delta table. You can use SQL syntax as well to create Delta/Parquet tables by specifying the LOCATION clause.

%%pyspark
# Read the Parquet file in ADLS Gen2 into a DataFrame
df = spark.read.load(
    'abfss://delta@stgdeltademo.dfs.core.windows.net/TEST/customer.parquet',
    format='parquet'
)

# Write the DataFrame as an unmanaged Delta table by specifying
# an explicit path option
df.write.format('delta') \
    .option('path', 'abfss://delta@stgdeltademo.dfs.core.windows.net/Delta_tables/') \
    .partitionBy('year') \
    .mode('overwrite') \
    .saveAsTable('landing.customer')

If you notice in the above script, this time we have specified a path option while writing the table (.saveAsTable('landing.customer')). The table data gets registered to the path you specified while creating/writing the table (delta/Delta_tables/).
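
As mentioned above, the same unmanaged table can also be created with SQL syntax by specifying the LOCATION clause. A minimal sketch via spark.sql, where the table name customer_ext is a hypothetical example and the path is the one used above:

%%pyspark
# Hypothetical table name customer_ext; the LOCATION clause (the path used
# above) is what makes the table unmanaged/external
spark.sql("""
    CREATE TABLE IF NOT EXISTS landing.customer_ext
    USING DELTA
    LOCATION 'abfss://delta@stgdeltademo.dfs.core.windows.net/Delta_tables/'
""")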

You can verify the type of table and other information about it by using the query below.

%%sql
DESCRIBE EXTENDED landing.customer
