In a previous blog we looked at the solution architecture of Delta Lake and how to register a Delta table. In this blog we will walk through how a Delta table actually works, from the moment you register it through the DML operations you perform on it, and look at the internal architecture of a Delta table in Azure Synapse Analytics.
A Delta table lives in the storage account as files: the table data is stored as parquet files in Azure Data Lake Storage Gen2, at a location that depends on whether you registered the table as an Internal (managed) or External table. For the difference between Internal and External tables, please refer to the previous blog: https://techdiw.com/spark-tables-in-delta-lake/.
The moment we register a Delta table (either Internal or External), a folder is created with the table name we provided. Under that folder, Delta creates a folder named _delta_log, which captures every transaction on the table as a combination of JSON and parquet files. (In Synapse, no .crc files are created.) This transaction log is what is referred to as the Delta log.
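As a concrete illustration, here is a minimal sketch of registering an External Delta table from a Synapse Spark notebook. The table name material, the columns, and the ADLS Gen2 path are placeholders chosen to match the example later in this post; point them at your own storage account and container (drop the LOCATION clause to get an Internal/managed table instead).

```python
# Minimal sketch: registering an external Delta table from a Synapse Spark notebook.
# The abfss path below is a placeholder; replace it with your own ADLS Gen2 location.
table_path = "abfss://files@<storage-account>.dfs.core.windows.net/delta/material"

spark.sql(f"""
    CREATE TABLE IF NOT EXISTS material (
        material_id   INT,
        material_name STRING
    )
    USING DELTA
    LOCATION '{table_path}'
""")

# At this point the 'material' folder exists in the storage account and contains
# a _delta_log folder holding the very first commit, 00000000000000000000.json.
```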
Let's understand each of the files that get created the moment we create the Delta table. The number of data files will grow depending on the transactions we perform.
Insert:
An insert adds new parquet data file(s) to the table folder, and each transaction writes one JSON commit file to the _delta_log folder.
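For example, an insert from the notebook might look like the sketch below (same placeholder table and columns as above); after it runs, a new data file and the next JSON commit should appear.

```python
# Each INSERT writes new parquet data file(s) and exactly one JSON commit to _delta_log.
spark.sql("INSERT INTO material VALUES (1, 'Steel'), (2, 'Copper')")

# The table folder now contains a data file such as part-00000-<uuid>-c000.snappy.parquet,
# and _delta_log gains the next commit, 00000000000000000001.json, whose "add" action
# records the new file's path, size and statistics.
```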
Update:
An update adds a new data file to the folder and a JSON transaction log file that tells the Delta engine to read the newly added file and ignore the previous version of the data.
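A sketch of what that looks like, again using the placeholder material table. The commit version in the file name (2) assumes the create and insert sketches above were the only prior transactions; each line of the commit file is one action.

```python
# An UPDATE rewrites the affected parquet file(s): the new commit holds an "add"
# action for the rewritten file and a "remove" action for the old one.
spark.sql("UPDATE material SET material_name = 'Stainless Steel' WHERE material_id = 1")

# Find the table location, then open the latest commit (version 2 in this walkthrough).
table_path = spark.sql("DESCRIBE DETAIL material").first()["location"]
commit = spark.read.json(f"{table_path}/_delta_log/00000000000000000002.json")
commit.select("add.path", "remove.path").show(truncate=False)
```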
Delete:
When a delete is performed on the table, no data file is physically removed from the folder. Instead, a JSON commit is added that marks the affected files (and therefore the deleted records) as removed, so they are not considered when the table is read. If only some of the rows in a file are deleted, Delta rewrites that file without those rows and marks the old file as removed.
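A delete can be issued the same way, and DESCRIBE HISTORY then shows one row per commit, confirming that the earlier versions are still recorded even though nothing was physically deleted from storage. A minimal sketch, continuing the placeholder example:

```python
# A DELETE only logs "remove" actions (plus rewritten files if a file was partially
# affected); the old parquet files stay on disk so time travel remains possible.
spark.sql("DELETE FROM material WHERE material_id = 2")

# One row per commit: the create, the insert, the update and the delete.
spark.sql("DESCRIBE HISTORY material").select("version", "operation").show()
```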

In the above screenshot, the table name is material. Underneath this folder we can see one folder (_delta_log), which captures the transaction log of the operations performed on the table, and the data files, which are saved as parquet files with snappy compression by default.
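If you don't have the Storage Browser handy, the same layout can be listed from the notebook. The sketch below assumes the mssparkutils helper that Synapse Spark notebooks expose and the placeholder material table from earlier.

```python
# List the table folder from the notebook: data files appear as part-*.snappy.parquet
# alongside the _delta_log folder.
from notebookutils import mssparkutils  # available in Synapse Spark notebooks

table_path = spark.sql("DESCRIBE DETAIL material").first()["location"]
for f in mssparkutils.fs.ls(table_path):
    print(f.name)
```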
Let’s peek at the transaction log folder.
The transaction log folder contains a log of every action in the form of JSON and parquet files. Each JSON commit file is named with a 20-digit, zero-padded version number that increments with every transaction performed on the table: the first commit is 00000000000000000000.json, the next is 00000000000000000001.json, and so on.
Since every transaction on the table produces a JSON file, the log would grow tremendously over time. So after every 10 JSON commits a parquet checkpoint file is created, for example 00000000000000000010.checkpoint.parquet. The checkpoint consolidates the state of the table up to that version, so the engine does not have to replay every JSON commit from the beginning.
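The log folder can be listed the same way as the table folder, and the small _last_checkpoint file that Delta keeps inside _delta_log tells the engine which checkpoint to start from. A sketch, assuming mssparkutils and the placeholder table as before:

```python
from notebookutils import mssparkutils  # available in Synapse Spark notebooks

table_path = spark.sql("DESCRIBE DETAIL material").first()["location"]

# After every 10th commit you should see a checkpoint parquet file, e.g.
# 00000000000000000010.checkpoint.parquet, sitting next to the JSON commits.
for f in mssparkutils.fs.ls(f"{table_path}/_delta_log"):
    print(f.name)

# _last_checkpoint records the version of the most recent checkpoint, so the engine
# does not have to scan every JSON commit from version 0 when it loads the table.
print(mssparkutils.fs.head(f"{table_path}/_delta_log/_last_checkpoint", 1000))
```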

You may notice that, in the case of Update and Delete, the old data files are not removed. That is because Delta tables support time travel: if the data files were deleted the moment the query ran, you could never go back to a previous version of the data, because the underlying data files would no longer exist.
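Because the older data files are retained, you can query previous versions, and when you no longer need them you can tune the retention window and clean up. A minimal sketch, assuming the placeholder material table and a Spark pool whose Delta Lake version supports these commands (OPTIMIZE needs Delta Lake 2.0 or later):

```python
# Time travel: read the table as it looked at an earlier version.
table_path = spark.sql("DESCRIBE DETAIL material").first()["location"]
spark.read.format("delta").option("versionAsOf", 0).load(table_path).show()

# Tune how long removed data files and log entries are kept (table properties).
spark.sql("""
    ALTER TABLE material SET TBLPROPERTIES (
        'delta.deletedFileRetentionDuration' = 'interval 7 days',
        'delta.logRetentionDuration' = 'interval 30 days'
    )
""")

# VACUUM physically deletes data files that are no longer referenced and are older
# than the retention threshold; time travel beyond that point is then impossible.
spark.sql("VACUUM material RETAIN 168 HOURS")

# OPTIMIZE compacts many small files into fewer, larger ones.
spark.sql("OPTIMIZE material")
```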
Depending on your project needs, you can change the file retention period, or run the VACUUM and OPTIMIZE commands (as sketched above) to keep your tables lean. Hope this blog gave you some insight into how Delta tables work in Azure Synapse Analytics.