Advancing Spark - Delta Deletion Vectors 

Advancing Analytics
32K subscribers
3.2K views

Whenever we explain how Delta works with parquet, rewriting whole files and making redundant copies of "unchanged" data whenever a record is updated or deleted, people are understandably shocked - it's a huge amount of unnecessary work. With Delta Deletion Vectors, we finally have a better answer - deleting records is now a quick, simple metadata operation!
In this video Simon walks through the concept of deletion vectors, looking at how they are implemented and walking through a simple example - following what happens at the file & transaction log level.
To learn more about deletion vectors, check out: docs.databricks.com/en/delta/...
And if you need help on your Data & AI journey, give Advancing Analytics a call!
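
For readers who want to see the idea from the description in miniature, here is a toy, in-memory sketch contrasting a copy-on-write delete with a deletion-vector-style delete. It is not Delta's actual implementation: `DataFile`, `delete_copy_on_write` and `delete_with_vector` are hypothetical names made up for illustration, a Python list stands in for a parquet file, and a plain set of row positions stands in for the deletion vector.

```python
from dataclasses import dataclass, field


@dataclass
class DataFile:
    """Stand-in for one immutable parquet file (hypothetical, illustration only)."""
    name: str
    rows: list
    deleted: set = field(default_factory=set)  # row positions marked as deleted

    def live_rows(self):
        # Readers skip any row position flagged in the deletion vector.
        return [r for i, r in enumerate(self.rows) if i not in self.deleted]


def delete_copy_on_write(f, predicate):
    """Classic behaviour: copy every surviving row into a brand-new file."""
    kept = [r for r in f.live_rows() if not predicate(r)]
    return DataFile(name=f.name + ".rewritten", rows=kept), len(kept)


def delete_with_vector(f, predicate):
    """Deletion-vector behaviour: leave the data alone, only record positions."""
    hits = {i for i, r in enumerate(f.rows) if i not in f.deleted and predicate(r)}
    f.deleted |= hits
    return f, 0  # no data rows were copied


original = DataFile("part-0000.parquet", [{"id": i} for i in range(1_000)])
is_target = lambda row: row["id"] == 42

cow_file, cow_copied = delete_copy_on_write(original, is_target)
dv_file, dv_copied = delete_with_vector(original, is_target)

print(cow_copied, dv_copied)                        # 999 0
print(cow_file.live_rows() == dv_file.live_rows())  # True
```

Both approaches give readers the same 999 rows, but the copy-on-write path rewrote all of them to remove one record, while the deletion-vector path only recorded a single row position. On a real table the feature is controlled by the `delta.enableDeletionVectors` table property, and the on-disk format and transaction log entries are considerably more involved than this sketch.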

Published: 15 Oct 2023

Comments: 7
@alexischicoine2072 1 month ago
Deletion vectors are amazing. They improve concurrency as well, which is detailed on the page about isolation and serialization. If you need to delete data about customers for compliance, it's great. Also, if you need to replicate your data to another region, you won't be creating as many extra files that need to be transferred and stored, so you can get good savings from that as well. Imagine you have big gigabyte parquet files in a huge table and you need to delete a record here and there: it will make a massive difference.
@riteshsharma344 8 months ago
Thanks for great video as always 🙂
@malebeauty 3 months ago
You're so cool
@SladeFlash 7 months ago
Hi, can we set this property on a streaming table?
@2307Leito 8 months ago
Awesome! Love your videos! Nice feature. Quick question: for doing upserts in Delta, what would be the best way to implement it? Let's say you have a fact table by day, and on daily runs it loads the 3 closest days to getdate() (it reloads some data and inserts new data, i.e. an upsert).
@jeanchindeko5477 8 months ago
Thanks for this great video. Is this like Merge-on-Read in Iceberg and Hudi?
@NeumsFor9 8 months ago
Pretty soon we will be at the old SSAS .deleted store, and all those .store files 😂😂😂....