Delta Lake Deep Dive: Liquid Clustering

Join us on Thursday, December 7 at 10 AM PST for a session on Delta Lake's Liquid Clustering, a new approach to data layout and optimization, with Vítor Teixeira, Senior Data Engineer at Veeva Systems.
Liquid Clustering is Delta Lake's answer to the complex challenges of Big Data. Traditionally, partitioning and Z-Order clustering have been used to improve query performance by managing large datasets effectively. However, these methods come with limitations such as complexity in implementation, rigidity in data layout, and the need for frequent data rewrites. Delta Lake’s Liquid Clustering offers a dynamic solution. It allows for flexible redefinition of clustering keys without the need to rewrite existing data, adapting effortlessly to evolving analytic needs.
This session will cover how Liquid Clustering simplifies data layout decisions and optimizes query performance, marking a significant advancement over traditional partitioning and Z-Order clustering methods. Don’t miss this opportunity to learn about Liquid Clustering and how it can revolutionize your data management strategy.
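As a rough illustration of the "flexible redefinition of clustering keys" mentioned above, here is a minimal Python sketch that only builds the SQL statements involved. It assumes a Delta Lake release or Databricks runtime that supports the `CLUSTER BY` syntax; the table name `events`, its columns, and the helper function names are hypothetical and not taken from the session.

```python
# Hypothetical helpers: they only build SQL strings and never talk to Spark.

def create_clustered_table_sql(table, columns, cluster_by):
    """Build a CREATE TABLE statement with liquid clustering keys."""
    cols = ", ".join(f"{name} {dtype}" for name, dtype in columns.items())
    keys = ", ".join(cluster_by)
    return f"CREATE TABLE {table} ({cols}) USING DELTA CLUSTER BY ({keys})"


def recluster_sql(table, cluster_by):
    """Build an ALTER TABLE statement that redefines the clustering keys."""
    keys = ", ".join(cluster_by)
    return f"ALTER TABLE {table} CLUSTER BY ({keys})"


def optimize_sql(table):
    """Build the OPTIMIZE statement that triggers clustering of the data."""
    return f"OPTIMIZE {table}"


# Example table and keys are made up for illustration.
statements = [
    create_clustered_table_sql(
        "events",
        {"event_id": "BIGINT", "event_date": "DATE", "country": "STRING"},
        ["event_date"],
    ),
    # Later, the query pattern changes, so the clustering keys change too.
    recluster_sql("events", ["country", "event_date"]),
    optimize_sql("events"),
]

for statement in statements:
    print(statement)
```

With a Delta-enabled SparkSession available, each string could be passed to `spark.sql(...)`. The point of the sketch is that changing the keys is a metadata statement rather than a manual rewrite of the table.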
Quick Links
Join us on Slack: go.delta.io/slack
GitHub: github.com/del...
Join Google Groups: groups.google....

Published: Sep 28, 2024

Comments: 7

@alexischicoine2072 5 months ago

Very interesting. For Z-ordering, you can store the columns in table properties at table creation and then retrieve them when optimizing; it's not that much code.
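The approach described in this comment could look roughly like the following Python sketch. It is an illustration under stated assumptions, not the commenter's actual code: the property key `custom.zorder.columns`, the table name `orders`, and the helper names are all hypothetical, and the helpers only build SQL strings.

```python
# Hypothetical property key used to remember the Z-order columns.
ZORDER_PROPERTY = "custom.zorder.columns"


def zorder_tblproperties_clause(columns):
    """Build a TBLPROPERTIES clause to append to CREATE TABLE."""
    value = ",".join(columns)
    return f"TBLPROPERTIES ('{ZORDER_PROPERTY}' = '{value}')"


def optimize_from_properties_sql(table, properties):
    """Build an OPTIMIZE statement from a dict of table properties.

    `properties` stands in for whatever was read back from the table,
    for example the rows returned by SHOW TBLPROPERTIES.
    """
    raw = properties.get(ZORDER_PROPERTY, "")
    columns = [c.strip() for c in raw.split(",") if c.strip()]
    if not columns:
        # No stored columns: fall back to plain compaction.
        return f"OPTIMIZE {table}"
    return f"OPTIMIZE {table} ZORDER BY ({', '.join(columns)})"


# At table creation: remember which columns to Z-order by.
create_sql = (
    "CREATE TABLE orders (customer_id BIGINT, order_date DATE) USING DELTA "
    + zorder_tblproperties_clause(["customer_id", "order_date"])
)

# At optimization time: rebuild the statement from the stored property.
stored = {ZORDER_PROPERTY: "customer_id,order_date"}
optimize_sql = optimize_from_properties_sql("orders", stored)

print(create_sql)
print(optimize_sql)
```

Keeping the column list in the table's own properties means the optimization job does not need a separate configuration file per table.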

@luisriveros1119 9 months ago

Hi! I have a question: is it possible to implement liquid clustering for DataFrames saved directly to Delta files (df.write.format("delta").save("path")), instead of the conventional approach involving table creation?

@alexischicoine2072 5 months ago

It's a great combo with deletion vectors, as you don't have to rewrite the data. Without deletion vectors, it could make deletes more expensive, since the data would be spread and mixed across files.