
Advancing Spark - Getting hands-on with Delta Cloning 

Advancing Analytics
32K subscribers
2.8K views

Last week we looked at the announcements for Databricks Runtime 7.2 and got all excited about the notes for Delta Cloning, but we had some really good questions about exactly what happens under the hood. So this week, join Simon as he takes a bit of a dive into DEEP and SHALLOW cloning with Delta on Databricks.
For more info on the Clone functionality and the other syntax available, take a look at the notes here: docs.databrick...
As always, for more tasty blogs, or info about our hands-on training courses, come visit us at: www.advancinga...

Published: 21 Aug 2024

Comments: 13
@lucian1511 · 1 year ago
Very nice video! Keep up the good work! I am in the process of learning and your videos are excellent. I can only hope you will continue to upload new interesting stuff. Thank you!
@tanushreenagar3116 · 10 months ago
nice sir
@the.activist.nightingale · 4 years ago
Nice video -yet- again, Simon! I really appreciate how you take the time to show all the manipulations and even the bugs ;) Seems like a cool feature, but I'm wondering how it would fare if I were cloning a huge table of 70-140M rows? Maybe some stress-testing would be needed on my side :) On the light side, please don't zoom in on your face too often: I get mesmerized by your eyes (are they blue-green?) and I need to replay the parts multiple times :D HAHAHAHA #GirlProblems
@AdvancingAnalytics · 4 years ago
Hey! Glad the videos are still useful! The shallow clone of ~140M rows will take a couple of seconds, as it's just a bit of metadata. The deep clone will depend on your cluster, but that's not a huge amount of data for Spark: you could easily have it cloned in 5-20 minutes, depending on the size of the cluster! Simon
@nikkaz5639 · 1 year ago
Hey Simon, thanks a lot for this video. A question: how would you then make the cloned version go live and become the original one? Thanks
@AdvancingAnalytics · 1 year ago
Hrm, not sure that's possible: unless you update every file within the Delta table, the clone will still be pointing to some files from the original! I'd treat clones as temporary entities, then redo the operation if you want to make it permanent.
@prashanthxavierchinnappa9457 · 3 years ago
Hey Simon, thanks for a great video. Just the kind of channel I was looking for. A quick question: what is the best way to copy only certain partitions of a Delta table into a new Delta table without copying all the contents? I assumed cloning would help somehow, but that does not seem to be the case.
@AdvancingAnalytics · 3 years ago
I'm afraid cloning doesn't support partition scoping, as far as I know. You would likely need to write a quick DataFrame job that reads your source, filters to your desired partitions and writes to the new table. You wouldn't get table settings, transaction history etc. copied across, though! There are some workarounds with cloning, deleting partitions etc., but it'll be more work than just writing a quick DataFrame!
@bhaveshpatelaus · 4 years ago
Thanks Simon. I can see a use case for this in a DR scenario, where the primary and secondary regions in ADLS or Blob storage do an asynchronous copy of the data and can thus leave Delta tables corrupted! Does DEEP CLONE happen with ACID guarantees? What if you are cloning big tables and the cloning operation is interrupted? Does it land incomplete data?
@sid0000009 · 4 years ago
Shallow clone: what happens to the cloned table if we update the original table? As we understand it, the cloned table initially points at the original table's data. Thanks
@AdvancingAnalytics · 4 years ago
So the original table will see the new files as "replaced" in the transaction log. The cloned table will point at the old files and work as expected. The only problem comes if you run a VACUUM on the original table after updating: then the shallow clone will no longer function. So not great for the long term, but fantastic for short-term testing/experimentation! Simon
@sid0000009 · 4 years ago
@AdvancingAnalytics you're a genius!
@nishu2u85 · 2 years ago
Thanks much for clarifying :)
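Simon's point about VACUUM breaking a shallow clone can be reproduced with a toy model. The `Table`, `update_files`, `vacuum` and `missing_files` helpers are hypothetical, not the Delta API, and the toy VACUUM uses a zero retention period, whereas real VACUUM honours a retention window (7 days by default).

```python
from dataclasses import dataclass, field

@dataclass
class Table:
    """Hypothetical Delta-like table: the data files its current snapshot references."""
    location: str
    files: set = field(default_factory=set)

def update_files(table: Table, touched: set, storage: set) -> None:
    # Copy-on-write: touched files are replaced by new ones. The old files stay
    # in storage (still referenced by older log versions) until VACUUM removes them.
    for path in touched & table.files:
        new_path = path.replace(".parquet", "-v2.parquet")
        storage.add(new_path)
        table.files.remove(path)
        table.files.add(new_path)

def vacuum(table: Table, storage: set) -> None:
    # Toy VACUUM, zero retention: delete every file under this table's location
    # that its current snapshot no longer references.
    prefix = table.location + "/"
    for path in [p for p in storage if p.startswith(prefix) and p not in table.files]:
        storage.remove(path)

def missing_files(table: Table, storage: set) -> set:
    # A shallow clone "no longer functions" once files it references are gone.
    return table.files - storage

storage = {"/data/orders/part-0.parquet", "/data/orders/part-1.parquet"}
orders = Table("/data/orders", set(storage))
dev = Table("/data/orders_dev", set(orders.files))  # shallow clone: same file references

update_files(orders, {"/data/orders/part-0.parquet"}, storage)
print(missing_files(dev, storage))  # set(): the clone still reads the old file

vacuum(orders, storage)
print(missing_files(dev, storage))  # {'/data/orders/part-0.parquet'}: clone is broken
```

The original table is unaffected by its own VACUUM; only the clone, which still references the replaced file, breaks.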