
Advancing Spark - Databricks Runtime 7.2 & Delta Cloning

Advancing Analytics
1.5K views

What? It somehow passed us by that not one but TWO new Databricks Runtimes have been released since the Spark & AI Summit. With Databricks Runtime 7.1 now live, and Runtime 7.2 already available in Beta, Simon takes a look through the new features, including the vast potential of the new Delta Clone functionality!
If you're hearing "table clone" and thinking "well that's not very exciting", you might want to watch and see just how useful it can be! Don't forget to Like & Subscribe while you're there!
UPDATES - Since recording we've validated the following:
1 - CLONE is now available in SQL, Python, Scala and Java!
2 - DEEP CLONE takes a copy of the active/current files, not the full history!
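For reference, here is a minimal Python sketch of the SQL form from update 1. It only builds the statement string, which you would then pass to `spark.sql(...)` on a Databricks cluster. The helper `build_clone_sql` and its parameters are hypothetical, and the clause order (`CREATE [OR REPLACE] TABLE target [SHALLOW | DEEP] CLONE source [VERSION AS OF n]`) is our reading of the Databricks CLONE syntax, so check it against the Databricks docs before relying on it.

```python
from typing import Optional


def build_clone_sql(
    target: str,
    source: str,
    shallow: bool = False,
    version: Optional[int] = None,
    replace: bool = False,
) -> str:
    """Build a Databricks CLONE statement (hypothetical helper; returns a string only)."""
    create = "CREATE OR REPLACE TABLE" if replace else "CREATE TABLE"
    # DEEP is the default when neither keyword is given; we spell it out for clarity.
    kind = "SHALLOW CLONE" if shallow else "DEEP CLONE"
    stmt = f"{create} {target} {kind} {source}"
    if version is not None:
        # Clone the source as it was at a given Delta version (time travel).
        stmt += f" VERSION AS OF {version}"
    return stmt


if __name__ == "__main__":
    print(build_clone_sql("sales_dev", "sales", shallow=True))
    # CREATE TABLE sales_dev SHALLOW CLONE sales
    print(build_clone_sql("sales_backup", "sales", version=12, replace=True))
    # CREATE OR REPLACE TABLE sales_backup DEEP CLONE sales VERSION AS OF 12
```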
The Databricks runtime docs can be found here:
docs.databricks.com/release-n...
docs.databricks.com/release-n...
And don't forget to check out our Databricks Training, over at:
www.advancinganalytics.co.uk/...

Published: 14 Jul 2024

Comments: 6
@etech-ej2yj (1 year ago)
Shallow clone seems quite useful, actually. Perhaps there's a way to configure the clone to disallow any vacuums? 🤔 Thanks for the video!
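The vacuum concern can be illustrated with a toy, in-memory Python model (not the Delta Lake API; `Table` and `shallow_clone` are hypothetical names). A shallow clone copies only the list of file references, so if the source is overwritten and then vacuumed, the clone can be left pointing at a data file that no longer exists. Real VACUUM only deletes files older than a retention period; this sketch ignores that.

```python
class Table:
    """Toy Delta-like table: files in the current version plus the files it owns."""

    def __init__(self, store, files, owned=()):
        self.store = store          # shared dict: file name -> rows (stands in for storage)
        self.files = list(files)    # files that make up the current version
        self.owned = set(owned)     # files this table wrote (and so may vacuum)

    def read(self):
        # A missing file surfaces as KeyError, like a file-not-found error on read.
        return [row for f in self.files for row in self.store[f]]

    def overwrite(self, name, rows):
        # Write a new file and make it the only file in the current version.
        self.store[name] = rows
        self.owned.add(name)
        self.files = [name]

    def vacuum(self):
        # Delete owned files that are no longer in the current version
        # (real VACUUM also honours a retention period; ignored here).
        stale = self.owned - set(self.files)
        for name in stale:
            del self.store[name]
        self.owned -= stale


def shallow_clone(source):
    # Copy only the file references; no data files are duplicated,
    # and the clone does not own (so never vacuums) the source's files.
    return Table(source.store, source.files)


if __name__ == "__main__":
    store = {"part-0": [1, 2, 3]}
    sales = Table(store, ["part-0"], owned=["part-0"])
    dev = shallow_clone(sales)
    sales.overwrite("part-1", [4, 5])
    print(dev.read())  # [1, 2, 3]: the old file still exists
    sales.vacuum()     # removes part-0, which only the clone still needs
    try:
        dev.read()
    except KeyError as missing:
        print("clone is broken, missing file:", missing)
```

A deep clone would instead hold its own copy of `part-0`, so vacuuming the source would not affect it.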
@surbhisingi5214 (3 years ago)
Hi Simon - I really enjoy watching your videos on Databricks + Spark and have learnt a lot about the features available within Databricks. I am currently exploring Docker in Databricks. Could you please create a video on how Docker can be used within Databricks and which features are available? Thanks in advance!
@ynwtint (3 years ago)
What, essentially, is the difference between a Delta Clone and a normal CREATE TABLE AS SELECT into a new table?
@AdvancingAnalytics (3 years ago)
Hey both - the clone syntax blog clears this up a bit: "Cloning a table is not the same as Create Table As Select or CTAS. A clone copies the metadata of the source table in addition to the data. Cloning also has simpler syntax: you don’t need to specify partitioning, format, invariants, nullability and so on as they are taken from the source table." So it differs from CTAS in that it copies over the transaction log details and much of the metadata in there, and it saves you from having to provide a load of settings for the new table. That said, time travel does not work on the new table: it appears to copy over only the current files, and your transaction log will have a single "CLONE" entry rather than the actual history. That's for DEEP CLONE; SHALLOW will have other implications! Simon
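To make the distinction concrete, here is a toy Python model (not the Delta Lake API; `Table`, `ctas` and `deep_clone` are hypothetical) in which a table is just metadata plus a log of versions. It follows the reply above: CTAS carries over the rows but not the table settings, while a deep clone carries over both the current data and the metadata, yet starts a fresh log with a single CLONE entry, so the source's earlier versions cannot be reached by time travel on the clone.

```python
from dataclasses import dataclass, field


@dataclass
class Table:
    """Toy table: metadata plus a log with one (operation, rows) entry per version."""

    metadata: dict
    log: list = field(default_factory=list)

    def current_rows(self):
        return self.log[-1][1]

    def version_as_of(self, version):
        # Time travel can only reach versions recorded in *this* table's own log.
        if not 0 <= version < len(self.log):
            raise ValueError(f"version {version} not found ({len(self.log)} versions)")
        return self.log[version][1]


def ctas(source):
    # CREATE TABLE AS SELECT: copies the rows only; partitioning, format, etc.
    # would have to be specified again (modelled here as empty metadata).
    return Table(metadata={}, log=[("CREATE TABLE AS SELECT", list(source.current_rows()))])


def deep_clone(source):
    # DEEP CLONE: copies the current data *and* the metadata, but the new log
    # starts with a single CLONE entry instead of the source's history.
    return Table(
        metadata=dict(source.metadata),
        log=[("CLONE", list(source.current_rows()))],
    )


if __name__ == "__main__":
    sales = Table(metadata={"partitionBy": ["region"], "format": "delta"})
    sales.log.append(("WRITE", [1, 2]))
    sales.log.append(("MERGE", [1, 2, 3]))
    backup = deep_clone(sales)
    print(backup.metadata)         # {'partitionBy': ['region'], 'format': 'delta'}
    print(backup.log)              # [('CLONE', [1, 2, 3])]
    print(ctas(sales).metadata)    # {}
    print(sales.version_as_of(0))  # [1, 2]
```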
@sid0000009 (3 years ago)
Shallow cloning: if we can already access earlier versions of a Delta table through versioning, I'm not sure what additional advantage we get from accessing the same data through a new cloned table. Deep cloning: it doesn't copy the entire data history of a Delta table, which seems a bit disappointing (the copied data is either the version specified or the latest snapshot).
@AdvancingAnalytics (3 years ago)
So, shallow cloning is all about making changes to the data in parallel. We can make two entirely different changes to the data and compare them, without having to copy the whole thing. If we tried this with versioning, we'd have to make the first change, undo it, then make the second. It gives a real safety net for trying new things in a stable environment. It might be a bit niche, sure, but it certainly solves a few problems I've seen. And yeah, deep cloning not taking the history: I guess it's one way or the other. If it took the full history, I'd complain that it copies redundant files! It would be good to have a history/no-history option with DEEP CLONE to give that flexibility. Either way, it definitely has a couple of uses. Simon
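Simon's side-by-side use case can be sketched with another toy Python model (not the Delta Lake API; `shallow_clone`, `rewrite_file` and `read` are hypothetical). Two shallow clones start from the same file references, each rewrites only the file it changes, and the source stays as it was, so the two results can be compared without copying the whole table first.

```python
def shallow_clone(files):
    # A shallow clone starts as a copy of the source's file *list*;
    # no data files are copied.
    return list(files)


def rewrite_file(store, files, old, new, transform):
    # Copy-on-write: write a changed copy of one file under a new name,
    # then swap it into this table's file list. Other files stay shared.
    store[new] = [transform(row) for row in store[old]]
    return [new if f == old else f for f in files]


def read(store, files):
    # Concatenate the rows of every file in the table's current version.
    return [row for f in files for row in store[f]]


if __name__ == "__main__":
    store = {"part-0": [100, 250], "part-1": [400]}  # shared "storage"
    prod = ["part-0", "part-1"]

    experiment_a = rewrite_file(store, shallow_clone(prod), "part-0", "a-0", lambda x: x * 2)
    experiment_b = rewrite_file(store, shallow_clone(prod), "part-1", "b-0", lambda x: x - 50)

    print(read(store, experiment_a))  # [200, 500, 400]
    print(read(store, experiment_b))  # [100, 250, 350]
    print(read(store, prod))          # [100, 250, 400] (source untouched)
```

`experiment_a` still shares `part-1` with `prod`, which is both why a shallow clone is cheap and why it depends on the source's files staying put.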