Тёмный
No video :(

Implementing SCD Type 2 using Delta 

Knowledge Sharing
Подписаться 1,9 тыс.
Просмотров 20 тыс.
50% 1

This video shows how to implement SCD type 2 using Delta tables. This is similar to the method available in SQL.
if you missed introduction video of deltabricks, please see that first • Delta lake features an...

Опубликовано:

 

21 авг 2024

Поделиться:

Ссылка:

Скачать:

Готовим ссылку...

Добавить в:

Мой плейлист
Посмотреть позже
Комментарии : 24   
@sravankumar1767
@sravankumar1767 2 года назад
Superb
@yousaf933
@yousaf933 3 года назад
Very Informative...
@KnowledgeSharingjkb
@KnowledgeSharingjkb 3 года назад
thank you Yousu ;)
@sreejithm5204
@sreejithm5204 3 года назад
Good video. what does the 'Using Delta' in table creation statement mean?
@KnowledgeSharingjkb
@KnowledgeSharingjkb 3 года назад
Delta is the format used here though the file format is parquet. If you go inside the delta folder you will see the log files. Log files are similar to sql transaction logs
@omkarshirsat1728
@omkarshirsat1728 2 года назад
can we have a some logic where we do not have to use this mergekey as extra column, as I also want schema evolution along with scd2, but using this approach with schema evolution I am also getting the mergekey column in the output, which I don't want, so is there any way?
@user-px1gi7gl6n
@user-px1gi7gl6n 5 месяцев назад
Hey, Try to run merge query again and again. It will insert the records into Dim Table.. Beacuse of joinkey considering as null always EmoloyeeId from target not matched with null and keep on inserting records
@chintusingh4439
@chintusingh4439 Год назад
Merge will not work if there are duplicates in the source. Is there any workaround for the same?
@KnowledgeSharingjkb
@KnowledgeSharingjkb Год назад
you are right Merge will not work if duplicate available.
@tud3571
@tud3571 Год назад
good ...but what about the constraints u didn't use any constraints
@KnowledgeSharingjkb
@KnowledgeSharingjkb Год назад
I believe you are thinking to add primary key and foreign key but unfortunately we dont have it here. only constraints support at this moment are and
@piotri3932
@piotri3932 3 года назад
Aren't you updating ALL historical versions of given customer instead of just the current one? Let's say you get new address of customer A od day 1 and the next day another update of the same customer. In the latter case you should be updating the end date of only the version from day 1.
@KnowledgeSharingjkb
@KnowledgeSharingjkb 3 года назад
That is correct. We need to update only the latest one. We can include the date filter in where condition to pick only the active record. Thanks for pointing this out
@olaoluwaabokede6654
@olaoluwaabokede6654 2 года назад
@@KnowledgeSharingjkb Thank for the video, as pointed out by Piotr, we will be updating even the historical data each time address line changes. Can you please share with me where I can include a date filter in the where condition to pick only the active record. Here is my code: %sql MERGE INTO myspace a USING ( SELECT fullPath AS mergeKey,* FROM allstars UNION ALL SELECT NULL as mergeKey, a.* FROM allstars a JOIN myspace b ON a.fullPath = b. fullPath AND b.modifiedTime != a.modifiedTime )b ON a.fullPath = b.mergeKey WHEN MATCHED AND b.modifiedTime != a.modifiedTime THEN UPDATE SET a.endDate = current_date()-1 WHEN NOT MATCHED THEN INSERT ( fullPath ,filePath ,fileName ,simpleFileType ,complexFileType ,fileFizeKB ,modifiedTime ,startDate ,endDate ) VALUES ( fullPath ,filePath ,fileName ,simpleFileType ,complexFileType ,fileFizeKB ,modifiedTime ,current_date() ,'9999-12-31' ) Thanks
@olaoluwaabokede6654
@olaoluwaabokede6654 2 года назад
Work around was to use UNION instead of UNION ALL (UNION keeps unique records, UNION ALL keeps all records including duplicates)
@user-td8vv9qh5m
@user-td8vv9qh5m 9 месяцев назад
if there is no change in source data and we try to run the merge code again as part of daily run then the mergeKey null records will be inserted again into target column as active and we will be ending with duplicates , how to solve it ?
@KnowledgeSharingjkb
@KnowledgeSharingjkb 6 месяцев назад
there should not be null values in the key columns. please handle nulls before the insertion
@kenpachi-zaraki33
@kenpachi-zaraki33 11 месяцев назад
can you please write scd type 2 code in generic way currently you have written it only for the one column please and thank you.
@KnowledgeSharingjkb
@KnowledgeSharingjkb 9 месяцев назад
yes, this is an example. please let me know your requirement in detail.
@rajsekhargada9212
@rajsekhargada9212 Год назад
what if othet column updated apart from address
@KnowledgeSharingjkb
@KnowledgeSharingjkb 9 месяцев назад
use the columns that you need to consider for scd type 2. this is just an example
@vinayakbiju932
@vinayakbiju932 Год назад
can you share the file you have uploaded here the csv file
@KnowledgeSharingjkb
@KnowledgeSharingjkb Год назад
databricks-prod-cloudfront.cloud.databricks.com/public/4027ec902e239c93eaaa8714f173bcfc/2683250055805286/1609000298248664/3892374572023226/latest.html
Далее
Delta lake features and demo
28:54
Просмотров 1,6 тыс.
Advancing Spark - Delta Live Tables Merge!
26:08
Просмотров 19 тыс.
Using DMS and DLT for Change Data Capture
40:08
Просмотров 4,1 тыс.
optimization in spark
13:03
Просмотров 6 тыс.