Тёмный

To Surrogate Key or Not... 

Guy in a Cube
Подписаться 437 тыс.
Просмотров 15 тыс.
50% 1

We've talked about using a surrogate key in your data warehouse whether that's Azure Synapse Analytics or something else. Patrick looks at why you should consider this even if you aren't using a slowly changing dimension.
📢 Become a member: guyinacu.be/membership
*******************
Want to take your Power BI skills to the next level? We have training courses available to help you with your journey.
🎓 Guy in a Cube courses: guyinacu.be/courses
*******************
LET'S CONNECT!
*******************
-- / guyinacube
-- / awsaxton
-- / patrickdba
-- / guyinacube
-- / guyinacube
-- guyinacube.com
**Gear**
🛠 Check out my Tools page - guyinacube.com/tools/
#AzureSynapse #DataWarehouse #GuyInACube

Наука

Опубликовано:

 

2 янв 2023

Поделиться:

Ссылка:

Скачать:

Готовим ссылку...

Добавить в:

Мой плейлист
Посмотреть позже
Комментарии : 27   
@eziola
@eziola Год назад
Incredible! Keep these SQL theory vids coming. This is high-level stuff. Thanks Patrick for continuing to be there for us enterprise level SQL folks. YOU DA MAN
@ChrisWagnerDatagod
@ChrisWagnerDatagod Год назад
Surrogate keys are THE most critical piece for good model management. Management of keys pushes you to ensure the proper Schema. Do it! Do it!
@jeffshieldsdev9026
@jeffshieldsdev9026 Год назад
Implementing surrogate keys (at least in the Power BI model) usually just adds management overhead and increases consumption and refresh time. If your source has unique primary keys, why not use them?
@ChrisWagnerDatagod
@ChrisWagnerDatagod Год назад
@@jeffshieldsdev9026 Surrogate keys should typically be added upstream in the datawarehouse or the data mart giving you a star schema to source your reports from. If you do NOT have a star schema you are sourcing from, then you should be using surrogate keys to manage dims and facts to prepare for when your source changes, or you add sources to your model.
@eta2001carl
@eta2001carl Год назад
@@ChrisWagnerDatagod how about you need to allow to user to add like a budget fact table to the model (Copositive model) then they will need to know the SK value of an Item, if the Item Dimension has NK then they don't, just add it and it works.
@ChrisWagnerDatagod
@ChrisWagnerDatagod Год назад
@@eta2001carl You still want the SK for the core model and then use the NK to the DIm that will have the NK to form the composite model. Do not join the NK to the Fact table.
@pdhar1384
@pdhar1384 2 месяца назад
short and sweet. Thank you 🙂
@AmritaOSullivan
@AmritaOSullivan Год назад
Brilliant explanation!!!! Thank you
@jeffshieldsdev9026
@jeffshieldsdev9026 Год назад
On a related note, I've seen users replace string-based natural keys with integer-based surrogate keys in their Power BI data models, assuming the smaller integer data type would be more performant. However, most of the time, the VertiPaq engine will hash these integers, making them strings anyways, so all the user is really doing is adding complexity to their transformations and compute to their refreshes.
@mcnater
@mcnater Год назад
Do we know this is always the case? Or how are you seeing this in your work? Curious as we tend to make the strings integers but I've seen this noted elsewhere and always wondered if it's true.
@jeffshieldsdev9026
@jeffshieldsdev9026 Год назад
@@mcnater I don't think this is always the case--but VertiPaq is going to choose HASH or VALUE encoding by its own evaluation. You can try applying encoding hints using Tabular Editor, but it didn't work for me when I tried. I'm able to see the encoding type by connecting to the dataset from DAX Studio and clicking View Metrics on the Advanced Tab.
@mcnater
@mcnater Год назад
@@jeffshieldsdev9026 got it thanks
@JohnBrown-hx8bq
@JohnBrown-hx8bq Год назад
Important to understand when to use them, and also when not to. Sometimes its very easy to overcomplicate your models with an insistence on 'incremental' SKs, even when they are not needed or even costly (e.g. in columnar datasets). I prefer the Super Natural Key / Durable Key model, or even better, having natural dimensional fields actually in the facts for the columnar data compression to work with.
@clairerovic
@clairerovic Год назад
Love this, thanks 👍
@benshaw2644
@benshaw2644 Месяц назад
Amazing video x
@pauloneufneufneuf
@pauloneufneufneuf Год назад
Nice simple explanation of when & why to use surrogate keys. Particularly the why - too often surrogate keys are presented as being good design always, even when they aren't needed and the natural key will work fine. Side note: One habit that doesn't seem to go away, but has no value vs using the natural DATE is creating a YYYYMMDD integer surrogate key for a date (aka Calendar) table. Bad: The INT has more bytes than DATE - it will cause more IO. So it will be slower, even if only marginally. Bad: It has no integrity enforcement as a valid date. Bad: It is awkward to manipulate as a date - e.g. using date functions. Dates as 4 byte INT has not had value since the introduction of SQL Server 2008, with the 3 byte DATE type.
@baklava2tummy
@baklava2tummy 8 месяцев назад
Thanks, this was interesting to read. We hold all our dates as integers and it’s a pain when trying to manipulate back to date. I didn’t think about the IO being impacted by it.
@umeshchanchlani7373
@umeshchanchlani7373 Год назад
Nice video. But i have question. What if customerId was a varchar do we still need to have surrogate key?can you please clarify?
@DanielRodriguesPT
@DanielRodriguesPT Год назад
Good vídeo, I'm going to share this with my team! Also the new source system could have overlapping IDs (even thou it is varchar it can contain ints)
@eta2001carl
@eta2001carl Год назад
How about you need to allow to user to add like a budget fact table to the model (Copositive model) then they will need to know the SK value of an Item, if the Item Dimension has NK then they don't, just add it and it works.
@Milhouse77BS
@Milhouse77BS Год назад
Not my first priority. But you hit on the best reason to use it, if your IDs like CustomerID may change or be augmented in future.
@ChrisWagnerDatagod
@ChrisWagnerDatagod Год назад
It should be. ;)
@Vikasptl07
@Vikasptl07 Год назад
During regular load then dimensions needs to be loaded first then facts by look-up on dimensions for keys
@optimalchoice270
@optimalchoice270 Год назад
If the data you are processing has or could have more than one source, and you want this representation, the information you are providing, to be used by any other process, then an independent (surrogate) key is needed to represent the composite information. If you were to represent a multi-source dataset with the key from one of its sources, you would be changing the meaning of the source keyfield and adding a processing dependency. I would argue that every dataset should have an independent keyset because change is inevitable in real-world processes. Certainly, for repositories that are intended to be long-lived data warehouses, independent (surrogate) keys should be the default for the reasons illustrated here, among others.
@maherkebaier5954
@maherkebaier5954 Год назад
Maybe i'm missing some thing, once there is a new line in the customer dimension table how can the fact table get the new CustomerSK?
@wilsonman8661
@wilsonman8661 11 месяцев назад
@@federicozambelli9570 Frankly, I'm on a mission to figure this out myself at the moment so take this with a grain of salt: One solution I've found is to join the fact table back to the dimension table in your data warehouse and bring in only the surrogate key from the dimension table. Seems clunky to me but also I'm definitely still figuring things out! :D
@fb-gu2er
@fb-gu2er 5 месяцев назад
I don’t think customer id is a natural key. Natural keys are derived from the columns. If the id is chosen arbitrarily, then it’s synthetic
Далее
100❤️
00:19
Просмотров 4,6 млн
ШОКОЛАДКА МИСТЕРА БИСТА
00:44
Просмотров 398 тыс.
Working with Slowly Changing Dimensions in Power BI
13:31
Twitter Wars! Natural vs Surrogate Keys
6:30
Просмотров 1,5 тыс.
Primary & Foreign Keys
8:25
Просмотров 471 тыс.
Why Power BI loves a Star Schema
8:10
Просмотров 122 тыс.
Why Surrogate Keys are used in Data Warehouse
7:03
Просмотров 138 тыс.
Handling MULTIPLE fact tables in Power BI
9:02
Просмотров 298 тыс.
Here's a BANANAS report design idea!
7:42
Просмотров 41 тыс.
Сравнили apple и xiaomi!
0:21
Просмотров 39 тыс.
ИГРОВОВЫЙ НОУТ ASUS ЗА 57 тысяч
25:33