Star schema or single table in Power BI

Подписаться 106 тыс.

Просмотров 37 тыс.

50% 1

Is it better to build a model as a regular star schema or as a single table in Power BI? The answer is clear: a star schema is always the best choice. This video analyzes the performance differences between the two approaches. Complete article: sql.bi/704257?aff=yt
How to learn DAX: www.sqlbi.com/guides/dax/?aff=yt
The definitive guide to DAX: www.sqlbi.com/books/the-defin...

Наука

Опубликовано:

12 апр 2021

Ссылка:

Скачать:

Готовим ссылку...

Добавить в:

Мой плейлист

Посмотреть позже

Комментарии : 72

@leonardbekker1524 3 года назад

I've just started with my journey to learning PowerBI. A senior colleague at work said we must always use the FAT model, but this video now explains why we are having performance issues with some of our models. I took some time and reworked a problematic model into using a star schema. It's made a world of difference, massive performance gains.

@PeterKontogeorgis 3 года назад

Great to see some concrete reasoning for star schema modelling aside from simplifying the model for ease of business consumption. I’ll definitely be bookmarking this for future reference. Another brilliant video. Thank you.

@sunilgidwani6321 Год назад

It's wonderful to encounter specific justifications for star schema modeling beyond its simplicity for better business understanding.

@MrMalorian 3 года назад

Thank you. I keep hearing it's better but it's nice to see the hard proof.

@Milhouse77BS 3 года назад

Superb video. The xmSQL reviews gave me ideas of what to look for in slow queries.

@pbihari0214 3 года назад

thank you, good comparison, we will always use the start schema.

@richardcroft3676 3 года назад

That is a fantastic explanation. Thanks

@jurgenfabender9719 3 года назад

Thank you Alberto - very impressive!

@chriskeo392 7 месяцев назад

Excellent video exactly what I needed

@raviv5109 3 года назад

Excellent! Thank you for sharing!

@muhammadkhurram7617 3 года назад

Awesome video to explain real-life scenarios where what model type can be a better choice...thanks for sharing! By the way, which microphone are you wearing? It looks super cool on you!

@drod96 3 года назад

Thank you for this video!

@adityalakkad499 2 года назад

Awesome video, lot's of my concerns got answered, and now i have another question, what about we use a columnar dataset.

@MrAstonmartin78 2 месяца назад

Great video as always. Thx

@abel9 4 месяца назад

Thank you for your video. In direct query mode (ex : connected to Bigquery), having a flat model is way better than a star schema. The relationships generate complex queries in the database, and take time to be performed. A flat model can be optimized in the database side by creating partition and clusters on the flat table to avoid the full scan each time you filter your data in power bi. Direct Query mode is not recommended due to its performance issue, but to ensure data confidentiality and to be able to query large volume of data, we can't use import mode...

@techproductowner 3 года назад

It is not just like that People -say both of you as God fathers of DAX ; this is more than true ;You both guys have given genuine service to the industry ; Most of the stuff out there are free including tools and guide which you could have charged heavily to corporates (considering te fact that there is without any doubt that there are no other reliable content for DAX ) Anyone with the deep desire can excel in DAX by following your books Definitive guide to dax and your videos(even you tube) wih practice. I am shocked to see books like DAX cookbook and few other which have copy pasted your contents without even chaning the real example and slightly changing the text - They should have written both of your name in big bold letters in Dedication page . but they didn't mention your name anywhere. You both truly deserve deep respect from me and everyone from the power bi user group. As on today I am more than sure that dere is no material available to learn dax the proper way either by first reading your book definitive guide to dax and practicing hard or your videos ; Google or any other youtube video won't help . . those tits bits of information won't help to learn DAX and anyone with any level of IQ will surely get confused if you don't learn dax in sequential way with theory and respective theory practice first. Dax is functional language but at the same time I say it as "IMPLICIT functional language " ; there is no front option else I would have make the front size of word IMPLICIT to 1000px You may argue that for everything there is an implicit concept, but here is the thing - in all other the degree of implicit may be less ( i don't know everything on earth ) but for DAX it's a very big bold IMPLICIT . Unless you don't control your nerves and be patient going step by step; you can't learn DAX Thank you once again from the deep bottom of my heart to both of you .

@SQLBI 3 года назад

Thanks!!!

@georgekhoury6389 3 года назад

great comparison

@dataarq945 3 года назад

Thanks Alberto.

@gusti3k 3 года назад

I should have had read the discription... Spoiler alert, star schema is still better. Crazy example but still a valid point, good job :)

@user-wx3dn3il2f 3 года назад

It's amazing!!!

@Sefr-gravity 3 года назад

Epic! Thanks

@alvarorodriguezlasso 3 года назад

The table at the end of the video is the true conclusion ... Thank you ... Regards from Cali-Colombia

@csgexec 3 года назад

Good information. It would be good to see the difference in performance when projecting column values from the dimensions in the same table visual (e.g. first name, last name, age, milestone dates, address, store, region, etc.) vs. projecting the values from the same table, as would be the case in a real world Drillthrough scenario.

@hichamkabel 3 года назад

thank you a lot for this great explanation. I have a question which import mode is the most useful in those cases (direct or import)?

@SQLBI 3 года назад

Import is always faster at query time. DirectQuery is useful when you have no time to refresh imported data, but you pay a price with slower queries.

@kamalverma7904 3 года назад

Hi Alberto, as always a great video with detailed explanation. Had a quick question. Could you please help me understand the difference (if any). CalculateTable(product,product[brand]="contoso") vs filter(all(product), product[brand]="contoso" ) Look forward to hearing from you

@SQLBI 3 года назад

FILTER is an iterator, CALCULATETABLE manipulates the filter context, the latter is faster. However, the code you wrote is different in results.

@chaitanyamatta3454 3 года назад

Thank you, Alberto. Extremely informative. I have a scenario where the customers asked for dynamic axis and legend in charts. They wanted to see the break down of measures by switching between different dimensions through a slicer. In my case, fat model was easy to build and I couldn't think of a way to get it done using Star schema. Any comments/ thoughts?

@SQLBI 3 года назад

Why is it different from a reporting point of view?

@Gustavofuentesm 3 года назад

Great!!!

@ChPetru 2 года назад

Conclusion: 35:02

@RajeshPhanindra 3 года назад

Great comparison, but if the DW is stored in Azure Synapse Analytics, where joins cannot be done so easily (given the tables are distributed), I see the performance is better with the fat table

@SQLBI 3 года назад

The video is about imported data (VertiPaq). DirectQuery requires different optimizations, depending on the data source.

@RajeshPhanindra 3 года назад

@@SQLBI Thanks. I must have missed that.

@jaimeildefonsosegurapena7328 3 года назад

Alberto, How about star vs snowflake for header detail or for multilevel categorization of products?

@SQLBI 3 года назад

Star schema is better. Header-detail could be very bad for the performance because of the cardinality of the relationship involved. This free video course introduces the topic better: www.sqlbi.com/p/introduction-to-data-modeling-for-power-bi-video-course/

@diegomex85 3 года назад

Is there any difference in performance using calendar in DAX or M? Ty Guys =)

@SQLBI 3 года назад

No differences in performance.

@mytwospence 3 года назад

Great video! I've been preaching the Kimball star schema for years!

@MDevion 3 года назад

Much better argument, dimensions can be sliced through several fact tables. Attributes on fact tables cant. Only use attributes on fact tables if the cardinality is the same as the fact tables. Fat tables uses a ridiculous amount of memory and you will pay for it when you have Azure Analysis Services. Also you will pay the price on the DWH side if you are in the cloud.(Eventually also onpremise but you wont feel it immediately) Query times should only be considered if query times are too long for business uses.

@SQLBI 3 года назад

All true! But we heard the "performance" topic too much and we wanted some numbers...

@Bobo-wl6bs 8 месяцев назад

I have a question. If I have 5 Excel tables that I want to import into Power BI. Should I normalize the tables in Power Query?. Someone told me I should but surely that would be inefficient because normalizing a table in Power Query would involve duplicating tables and this would increase the size of the data model. I'd kindly like to ask you about your opinion on this. How would you approach this?. Thanks in advance.

@leemeiwah Год назад

Hi Alberto, I hope u can enlighten me on this. Regarding the "Values of a Column", I thought that the compression in Vertipaq will compress the column in the fat model to be similar to the dimension table in the slim model, ie contains only a few values. Does it mean that we need to do a star schema and not to depend on the Vertipaq for compression? Thanks! 🙂

@SQLBI Год назад

The compression is there, but the optimal compression is a balance between normalization and denormalization. The star schema is usually the best tradeoff.

@leemeiwah Год назад

@@SQLBI Thank u! 😊

@danielquinton8959 3 года назад

Would the Star schema have been quicker if inner joins rather than left outer joins were used ?

@SQLBI 3 года назад

In general, yes. In this specific case, probably not much (the SQL Server database was already optimized). However, in every case the difference between DirectQuery and Import mode is orders of magnitude. If you use the INNER JOIN you have to make assumption about data quality which are not always possible.

@RetaxChennai 3 года назад

Is it better to build the star schema in sql or in power query?

@SQLBI 3 года назад

The earlier, the better. If you can in SQL, do that in SQL. If you can only in Power Query, then use that.

@tomsneed9548 2 года назад

if i create multiple queries from one large MSSQL table, and some queries emulate fact tables and others emulate dim tables, is that creating performance problems???

@SQLBI 2 года назад

It depends on the performance on MSSQL when you process the model. At query time there are no issue in an import model.

@Bharath_PBI 3 года назад

Yes, Star schema is ⭐. Curious to know, How this will fit for the video "creating slicer that filters multiple columns" by SQL BI ..?

@SQLBI 3 года назад

We don't understand the question...

@Bharath_PBI 3 года назад

@@SQLBI I was referring to this video "Creating a slicer that filters multiple columns in Power BI" ("ru-vid.com/video/%D0%B2%D0%B8%D0%B4%D0%B5%D0%BE-UvvS376vYSw.html"). Here some degree of denormalization is applied..

@SQLBI 3 года назад

Sorry, we still don't understand the question!

@vittoriogiatti 3 года назад

I think the download link is missing inside the article

@SQLBI 3 года назад

In this case we cannot share the data used in the demo (the volume would also make it hard to download).

@shoaibahmedkhan9676 Год назад

what's your view in case of data comprising only 10 million rows? I have a model with 3 million rows. When I did star schema and take all the columns/fields to export, PBI service runs out of memory faster. When I put all in one table, then the same table visual needs less memory and i can add more fields. Otherwise it seems joins are taking a good amount of memory.

@SQLBI Год назад

It seems very strange. Usually, it is the opposite. Did you use particular measures in your reports?

@shoaibahmedkhan9676 Год назад

@@SQLBI The only table left out of fact table was calendar. I just put calendar also in the fact table and the performance of table visual has improved drastically. Before I was able to take only 3 months of one country and with the 4th month in selection, PBI service visual runs out of memory. Now i can take two countries with 5 months, so 3 times data than earlier. So flat table performance has improved whenever I remove one star schema table and bring it into fact table. Model size only increased by 1 MB as per Dax Studio Metrics with calendar in fact table but Table Visual performance improved 3 times which is my requirement to give to users to export data. I think in Table Visual I am adding so many fields/columns, in star schema it has to perform multiple joins over many rows to generate table visual and join seems memory expensive. I am using in that one multi currency formula which I took from your website Amount USD = VAR AggregatedAmountInLCY = ADDCOLUMNS ( SUMMARIZE ( MH03_Reported_Avg_Rate, MH03_Reported_Avg_Rate[Country] ), "@AmountInLCY", [Amount LCY], "@Rate", CALCULATE ( SELECTEDVALUE ( MH03_Reported_Avg_Rate[LCY_to_USD] ) ) ) RETURN Sumx(AggregatedAmountInLCY,DIVIDE([@AmountInLCY],[@Rate],0)) and three formulas like this one Vol Pcs = CALCULATE( sum(D01_TPM_Promo_Data[Amount Local CCY]), D01_TPM_Promo_Data[KPI_Name]="Total Volume (Units)", //D01_TPM_Promo_Data[Promo_No] IN VALUES(D01_TPM_Promo_Data[Promo_No]), ALL(D01_TPM_Promo_Data[Investment_Method]), ALL(D01_TPM_Promo_Data[Fund_Type]), ALL(D01_TPM_Promo_Data[Tactic_ID]), ALL(D01_TPM_Promo_Data[IO_Number]), ALL(D01_TPM_Promo_Data[Promo_Mechanic]), ALL(D01_TPM_Promo_Data[Tactic_Comments]), ALL(D01_TPM_Promo_Data[On_Off_Invoice]), ALL(D01_TPM_Promo_Data[Condition_Codes]), ALL(D01_TPM_Promo_Data[RA_SD_Number]), ALL(D01_TPM_Promo_Data[Deal_Rate_UOM]), ALL(D01_TPM_Promo_Data[KPI_Type]), ALL(D01_PnL) ) and one this formula Deal Rate % = VAR _AggregatedAmountbyFund = FILTER(ADDCOLUMNS ( SUMMARIZE ( D01_TPM_Promo_Data, D01_TPM_Promo_Data[Fund_Type] ), "@AmountbyFund", -[Amount Multi CCY]), D01_TPM_Promo_Data[Fund_Type]BLANK()) RETURN Sumx(_AggregatedAmountbyFund,DIVIDE([@AmountbyFund],IF(D01_TPM_Promo_Data[Fund_Type]="CentralCopack",[Trade GSV],[Net Trade GSV]),

@SQLBI Год назад

It's possible you get better performance with the flat table because of autoexist, but it's quite strange to see that in the scenario you described. Two questions: 1) How many rows do you have in the Date table when you use the star schema? 2) At which level of granularity the calculation fails (day/month/year...)

@julsgranados6861 3 года назад

can you share the pbi example files please :)?

@SQLBI 3 года назад

You can find the link in the video description!

@LucaGanugi 3 года назад

Ciao Alberto, these numbers come from a really large fact table. In a simpler and smaller fact table, something like 100M rows, can we draw the very same conclusions? Great works as always!!!

@SQLBI 3 года назад

The differences are similar, but the absolute numbers are smaller and hard to compare. Moreover, a common misconception is that a single "fat" table is faster when there are many rows, which is not true (in Analysis Services and Power BI).

@nagendrakg5062 3 года назад

Star schema Always wins...

@juantray Год назад

Maybe I am being naive, but it seemed as if the fat model won 6 out of 10 times. And any difference was so marginal that it wouldn't be worth trying to convince skeptics to use a star schema. But I'm a noob. And I have a lot to learn about the nuances that matter.