Thanks for that, Patrick. I have been using star schemas and some good data modeling practices and I have to say: "This is the way". Not only is performance better: design, end-user understanding, and maintenance are much, much easier as well. The good practices with Power BI make us Super Power Workers! Thanks again!
I have recently started expanding my knowledge with PBI and your channel has amazing information, examples and tips. I appreciate your work very much! Thank you for your efforts!
I guess it is this one: Where to create your columns in Power BI | Data Modeling Best Practices ru-vid.com/video/%D0%B2%D0%B8%D0%B4%D0%B5%D0%BE-ZSUCmi6h5SY.html
There is another drawback to wide tables I think you could have mentioned. When a context transition occurs, the entire table being iterated is brought into memory, uncompressed. For wide tables with lots of rows, this will ravage memory even more.
Thanks for your videos. I always enjoy them. I understand star schemas; I've been using them for decades. What causes my hang-ups is filter direction in the model. Hopefully you have a video discussing that topic.
Thanks for the great tips. Currently I have the following requirement: a dashboard showing a monthly comparison for the whole year, as well as a year-over-year comparison, with the ability to drill down to transactions. Six Excel source reports arrive each month, from different sources. These reports need to be consolidated after complex transformations and additional classifications from user mappings. My initial thought was to make it into one giant dataset for user convenience. After watching your video, it seems I need to redo it from scratch and build the star schema, with additional mappings, because there are no standard key fields across the different sources. Imagine you have 3 sales reports for the same products, but each report uses a different product code. Would this be the right approach?
Awesome!! I love watching your vids instead of going through formal Power BI training because I understand SQL and data analytics very well and want to jump right into using this tool, which is new to me. Power BI is incredibly powerful and it's actually easy enough to get started in, but as things get complex it's not always intuitive. So thank you for all the tips, info and best practices! :) I do have a question though, and maybe you have already made a video on this, but I'm new to the channel so here I go! Can you talk about the difference between star schemas, Merge with a JOIN, and simply setting up a relationship between 2 tables via the Manage Relationships button (which is maybe the same thing as a star schema, but I didn't realize that)? And when is it best to use which? I find myself wanting to use Merge b/c it's the most like writing SQL, but I want to use more tools within Power BI and let it do the work for me!
Hello Patrick - Thanks for the videos. I've learnt a lot through your videos and your unique teaching style :) I have a situation and I would like your input on it. I have a fact table with 3 different dates: Order Date, Shipped Date and Delivered Date. I have a custom date dimension table that I created in my data model using the CALENDARAUTO() DAX function. Now I need to use that date dimension as a role-playing dimension, because some queries need to join on Order Date, some on Shipped Date, and some on Delivered Date. However, while building the relationships I can only have one active relationship between the fact table and the date dimension table. I can think of 2 possibilities based on the knowledge I have gained. Number 1: I know I have the option to create 3 versions of each measure using the USERELATIONSHIP DAX function to force which relationship is used while calculating the measure. But that would mean creating 3 versions of each measure, e.g. Sales by Order Date, Sales by Shipped Date, Sales by Delivered Date, etc. Number 2: Otherwise, I think I would have to have 3 separate date dimension tables: an Order Date dimension, a Shipped Date dimension and a Delivered Date dimension. Can you please suggest the right and appropriate way to solve this model design situation? Looking forward to your help.
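For anyone reading along, option 1 above is a well-known pattern. A minimal sketch, assuming hypothetical column names (FactSales[OrderDateKey], FactSales[ShippedDateKey], FactSales[DeliveredDateKey] all related to DimDate[DateKey], with only the Order Date relationship active and the other two created as inactive):

```dax
-- Base measure uses the active relationship (Order Date)
Sales by Order Date = SUM ( FactSales[SalesAmount] )

-- Switch to the inactive Shipped Date relationship for this measure only
Sales by Shipped Date =
CALCULATE (
    [Sales by Order Date],
    USERELATIONSHIP ( FactSales[ShippedDateKey], DimDate[DateKey] )
)

-- Same idea for Delivered Date
Sales by Delivered Date =
CALCULATE (
    [Sales by Order Date],
    USERELATIONSHIP ( FactSales[DeliveredDateKey], DimDate[DateKey] )
)
```

This keeps a single date dimension in the model; the trade-off is exactly the one described in the comment, namely one measure variant per date role.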
So basically, normalise your data if you have redundant data in tables to avoid duplication and the hideous many-to-many relationships and pivot wide tables to long format where that makes sense (normalisation will help with the width as well)?
Hello, I would say that the schema you are showing at 8:30 is not a star schema but a snowflake schema. Normally in a star schema you would only have fact tables "in the middle" and dim tables connected to the fact tables; no dim table should have other dim tables connected to it. If you have dim tables connected to other dim tables, that is called a snowflake schema, and this structure reduces performance because not just one filter but several filters must be applied to get the result. To get an ideal star schema it is then necessary to have redundant data in the dim tables, and that is the biggest difference from a database schema. In a database schema you try to avoid redundant data by normalising the tables. But this normalisation is not ideal for creating Power BI reports with optimal performance.
That was a quick and effective video on data modelling. I need help modelling the data for one of my clients; maybe this video will be of some help. Can I connect with you directly somehow if I need help with the data?
This is 100% basics right here. It's all review for me, but I'm watching to support this great channel and to simply keep it fresh in my brain. A STAR schema is a must in most cases.
@@GuyInACube I'm just now getting into PBI after years in SQL. So designing the back end is my specialty. But the same rules apply in a lot of situations because the concepts are the same. Many times I have to remind someone "do we need the information in two places or can we table these out and relate them?"
Nice video. The demonstrated data model looks to me like a snowflake. I know the snowflake is a variant of the star schema, but it differs at one critical point: a star schema is allowed to be denormalized (which is the opposite of what you are recommending in terms of narrowing the table), since it has to contain the attributes of the dimension, whereas a snowflake is strictly normalized. Could you please clarify how your example defies that variation and conforms to a star schema?
Hello, I'm new to Power BI. I'm used to using Power Pivot in Excel and joining tables through SQLServer to create queries (sometimes many many tables). Occasionally creating relationships to other queries within the back end. I was just curious if you guys had any recommendations on when to join tables through relationships or when to merge tables in Query Editor? Coming from Excel, I thought the best practices were to merge tables with joins, but soon found out that wasn't optimal in many cases. So any tips? Thank you!
I had the same problem last week: two fact tables where I couldn't use a slicer on a field in both tables correctly. The slicer would filter one table but not the other.
Hey Patrick, I'm not seeing the links to the STAR schema you referred to and I would also be interested to watch the video you did with your daughter on the topic.
Break everything out into dimensions and facts and you will avoid ultra-wide tables, and it allows you to reuse a lot of these dimensions as well. Sometimes doing this seems to suck because you have to do a lot of joins on data that feels like it belongs in the same table, but in general you save pain later by doing this categorization at an early stage. The most important thing is to break out the "conformed dimensions", because these can be reused again and again and again across your fact tables.
Patrick: How about adding the links for what you talk about here: 1) Star schema overview (you mention it at 6:39), 2) "Flat tables and create data model" (you mention it at 6:48). It would be great to have them! I will be waiting for your answer. Best regards!
Great vid. Question: how do you deal with data that contains different types of markers, i.e. ".." for missing data, "..." for suppressed data, and "." for data from a different time period?
Do relationships on numeric columns work faster than those on text columns? Hi Adam and Patrick, I have a very troubling question regarding the best way to design a data model, and I can't seem to find a solution online. It's about the relationships between tables in a Power BI data model. I have always built them on numeric key columns, but a stakeholder is challenging that, claiming performance is the same even on long text columns. Can you help me by sharing your thoughts on the below?
- Ensuring the primary and foreign keys between the DIMs and FACTs in the Power BI relationships are on numeric columns rather than text columns (this would also reduce the volume of data in the model in megabytes, which helps front-end performance too).
  - For example, the relationship between DIM_ProductCategory and the FACT tables is on the concatenation of Level 1, Level 2... Level 6, which is not as good as numeric columns.
  - The FACT table has over 16 million rows that are just repetitions of 187 distinct values of the concatenation of Level 1, Level 2... Level 6. This would consume far less space as just the numbers 1 to 187. The same situation exists in 3 such FACT tables (with different levels) related to the ProductCategory dimension table.
Imagine there is no DW, but numerous summarised tables built with numerous dimension tables in an RDM. Would the best way to tackle this be building the denormalised table structure into a star schema model? Our company has numerous models with different levels of summarisation/aggregation in the same model. Would you create the lowest level of data and then aggregate to different levels using measures and dimension relationships?
Thanks Patrick! Awesome advice. I am a great fan of both of you guys. Just a question: is it good practice to use SQL DB views instead of DB tables?
Hi Patrick, I need to understand how you are not getting inactive relationships when you are dealing with multiple fact tables and multiple conformed dimension tables. Is there any setting you are making in the Power Query Editor, etc.?
Hello Guy in a Cube, can you recommend a reputable Power BI expert? I need support integrating Power BI with forms, and unfortunately the videos are a bit fast.
Great channel, and thank you for what you do! I have a question: sometimes when I put a value in my table it shows me just the first value from my database, and I can't choose any form of calculation (sum, average, etc.). How can I change this? Thank you again!
How can I narrow my table, which has a column containing almost 20 comma-separated string values? And I really have to work with those values. Really need the help. TIA
Great video Patrick. So, I do have a question though: is creating conditional columns in the Query Editor better than creating custom columns in the data model? Does it make a real difference in terms of performance? I have a few reports in which I used to create custom columns, but I am seriously thinking of recreating them directly in the Query Editor. Thanks for all you do!
The issue I would like solved, and you mentioned it, is the memory usage. PBI works great with Excel, obviously, because Excel dataset size is limited. Hook it up to a database with hundreds of millions of rows, regardless of how well modeled, partitioned and indexed, and it just goes to sleep, as in: task manager, end process, sleep. I've cancelled the table analysis, joined the tables, and published, and the service can't handle it either. I can't decrease the data size any more than I already have. We get millions of transactions every day. The fact table has 10 columns: 2 dates, 4 numbers and 4 joinable dimension keys. The dimensions are created from their own tables. Everything that can be turned off in options is. Any thoughts?
Working with large data can be problematic. That's where features such as incremental refresh and Aggregations really become powerful and I've seen data models with billions of rows work using those items. The struggle is real though.
Great video. Smart. Great pace. You very quickly setup the various problem statements, explained the options, and demo'd. You're a great instructor. Thank you!
@Patrick I have a scenario where I have 3 million distinct products linked to a sales table. However, only about 1 million products have been sold so far. I really don't need the 2 million unsold products loaded into my model. Is there a way Power BI can drop records unused in a relationship? I could use a subquery on my Product source to check whether each product exists in Sales, but I feel it will get expensive as my sales data increases. Any suggestions?
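One model-side way to sketch this (table and column names here are assumptions, not from the video): a DAX calculated table that keeps only products that appear in the fact table. Note the caveat that a calculated table is built after the source tables load, so for real memory savings you would do the equivalent inner-join filter upstream in Power Query or in the source query instead.

```dax
-- Calculated table: only products that actually occur in Sales.
-- Hypothetical tables DimProduct and Sales, joined on ProductID.
DimProductInUse =
FILTER (
    DimProduct,
    DimProduct[ProductID] IN DISTINCT ( Sales[ProductID] )
)
```

You would then relate DimProductInUse[ProductID] to Sales[ProductID] and hide the original full product table, or better, apply the same filter before load.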
Dang...where's the Power Bi 101 video! I just finished a Power BI course and it was great theory and cool to see all the tool can do, but I'm lost at where to begin! I need some basic report building exercises to show me step by step what to do. Any such resources?
Hi Patrick, Do you know if there's a way to export the data model to Visio Pro ? I work a lot with the modeling tool but I find that's very limited. I'd like to customize the background color, assign different colors to different objects, etc. Thanks !!
Hi Patrick, my RMS creates a 24-digit primary key. Power BI displays these in scientific notation, which, of course, creates a ton of duplicates. We have tried to change the data type without success. We are forced to make the data type Text, which of course is a pig when it comes to compression. Is there a way we can have Power BI display the full 24-digit primary key?
Nice video Patrick! What about a wide fact table? I have a wide consolidated fact table (60 measure columns derived from 5 detail fact tables, each covering one of 5 business processes - Sales, Marketing, etc.) that is rolled up with respect to the conformed dimensions from the detail fact tables. This wide consolidated fact table is a physical table (with 60 measure columns). Is this better than creating the 60 measures in DAX on top of the detail fact tables?
There are a lot of factors that go into that. Assuming these are integer columns and not strings as you referred to them as measures. Depends how many rows in the table, etc. Best bet is to determine what your performance baseline is and then if you can optimize that at all. If the baseline is pretty fast, you may not need to do anything. Also, if you are using all the columns then you need them. If you aren't using any of them, then can we get rid of them?
Love it! I’ve leveraged this process for all sorts of reports, everything from property and date tables to organize my various sources of data. Thanks for the deeper dive .
Hi, I am working at a corporate company. We are using the Power BI on-premises version. I want to build a data model to use in my dashboards. Which tools do I need to build a data model in the on-premises version? Thanks
My experience with the star schema is mixed, especially with independent product tables, where it limits your ability to analyse to a certain extent. Cross tables were one example.
Good day sir. I would like to ask how to include a dimension with a hierarchical design (recursion) in relation to a fact. Kindly give an example and provide an explanation if possible. Thank you.
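For recursive (parent-child) hierarchies, the usual approach in Power BI is to flatten the recursion into fixed levels with the DAX parent-child functions, then build a regular hierarchy from the level columns. A minimal sketch, assuming a hypothetical Employee dimension with EmployeeKey, ParentEmployeeKey and Name columns:

```dax
-- Calculated column: the full key path from the root to this row,
-- e.g. "1|4|12" for an employee two levels below the top.
Hierarchy Path = PATH ( Employee[EmployeeKey], Employee[ParentEmployeeKey] )

-- Calculated column: the name at level 1 of the path.
-- Repeat with position 2, 3, ... for as many levels as the data has.
Level 1 =
LOOKUPVALUE (
    Employee[Name],
    Employee[EmployeeKey],
    PATHITEM ( Employee[Hierarchy Path], 1, INTEGER )
)
```

The fact table then relates to this dimension on the leaf key as usual, and the Level 1..N columns give you a drillable hierarchy.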
I have a question not related to the content in this tutorial. The filled map visuals available in the Power BI marketplace do not have a "conditional formatting" feature that allows users to define "data colors". This function was previously available in one of the earlier versions of Power BI, but when I go to the "Data colors" options within the filled map visual there is no longer a provision for "conditional formatting". The choropleths were also able to automatically adjust the color schemes using filters in that version of Power BI. Is it possible for you to demonstrate how one could use the currently available choropleths to achieve the same (conditional formatting of data colors for the filled map visual, responding to filters)? It would also be wonderful to see how to upload and use shape files (.shp).
I have an important question! What is the difference between data modeling (like a star schema) and database design / database schema? How do we design a database so that it flows nicely into a data model? Can't the database schema already just be a star schema, so that they are one and the same? But how can we expect users to update data in star schema form when everything is so fragmented and it uses integer IDs everywhere? Thanks!!!!
Hey, I need some help. Is there a way we can 1. create calculated columns based on 2 different tables, and 2. create a data preview from the model I create in Visual Studio? I would eventually use a Live connection due to the data volume, so the modelling part needs to be finalized in Analysis Services only, as we hit limitations in Power BI. Thank you!
Any tips on books or other resources for data modeling? I commonly find times where the data is set up in a way where creating summary tables using Power query would be necessary, or cases where I need to look for instances where a date in one table is less than or greater than a date in another table.
Yooooo! wassup! If you have SSAS capabilities, one should use this option for creating the model and then connecting to that source from Power BI. I find this to be a more efficient way to work with data coming from disparate sources. When things get too complex, it is always good to return to the basics. Good simple video with powerful information.
Hi everyone! Maybe somebody here could help me with this issue: I have a fact table with CustomerID and ProductID as foreign keys. The fact table also has some columns with territory information (City, Region, Country). I tried to take those columns from the fact table, create a new Territory table related to them, and then delete the columns from the fact table and the Products table with Power Query, but if I delete any of these columns, all their fields become null. As the columns should not repeat in a star schema, is there any way I can separate those columns into a different dimension table and add this new Territory dimension, with its specific territory key, related back to the original fact table? Thank you very much!
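One quick way to sketch the dimension split on the model side (all table and column names below are assumptions based on the comment): build the distinct territory combinations as a DAX calculated table, then relate it back to the fact table on City (this only works if City uniquely identifies a Region/Country combination; otherwise you need a proper surrogate key, which is easier to add in Power Query with an Index column and a Merge).

```dax
-- Calculated table: one row per distinct territory combination
-- found in the hypothetical FactSales table.
DimTerritory =
DISTINCT (
    SELECTCOLUMNS (
        FactSales,
        "City", FactSales[City],
        "Region", FactSales[Region],
        "Country", FactSales[Country]
    )
)
```

After creating the relationship DimTerritory[City] → FactSales[City], you can hide (rather than delete) the Region and Country columns on the fact table, which avoids the nulls described above.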
Hey, Patrick! First of all, congrats on your tips. So... could you post the link here to where you talk about "Avoid many-to-many relationships"? I didn't find it. Thanks, bro
Nice videos. Question that is driving me nuts: When I unselect all options in a filter in Power BI, the visualization shows as if all options were selected. It should logically show nothing. Thoughts?
Thanks, Patrick! I love and follow in practice the theory of star schemas. However, 15 minutes into the real world you find that your stars are related and there is a dearth, if any, of examples on how to work such situations. The classic example: you build star schemas for orders and shipments and then realize that you need the shipments related to each order. Then what? One is not supposed to have relationships between fact tables! And this gets more complicated as you introduce more star schemas into your overall model.
Fact-to-Fact. What a challenge. Typically you could integrate (join) two facts based on common dimensions. Using this method you can then analyze values between the two facts based on those dimensions. You could also look at degenerate dimensions, which is an advanced topic, but also a common practice.
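To illustrate the common-dimension approach mentioned above (table and measure names are hypothetical): if both fact tables relate to the same conformed dimensions, say DimProduct and DimDate, you never join the facts directly; you just write measures over each fact and combine them, letting the shared dimensions provide the common filter context.

```dax
-- Each measure reads from its own fact table.
Ordered Qty = SUM ( Orders[Quantity] )
Shipped Qty = SUM ( Shipments[Quantity] )

-- Valid for any slice by the shared dimensions
-- (e.g. per product, per month), with no fact-to-fact relationship.
Unshipped Qty = [Ordered Qty] - [Shipped Qty]
```

The result is only meaningful at the grain of the dimensions both facts share; comparing at the individual order line level is where the degenerate-dimension technique comes in.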
@@GuyInACube Any chance of a video on this? :) Is it related to role-playing dimensions (you have a video on this: ru-vid.com/video/%D0%B2%D0%B8%D0%B4%D0%B5%D0%BE-2BxaUXlx3K4.html)? As far as I can see in YouTube search there is no video on degenerate dimension tables in Power BI, and very limited content on degenerate dimensions in general.
@@GuyInACube Yes, that works (it's what you do in the video) and it's a step forward in dealing with the whole situation. It would be interesting to keep developing your example, adding some other real-world challenges related to models with multiple star schemas in PBI.
Wouldn't a bridge table work? You could have a bridge table with the degenerate dimension (I love this term, by the way). That way it could be many-to-one from the orders fact table to the degenerate bridge table, and then one-to-many from the degenerate bridge to the shipping fact table.