Guy in a Cube: I created a dataflow in the service and connected my model to it as the data source. When I publish to the service, Power BI creates another dataset. Why does that happen?
Great way to showcase dataflows! I love the idea of using them as a stopgap solution. I've run into this many times in the past, when report development stalled exactly because making the changes to the data warehouse takes so long... Great product!
Really useful video, thank you. Good to consolidate Matthew's blog posts and answer a few more questions. Just to confirm: access to the workspace defines access to the dataflow within Desktop?
I am so glad I saw this video; this is the best solution for people struggling with DirectQuery database transformations in Power Query. Now everything will run in the cloud, and I won't have to wait until the query is done to make transformations, etc.! Thanks 🙏
Great video. Thanks for the detailed explanation. With a dataflow implementation, the entire data set is stored in two places: once in the original data source, and again in the data lake storage created by the dataflow. Is it advisable to go with dataflows with this point in mind?
Great video, thank you! Not too sure if this is the right section to ask in... Is there a significant difference between using `PowerBI Dataflows` vs `Dataflows`? The M script is different, but are there data-load differences or perhaps performance differences (basically, beyond the M code, is there any difference)? Thank you!
Thanks Adam and Patrick, very informative. So many choices: Dataflows, ADF, Synapse, ... Could we have a video that covers this domain and how best to use each with Power BI?
I use a dataflow to avoid overloading a SQL DB: a stored procedure runs daily to feed my dataflow with 9M+ rows, and my Power BI report connects to it and mashes it up with other sources like SharePoint. That way I can refresh it six times a day without actually hitting the SQL DB, making it a smooth process with little chance of timeout errors.
What if I want to make a dataflow but use it across multiple workspaces (different Power BI apps)? When I previously tried to share datasets I built across workspaces, they didn't update after the initial load outside of the parent workspace.
Can you make a video on how to leverage the advanced compute engine in Dataflows to improve performance? As I understand, similar to query folding, not all transformations hit the Azure SqlDB cache. Is there a way to know if a particular transformation uses the new compute engine or not?
Hi Akhil, Check out the below video on the upcoming Advanced Compute Engine Capabilities ru-vid.com/video/%D0%B2%D0%B8%D0%B4%D0%B5%D0%BE-8-DM48wq2DU.html
Just wondering... For instance, if I'm sourcing tables from DW and boosting them up with extra information from other sources as csv files, is there a way to maintain relationships within these new "boosted" tables (as they had in the DW)?
Hi Matt. We have loaded sales data from SQL Server and employee data from CSV into a dataflow and want to set up the relationship there. Where can we set the relationship in a dataflow?
I believe Row level security (RLS) is a feature of the data modelling experience that is created in a report. The dataflow exists outside of that experience and has no row-level security applied to it. However, when you incorporate a dataflow into a report, you can then apply RLS as required by your report model. You can actually connect to several dataflows at a time; think of them as just "enhanced" data sources so RLS would most likely not be reasonably feasible to try to use on the dataflow itself.
I have tried using data flows to connect to a view in Google Big Query but I seem to only be able to connect to the raw table. Are we only able to connect to tables for big query connections or should there be a way for me to connect to these views?
Patrick and Matthew--thanks for this very helpful explanation of dataflows. I am still trying to wrap my head around them. Question for you (or anyone else): how do dataflows interact with the Common Data Service? Is the CDS just one more source for a dataflow?
Our understanding is that the Common Data Service is a bit more specific to Dynamics / Power Apps, while Power BI dataflows are on the Power BI side. They both make use of the Common Data Model, though, and can share the underlying data by way of Azure Blob storage and the underlying CDM folder / JSON structure.
And here's something cool--I apparently watched this 10 months ago, and got lots out of it. But now I am re-watching, and getting a whole new level out of it.
Sounds to me like it works almost like a view: it creates the background code (ETL) to link the data sources and create derived variables for multiple users to use (instead of those users each having to pull in the raw data and try to replicate the joining of tables).
To create a dataflow reading from on-prem data sources, does the creator of the dataflow need to be the owner of the on-prem gateway as well? Or is it enough to just be a user of the gateway?
If I use dataflow, will I be able to connect to the "data warehouse" I created with my other operational systems? Or do you recommend using azure data flows in that case? I don't want to build something which I can only access with PowerBI (or other Microsoft only tools). Thanks for your thoughts.
Hey Patrick, I have a question here. I'm new to Power BI. If I use shared datasets, from a dataflow or from an app, in Desktop, I'm not able to see the tables or the model. Is it only for dragging and dropping fields into visualizations? I'm asking because how can I write effective DAX without seeing the actual contents and model? Maybe I also want to check something in DAX Studio.
So, for sure, reading from Azure SQL Data Warehouse will be faster than reading from a dataflow. But will dataflows be faster than reading from an Azure SQL database? I have an Azure SQL database (as a DW); should I switch my ETLs to dataflows to make my Power BI dataset refresh times faster?
Thanks for this overview. Do we have to have premium capacity for any of this dataflow functionality? It wasn't mentioned, but I have learned to be skeptical. Also, while not intended to replace the enterprise data warehouses, does it make sense for departmental uses?
So I have a question: when you combine tables like that, given that both of them are refreshable SQL queries, does the dataflow understand that it needs to combine both queries again every time I refresh, since we usually have new data being added to our databases every day?
Think of a dataflow as the Power Query part of report design. You are simply separating this part out from the front end model building that is done in PBI Desktop. This allows you to use that ETL work for multiple reports, not just the one that you designed it in if you performed it in one report in Desktop.
I really like the idea behind dataflows; however, I am baffled by how long they take to refresh! I have a few tables from an on-prem SQL server as my source. I also have a conventional dataset based off the very same source, and that one takes about 1 or 2 minutes to refresh. The dataflow, however, takes at least 8 minutes! How is that possible? Both use the same up-to-date enterprise gateway... any thoughts? Thanks for the great content on this channel, btw!
Why do we need dataflows when you can do the same using Power BI Desktop? For example, the tables you imported from the DB and from Excel can be connected within Desktop, and if transformation is needed, we can do that in Desktop too. So could you please explain when and where we should use dataflows?
Hi, I have a query for you. I have five tables in my data model. If I change the existing relationship using the USERELATIONSHIP function on one connection, how can I get the child data from the other tables using the new connection? It's an interview question.
Can I link this dataflow to the dashboard already published in the Power BI service? The report and the dataflow use the same data sources. I can change the connection in Desktop, but is there any way to change it in the service?
Hi Patrick, I have a question, if you have a moment (or feel free to reach out to me): how do I change the imports so that the initial load goes through a dataflow instead of straight into the dataset?
Why in my Power BI is there no Dataflows screen? It just stops at Datasets. What am I missing? EDIT: I looked, and it looks like my Gateway isn't properly set up. If I get my Gateway fixed, will I be able to see Dataflows as one of my options at the top of the screen at 1:51?
Hi. Does this mean that one can now fully rely on dataflows + the Power BI service and have reliable automatic dataset refreshes, instead of a complicated Power BI Desktop + gateway configuration?
Not really. The downside is that if a regular dataflow refresh fails (and they do periodically, for bizarre reasons and with impenetrable error messages), then all dependent Power BI Desktop refreshes will use the previous version of the data from the dataflow. If your Desktop mashup relies on that refresh working (e.g. for table relationship constraints), then it will also fail.
Yes. They are independent as of the recording of the video. Refreshing the dataset will just grab the latest in the dataflow. If the dataflow hasn't been updated, it will just grab the same information. It will not trigger a downstream refresh.
Hi there, can I run R scripts in Power BI Dataflows before pulling the data into Power Query? The SQL Server I pull data from does not allow R scripts, and I don't want to use R in Power Query because it doesn't allow use of the enterprise gateway to refresh data. I landed here because I'm hoping for a solution that lets me run R somewhere between SQL and Power Query.
Assuming the data is not already available in a centralized data warehouse, I would say both. 1. Create dataflows for common dimensions such as Date, Customer, Product, etc. that would be used across multiple models (datasets). 2. Then create datasets for specific business areas (i.e. sales, finance, HR, etc.) that would connect to the entities created from dataflows. This setup would allow your users to benefit the most as you would share the same "Date/Customer/Product" views across the entire organization. Hope this helps!
Potentially. Your mileage will vary depending on what you are doing. But if you did all the transformation on the dataflows side, and then just did a straight import into Power BI, the work of doing the transformations has been done. So, no need to do it on the Power BI side. So, maybe. 😁
Can MSFT make it so that a dataflow can be owned by a group!? When we need to change a df you have to change the owner each time if you're not the owner. It can be such a pain!!!
I understand the value of centralizing all data and model schema inside a Dataset and then build consistent reports using this Power BI dataset. But, this is where I am lost. I create a DataFlow, then I have to once again define a Dataset (import mode) and then define relationships and then create reports on this Dataset. So - what did I gain using a DataFlow?
Thanks for this great video. You asked for questions/comments? You got it :-) I agree that dataflows are useful tools, but I'm struggling with a case where I need to pull data from an external API day by day; I can't make bigger requests because of API limitations. I created a Power Query script that pulls and transforms the data. For this I used the RangeStart and RangeEnd parameters, so I can pull data for one defined day; the rest of the data I wanted to pull using incremental refresh. But now I cannot set up an incremental refresh policy, because it says "Mashup document contains duplicate query name: RangeStart". So, as I read, I shouldn't define the RangeStart and RangeEnd parameters myself, because dataflows use a datetime column for this automatically. How should I then pass those parameters to my GET request? How do I set up incremental refresh for this? I was so close, and now I'm stuck. Any ideas?
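In case it helps anyone with the same error, here's a minimal M sketch of the pattern incremental refresh expects in a dataflow. The assumption (based on the error message above) is that the refresh policy injects RangeStart/RangeEnd itself, so you reference them without declaring your own parameters of those names. The API URL, query fields, and the EventDate column are made-up placeholders, not anything from the video:

```powerquery
// Hypothetical query. RangeStart / RangeEnd are NOT declared here;
// the dataflow's incremental refresh policy provides them, and declaring
// your own duplicates the names (hence the "duplicate query name" error).
let
    Source = Json.Document(
        Web.Contents(
            "https://example.com/api/data",   // placeholder endpoint
            [Query = [
                from = DateTime.ToText(RangeStart, "yyyy-MM-dd"),
                to   = DateTime.ToText(RangeEnd, "yyyy-MM-dd")
            ]]
        )
    ),
    AsTable  = Table.FromRecords(Source),
    Typed    = Table.TransformColumnTypes(AsTable, {{"EventDate", type datetime}}),
    // The filter on a datetime column is what lets the service partition the loads.
    Filtered = Table.SelectRows(
        Typed,
        each [EventDate] >= RangeStart and [EventDate] < RangeEnd
    )
in
    Filtered
```

You then pick that datetime column when configuring the incremental refresh policy on the entity; the service substitutes the ranges per partition at refresh time.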
Hello sir, I have an AD group, let's say O365SPM, that contains a list of users. I am using SharePoint as the data source to create a Power BI report, and in that SharePoint site I gave this group access as members. I want to fetch those users inside Power BI with some formula. Can you guide me on what to use, or the process to follow? In Power Apps we can use the O365 connector to get the user list; how do we do it in Power BI?
Excellent article! Can I say it's now possible to use my MacBook to develop my reports and dashboards the same way as when I'm using Power BI Desktop on a PC laptop? I believe it's an important difference for Tableau users. Thanks, and have a great week ahead!
When I try to do a fuzzy nested join, I get this error in dataflows: "The name 'Table.FuzzyNestedJoin' wasn't recognized. Make sure it's spelled correctly." Here is what I am doing: `Table.FuzzyNestedJoin(#"Expanded Exclude", {"Account_Name"}, HPG_Roster, {"Name2"}, "HPG_Roster", JoinKind.LeftOuter, [IgnoreCase = true, IgnoreSpace = true, NumberOfMatches = 1])`. Any help would be amazing!
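For readability, here's that call reformatted, plus a possible workaround. The error suggests the dataflow engine (at least at the time) didn't recognize the fuzzy-matching functions that Power BI Desktop supports, so a sketch of the closest supported alternative is a plain exact-match `Table.NestedJoin`; you lose the fuzzy matching, of course. Table and column names are taken from the comment above:

```powerquery
// Original call, reformatted (works in Desktop's Power Query,
// but Table.FuzzyNestedJoin may not be recognized by the dataflow engine):
Table.FuzzyNestedJoin(
    #"Expanded Exclude", {"Account_Name"},
    HPG_Roster, {"Name2"},
    "HPG_Roster", JoinKind.LeftOuter,
    [IgnoreCase = true, IgnoreSpace = true, NumberOfMatches = 1]
)

// Exact-match fallback using a function the dataflow engine does recognize
// (no fuzzy matching, so keys must match exactly):
Table.NestedJoin(
    #"Expanded Exclude", {"Account_Name"},
    HPG_Roster, {"Name2"},
    "HPG_Roster", JoinKind.LeftOuter
)
```

Another option, if the fuzzy match is essential, is to do that join in Desktop instead of in the dataflow.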
Nice Anglo-Saxon Futhark T-shirt, Matthew. Yeah, I can read runes... kind of a big deal 😂 It doesn't say anything; it's the runic alphabet (Fee, Ur, Thorn, Ansur, Rad, Ken).
I asked myself the same question :). I think not, because it depends on your business scenario. Datasets have more complex objects like relationships, measures, hierarchies, etc., and I think they're read-only. You can use dataflows as a data source for your dataset. A video on the difference would be great anyway :).
Actually, they really complement each other. Think of it this way: say you have a data source that requires a lot of ETL to get to a finished format, perhaps by combining tables from that source, simplifying the data, and cleaning it. You'd like to use that transformed data source in a dataset you're building for one or more reports that serve one purpose, but you also want to use that transformed source for an entirely different set of reports. Step A is to build the dataflow, Step B is to build your multiple shared datasets that use that dataflow, and Step C is to use those shared datasets to build your reports. In this setup, the transformations only get performed one time, in the dataflow. This can be especially valuable because the designer of the dataflow may better understand how to translate the source data into a format that report authors can make use of, kind of like creating a view in a database. The tough part, as I see it, is how to keep all those pieces refreshed in sync.
Love it. Now we have to bring all analysts up to the same Power BI understanding level. :/ I can see myself trying to explain to some analysts where their data will be saved.
Question. First, thank you; I was trying to research the differences between datasets and dataflows, and this video helped the most. I'm quite new to Power BI, and just have to ask (assuming anyone is still monitoring this): in the entire demo leading up to the 9-minute mark, where Matthew is (most excellently) demoing building the dataflow, he says things like being able to mash up tables and bring in other data sources, etc. Well, you can do all of that in a regular dataset using regular Power BI Desktop, correct? If I understand this properly, the key difference is really that the flow is stored in the cloud and available for re-use to those in a workspace. Am I stating that correctly? EDIT: two quick follow-up questions if I'm correct above: 1) Can I create a Power BI dataflow in Desktop? I would assume I can create the flow using Power Query as normal, but rather than it being a dataset, I push it to the cloud (after signing into the service)? 2) Can I pull dataflows into my pbix file if I'm just using Desktop (assuming yes, so long as I log into the workspace)? THANK YOU!!!
Right, the key use case of dataflows is that they can be reused by multiple users across multiple reports. Dataflows can be created only from the service. If you create a flow in Desktop and push it to the cloud, that is a "Power BI dataset", which you can again read from Power BI Desktop. To me, dataflows seem to be an improvement over the Power BI dataset and address the use case that a Power BI dataset should have covered. Can someone give an opinion on Power BI datasets vs. dataflows? Are Power BI datasets redundant now?
I strongly believe that Power BI is still catching up with enterprise data warehouse modeling, where we have conformed dimensions. Dataflows are a way to build conformed dimensions, but with a twist. Say I have datasets in workspaces w1, w2 and w3, with a common dimension, Customer. I create a common dataflow to load the Customer table, which resides in ADLS Gen2, then use the Customer table in three models (data marts). I still end up with three copies of the Customer table across the datasets in w1, w2 and w3, which take up valuable VertiPaq storage memory. How can that be, since Customer is a common dimension? There should be only one copy, shared amongst the three workspaces. Please guide and elaborate.
If it takes two highly skilled professionals 13 minutes to explain what it is, you need to work on that. How about a diagram? How about a simple comparison table of datasets / dataflows / data warehouse pros and cons? When explaining it to clients, I end up using "sort of" and "maybe" instead of "it is" or "yes". Is there a definitive elevator pitch? Do you have some stone-cold use cases, instead of "maybe if this happens", "maybe you don't have that", and "maybe your IT is crap"? Links welcome...