
Your data is in the Lakehouse, but now what? | Microsoft Fabric (Public Preview) 

Guy in a Cube
442K subscribers
33K views
You've got your data into OneLake and a Lakehouse, but now what? What can you do with that data after you've landed it in Microsoft Fabric? Justyna walks us through the different areas where you can leverage your data throughout Fabric, from data warehouses to Power BI!
What is Data engineering in Microsoft Fabric?
learn.microsoft.com/fabric/da...
What is Spark compute in Microsoft Fabric?
learn.microsoft.com/fabric/da...
Develop, execute, and manage Microsoft Fabric notebooks
learn.microsoft.com/fabric/da...
OneLake shortcuts
learn.microsoft.com/fabric/on...
Justyna Lucznik
/ justynalucznik
/ justyna-lucznik
📢 Become a member: guyinacu.be/membership
*******************
Want to take your Power BI skills to the next level? We have training courses available to help you with your journey.
🎓 Guy in a Cube courses: guyinacu.be/courses
*******************
LET'S CONNECT!
*******************
-- / guyinacube
-- / awsaxton
-- / patrickdba
-- / guyinacube
-- / guyinacube
-- guyinacube.com
**Gear**
🛠 Check out my Tools page - guyinacube.com/tools/
#MicrosoftFabric #Lakehouse #GuyInACube

Science

Published: 29 Jul 2024

Comments: 42
@powerranger3357 · 1 year ago
Great video. Would like to see a walkthrough of the process of loading those initial Fact/Dim tables that are being connected to in the video. The statement was made that no ETL/data movement needs to happen (and while that's true for the BI developer), I feel like it's not an accurate statement when looking at the end-to-end process.
@adolfojsocorro · 1 year ago
I also think this is a confusing statement made in several videos and docs. The data has to somehow get to the lakehouse initially, and subsequently be refreshed on some schedule. I don't think Fabric eliminates those ETL processes.
@GuyInACube · 1 year ago
The comment about no ETL/data movement is true after you get it into OneLake. That one copy of the data can then be reused across the different engines, as shown in this video. We also did a video about how to create your first lakehouse: ru-vid.com/video/%D0%B2%D0%B8%D0%B4%D0%B5%D0%BE-SFta1_70T_U.html. I also recommend going through the end-to-end tutorials in the Fabric documentation.
@prasadparab5638 · 1 year ago
Awesome.... great features... Thanks a lot for sharing this information 🙏👏👏👍
@GuyInACube · 1 year ago
Most welcome! Thanks for watching 👊
@dogsborodave · 1 year ago
Hey! I remember Justyna from a breakout session at Build! Love GIAC that much more.
@rohtashgoyal · 8 months ago
Liked the option of connecting to the lake in Direct Query mode and getting the performance of Import mode.
@tuyetvyvu4638 · 1 year ago
Did anyone have the same problem? I had created a lakehouse with a delta table, then I created a report from the default dataset. However, when I ran my dataflow, the data appeared in the lakehouse but my default dataset did not refresh.
@lercosta01 · 1 year ago
Pretty awesome!
@GuyInACube · 1 year ago
We appreciate that! Thanks for watching 👊
@hannesw.8297 · 1 year ago
How does Fabric accomplish the same speed as a "classic" PBI import model in the new Direct Lake mode? Doesn't this require the whole lakehouse to be in memory upfront? And, of course, thanks for the video!
@juniorholder1230 · 1 year ago
EXACTLY what I'm not seeing being answered.
@GuyInACube · 1 year ago
Data still needs to be in memory for Direct Lake datasets; however, the engine tries to load just the columns, not full tables, based on query patterns. But at some point you could bump into limits based on your SKU and how much memory is available. There's still work to be done on better paging and whatnot. In general, on my end, I still think of Direct Lake datasets as import datasets from a Power BI perspective. It's just way faster to load up the data and you don't have to refresh or make copies of the data.
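The column-pruning idea in this reply can be sketched in a few lines. This is purely illustrative Python (plain dicts standing in for a columnar Delta table), not the actual Direct Lake or VertiPaq implementation:

```python
# Illustrative sketch of column pruning in a columnar store: each column is
# its own chunk, so a query that touches two columns only pays to load two.
# (Toy code; not how Direct Lake is actually implemented.)

# Columnar "table": column name -> list of values
sales = {
    "OrderId": [1, 2, 3, 4],
    "Amount":  [10.0, 25.5, 7.25, 100.0],
    "Region":  ["EU", "US", "EU", "APAC"],
    "Comment": ["...", "...", "...", "..."],  # wide column the query never touches
}

def run_query(table, columns_needed, agg):
    """Load only the columns the query references, then aggregate."""
    loaded = {c: table[c] for c in columns_needed}  # pay only for these columns
    return agg(loaded)

# SUM(Amount) grouped by Region -- touches 2 of the 4 columns
def sum_by_region(cols):
    totals = {}
    for region, amount in zip(cols["Region"], cols["Amount"]):
        totals[region] = totals.get(region, 0.0) + amount
    return totals

result = run_query(sales, ["Region", "Amount"], sum_by_region)
print(result)  # {'EU': 17.25, 'US': 25.5, 'APAC': 100.0}
```

The point is simply that in a columnar layout, a sum-by-region query loads two columns regardless of how many other (possibly very wide) columns the table has.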
@shahzadkheros · 10 months ago
great video
@ivankhoo93 · 1 year ago
Hi, great video on the lakehouse. Have a question: what about the data-loading experience of getting, e.g., 3.5 bn rows into the lakehouse initially? I would assume loading 3.5 bn rows of data into the lakehouse takes some time, although we can see the experience of using that data from the lakehouse is rather smooth, which is great.
@GuyInACube · 1 year ago
The loading of the data is covered in the creating your first lakehouse video - ru-vid.com/video/%D0%B2%D0%B8%D0%B4%D0%B5%D0%BE-SFta1_70T_U.html, as well as the OneLake video - ru-vid.com/video/%D0%B2%D0%B8%D0%B4%D0%B5%D0%BE-wEcRTSNhtLg.html. There are different ways to get data into OneLake - Data Factory Pipelines, Dataflows Gen2, shortcuts to existing storage accounts, etc...
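Of the options listed in this reply, shortcuts are the zero-copy one: a shortcut is just a pointer into existing storage, so no bytes move into OneLake. A minimal sketch of that idea, with a made-up `adls://contoso/sales` path and plain Python standing in for the real service:

```python
# Hedged sketch of the OneLake "shortcut" idea: the lakehouse entry is a
# reference to data that lives elsewhere (e.g. an existing ADLS account),
# so reads resolve to the original location and nothing is copied.
# The adls://contoso/sales path is hypothetical, for illustration only.
external_storage = {"adls://contoso/sales": ["row1", "row2"]}  # stand-in for a remote store

class Shortcut:
    """A named pointer to data owned by some other storage account."""
    def __init__(self, target):
        self.target = target  # just a reference; no data is duplicated

    def read(self):
        # Reads pass through to the original location at query time.
        return external_storage[self.target]

# The lakehouse Tables folder holds the shortcut, not a copy of the data.
lakehouse = {"Tables/sales": Shortcut("adls://contoso/sales")}
print(lakehouse["Tables/sales"].read())  # ['row1', 'row2']
```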
@The0nlySplash · 3 months ago
Our all-Microsoft company just switched to a Databricks lakehouse, and now all of a sudden MS is offering their own product. Damn, this looks good.
@curiousjoe395 · 4 months ago
How do I get a DW table to show up in the Lakehouse endpoint, please?
@francisjohn6638 · 1 year ago
This is cool :)
@GuyInACube · 1 year ago
We agree! 👊
@eziola · 1 year ago
What was the process to create a relational data model with custom columns and measures in OneLake? I normally create the data model, columns, relationships, and measures in Power BI Desktop. How is a data model created using only OneLake for Power BI to connect to?
@hilldiver · 1 year ago
Coming from the Power BI side of things, one point that's not so clear is how the OneLake "one copy" concept works with the lakehouse medallion architecture. As an example, I have a Dataflow Gen2 which loads Contoso from a local SQL Server to a lakehouse with no transformations; this is the bronze/raw data layer. If I then want to do some transformations, e.g. merging product category and subcategory into the product table to help with a star schema, how do I do this? Shortcuts don't help as far as I can see, as they're just a virtualisation of what already exists. The introduction page of the end-to-end lakehouse tutorial says "Create a lakehouse. It includes an optional section to implement the medallion architecture that is the bronze, silver, and gold layers.", but this is the only mention I've been able to find.
@culpritdesign · 1 year ago
I think you can use Spark or stored procedures (or Databricks, if you're going that route) to shape your data into dimensional models. I think that's why they showed the Spark notebooks, even if they didn't explain it very explicitly. You can see at 2:33 there is a stored-procedure section in the Lakehouse. I am learning this now too; I am used to using stored procedures, but I want to learn Spark and Databricks.
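For the concrete Contoso example in the question (folding category and subcategory into the product table), the shaping step boils down to a couple of joins plus a column selection. A hedged sketch with made-up keys and plain Python dicts; in a Fabric Spark notebook this would typically be PySpark joins writing a new Delta table to the silver or gold layer:

```python
# Illustrative sketch (hypothetical keys/names): denormalize Category and
# Subcategory into a single Product dimension for a star schema.
categories = {1: "Bikes", 2: "Components"}          # CategoryKey -> name
subcategories = {                                    # SubcategoryKey -> (name, CategoryKey)
    10: ("Mountain Bikes", 1),
    20: ("Handlebars", 2),
}
products = [                                         # (ProductKey, name, SubcategoryKey)
    (100, "Mountain-100", 10),
    (200, "HL Handlebar", 20),
]

# Join products -> subcategories -> categories, keeping only the columns
# the flattened dimension needs.
dim_product = []
for key, name, sub_key in products:
    sub_name, cat_key = subcategories[sub_key]
    dim_product.append({
        "ProductKey": key,
        "Product": name,
        "Subcategory": sub_name,
        "Category": categories[cat_key],
    })

print(dim_product[0])
# {'ProductKey': 100, 'Product': 'Mountain-100', 'Subcategory': 'Mountain Bikes', 'Category': 'Bikes'}
```

The output table would land in the next medallion layer as its own Delta table, so the bronze copy stays raw and the star-schema copy is a deliberate, transformed artifact rather than a shortcut.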
@TheGreyFatCat · 1 year ago
Impressive response time using Direct Lake with that substantial row count. Will the Fabric capacity SKU impact this? What SKU was being shown in this demo?
@GuyInACube · 1 year ago
Even with a low SKU like an F2, you will see fast response times. Where your mileage will vary is around how much compute capacity you have. So with the lower SKUs, you will bump up against the limit faster and then start to encounter throttling. Although, right now, the new workloads don't count towards your Capacity Unit limit until August 1, 2023, so you won't get throttled. This is a good time to test and gauge how much capacity you may need.
@scottbradley1194 · 1 year ago
You mention connecting Excel to a lakehouse. What are the options to do this other than a Power BI dataset?
@gabrielmenezes3967 · 1 year ago
How do we handle security on DirectLake datasets?
@stevefox7469 · 1 year ago
Good question, e.g. Row Level Security.
@GuyInACube · 1 year ago
One Security is really going to give you the ability to handle that at the OneLake level which will carry through to the different engines. Still waiting to hear news on when that will be available. That will be the answer with regards to Row Level Security.
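Conceptually, row-level security is a per-user predicate applied before rows leave the engine. A toy sketch of that idea in plain Python (illustrative only; it says nothing about how One Security, which had not shipped at the time of this video, actually implements it, and the users/regions are made up):

```python
# Toy row-level-security sketch: each user maps to a filter predicate that
# is applied before any rows are returned. Hypothetical users and data.
rows = [
    {"Region": "EU", "Amount": 10},
    {"Region": "US", "Amount": 25},
]
user_region = {"alice@contoso.com": "EU", "bob@contoso.com": "US"}

def read_with_rls(user, table):
    """Return only the rows the user's security rule allows them to see."""
    region = user_region[user]
    return [r for r in table if r["Region"] == region]

print(read_with_rls("alice@contoso.com", rows))  # [{'Region': 'EU', 'Amount': 10}]
```

The promise of defining such rules once at the OneLake level is that every engine reading the same copy of the data would apply the same filter.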
@angmathew4377 · 1 year ago
Is the data stored in SQL Server, or is it file storage like CSV etc., with Power BI reports querying the files?
@GuyInACube · 1 year ago
Check out the OneLake video we did with Josh Caplan - ru-vid.com/video/%D0%B2%D0%B8%D0%B4%D0%B5%D0%BE-wEcRTSNhtLg.html Data is stored in Delta Parquet files.
@geoleighyers · 8 months ago
Q: Yeah, somehow when I try to update my data in the Lakehouse, the tables in Power BI do not refresh. Is there a limit on Spark queries in the Microsoft Fabric trial?
@rushankpatil · 1 year ago
How does it affect CPU usage in Power BI Premium with 3 billion rows?
@GuyInACube · 1 year ago
It depends. As always, it depends on data structure, data types, query patterns, etc. The best way to know how it will work with your data is to test it. Right now is also a great time, as the new workloads aren't counting towards your Capacity Unit limit until August 1, 2023, so you won't be throttled as a result.
@ahsankhawaja474 · 1 year ago
Without RLS it's really just a POC. RLS will force this from Direct Lake mode into DirectQuery mode, so we have to wait till One Security is out.
@GuyInACube · 1 year ago
I'm really looking forward to One Security. Also, I know the team is looking at how to optimize Direct Lake based on feedback so make sure you get that in at aka.ms/fabricideas if you have thoughts.
@Enidehalas · 1 year ago
So you are saying the Fabric preview does not support RLS at the moment? Also, I have never heard about One Security, nor found anything relevant on Google. Where can I find more info?
@adolfojsocorro · 1 year ago
Isn't the instantaneous nature of Direct Lake similar to today's direct connections to datasets?
@GuyInACube · 1 year ago
No. It's different architecturally. Think of this still like an imported dataset, but the storage engine is loading the data from the delta parquet files within OneLake. And it's really fast at doing that.
@adolfojsocorro · 1 year ago
@@GuyInACube To the report itself, isn't it the same? It's just a connection to a dataset that, from its point of view, is transparently updated. I love that it's faster, but to the report, functionally speaking, it's the same.
@Enidehalas · 1 year ago
@@adolfojsocorro Unlike DirectQuery, you don't lose some functionality (DAX functions, parts of RLS etc)
@roberttyler2861 · 1 year ago
The friction this will bring between Engineering and Analytics teams.. oh god..