Why OneLake is a BIG DEAL in Microsoft Fabric (with Pawel Potasinski)

1.9K views

One of the key pillars of Microsoft Fabric is that it’s lake-centric and open. At the core of this pillar is OneLake, a single, unified storage system for all your data. In this conversation, Pawel and Reid discuss and demo the benefits of OneLake for data professionals and organizations working with Fabric.
GUEST BIO 👤
Member of the Microsoft Fabric Customer Advisory Team (CAT), focused on community management and engagements. In his professional career Pawel has always been associated with data engineering and analytics (SQL, BI, Big Data). Founder of the Polish SQL Server Users Group (PLSSUG), today known as Data Community Poland. Regular speaker at conferences, community events and user groups. Former Microsoft Most Valuable Professional (MVP).
RELATED CONTENT 🔗
LinkedIn: / pawelpotasinski
Twitter: / pawelpotasinski
OneLake documentation: learn.microsoft.com/en-us/fab...
Fabric blog: aka.ms/fabricblog
Fabric community: aka.ms/fabriccommunity
Fabric ideas: aka.ms/fabricideas
Fabric readiness repository: aka.ms/fabric-readiness-repo
VIDEO CHAPTERS 📺
0:00 - Video Start
5:00 - Start of Livestream
LET'S CONNECT! 🧑🏽‍🤝‍🧑🏽🌟
-- / havensbi
-- / reidhavens
-- / havensconsulting
CHECK OUT OUR MERCH STORE 👕
-- havens-consulting.creator-spr...
HAVENS CONSULTING PAGES 📄
Home Page -- www.havensconsulting.net
Blog -- www.havensconsulting.net/blog-...
Blog Files -- www.havensconsulting.net/blog-...
Files & Templates -- www.havensconsulting.net/files...
Consulting Services -- www.havensconsulting.net/consu...
Online Course -- www.havensconsulting.net/onli...
Contact & Support -- www.havensconsulting.net/conta...
EMAIL US AT 📧
info@havensconsulting.net
#PowerBI #powerplatform #microsoft #businessintelligence #datascience #data #dataanalytics #excel #powerapps #datavisualization #dashboard #bi #analytics #dax #powerquery #onelake #lakehouse #fabric #datawarehouse

Category: Science

Published: Jul 29, 2024

Comments: 3

@noahhadro8213 8 months ago

Unfortunately, Direct Lake mode is not nearly as fast as Import mode. We ran a test where we used Dataflows Gen2 to bring a fact table with 30M records and one dimension from on-prem SQL into a Fabric lakehouse. We created a dataset from that using Direct Lake mode. Then we created another dataset with the same fact table and dimension using Import mode. We did a simple SUMX calculation in both datasets. We ran the query several times, clearing the cache for both before each run. Import mode ran about twice as fast: 408 ms for Import vs 934 ms for Direct Lake. Is this what you are experiencing?

@HavensConsulting 8 months ago

Message from Pawel 🙂 Thanks for this comment and for sharing the result of your test! My general comment is that our "north star" for Direct Lake is for it to be as fast as Import mode, but for now that's not always the case, especially "out-of-the-box". Two thoughts related to your test: 1) Dataflows Gen2 used to be known for generating non-optimal Delta tables (see this blog post by Sandeep Pawar: fabric.guru/fabric-not-all-delta-tables-are-created-equally). It's still in Public Preview! :-) 2) Have you made sure your Delta tables are optimized with V-Order (see learn.microsoft.com/en-us/fabric/data-engineering/delta-optimization-and-v-order?tabs=sparksql)? There is a super easy way to optimize a specific Delta table directly from the Fabric UI: simply right-click on your table and select the Optimize option. In addition, here's another great blog post from Sandeep on how to check if a Delta table is V-Order optimized: fabric.guru/checking-if-delta-table-in-fabric-is-v-order-optimized.
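
For readers who prefer a notebook to the right-click Optimize option mentioned above, here is a minimal sketch of the same idea. It assumes the `OPTIMIZE <table> VORDER` Spark SQL form described on the linked V-Order documentation page; the helper name `build_vorder_optimize_sql` and the table name `fact_sales` are hypothetical and do not come from the video or the comments. The snippet only builds the statement, so it runs anywhere; actually executing it requires a Fabric Spark session.

```python
import re


def build_vorder_optimize_sql(table_name: str) -> str:
    """Return a Spark SQL statement that compacts a Delta table and applies V-Order."""
    # Accept only plain or dot-qualified identifiers so the name can be
    # interpolated into the SQL text safely.
    if not re.fullmatch(r"[A-Za-z_]\w*(\.[A-Za-z_]\w*)*", table_name):
        raise ValueError(f"unexpected table name: {table_name!r}")
    return f"OPTIMIZE {table_name} VORDER"


statement = build_vorder_optimize_sql("fact_sales")  # hypothetical table name
print(statement)  # prints: OPTIMIZE fact_sales VORDER

# Assumption: inside a Fabric notebook, where a `spark` session is predefined,
# the statement would be executed with:
# spark.sql(statement)
```

Whether this closes the gap with Import mode depends on the model and data; the thread only suggests it as something to check.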

@muppetbaer 8 months ago

Direct Lake is marginally slower than Import (for now), but, for me, the use case for Direct Lake is in allowing dataset complexity that Import cannot even approach. Yeah, it's 0.5 seconds slower over a 32M table with a single dim. How about trying to run a model with 10x 250M fact tables, 50ish partitions, multiple pipelines upserting data non-stop throughout the day, a dozen derivative models serving 100ish users each, and syncing to warehouse mirrors using shortcuts? Try doing that in Import.