Rethinking Orchestration as Reconciliation: Software Defined Assets in Dagster | Elementl 

Data Council

ABOUT THE TALK
This talk discusses software-defined assets, an approach to orchestration and data management that makes it drastically easier to trust and evolve data assets, like tables and ML models.
In traditional data platforms, code and data are only loosely coupled. As a consequence, deploying changes to data feels dangerous, backfills are error-prone and irreversible, and it’s difficult to trust data, because you don’t know where it comes from or how it’s intended to be maintained. Each time you run a job that mutates a data asset, you add a new variable to account for when debugging problems.
Dagster proposes an alternative approach to data management that tightly couples data assets to code: each table or ML model corresponds to the function that’s responsible for generating it. This results in a “Data as Code” approach that mimics the “Infrastructure as Code” approach that’s central to modern DevOps. Your git repo becomes your source of truth on your data, so pushing data changes feels as safe as pushing code changes. Backfills become easy to reason about. You trust your data assets because you know how they’re computed and can reproduce them at any time. The role of the orchestrator is to ensure that physical assets in the data warehouse match the logical assets that are defined in code, so each job run is a step towards order.
Software-defined assets is a natural approach to orchestration for the modern data stack, in part because dbt models are a kind of software-defined asset.
Attendees of this session will learn what it looks like to build and maintain a warehouse or data lake of software-defined assets with Dagster.
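The reconciliation idea described above can be illustrated with a small standard-library sketch. This is not Dagster's API: `toy_asset`, `DEFINITIONS`, `WAREHOUSE`, and `reconcile` are hypothetical names invented for this example, the "warehouse" is an in-memory dict, and staleness is detected with a hand-written version string. It assumes the dependency graph has no cycles.

```python
from typing import Callable, Dict, List, Sequence, Set, Tuple

# Logical assets defined in code: name -> (version, upstream names, function).
# Hypothetical registry for illustration only, not a Dagster structure.
DEFINITIONS: Dict[str, Tuple[str, List[str], Callable]] = {}

# Stand-in for the warehouse: name -> (version it was built with, value).
WAREHOUSE: Dict[str, Tuple[str, object]] = {}


def toy_asset(version: str, deps: Sequence[str] = ()):
    """Register a function as the definition of the asset named after it."""
    def register(fn: Callable) -> Callable:
        DEFINITIONS[fn.__name__] = (version, list(deps), fn)
        return fn
    return register


def reconcile() -> List[str]:
    """Rebuild assets that are missing, stale, or downstream of a rebuild."""
    rebuilt: List[str] = []
    done: Set[str] = set()

    def visit(name: str) -> None:
        if name in done:
            return
        version, deps, fn = DEFINITIONS[name]
        for dep in deps:  # upstream assets are reconciled first
            visit(dep)
        stored = WAREHOUSE.get(name)
        stale = (
            stored is None
            or stored[0] != version
            or any(dep in rebuilt for dep in deps)
        )
        if stale:
            inputs = [WAREHOUSE[dep][1] for dep in deps]
            WAREHOUSE[name] = (version, fn(*inputs))
            rebuilt.append(name)
        done.add(name)

    for name in DEFINITIONS:
        visit(name)
    return rebuilt


@toy_asset(version="1")
def raw_orders():
    return [{"id": 1, "amount": 30}, {"id": 2, "amount": 12}]


@toy_asset(version="1", deps=["raw_orders"])
def order_total(raw_orders):
    return sum(row["amount"] for row in raw_orders)


first = reconcile()   # nothing is stored yet, so both assets are built
second = reconcile()  # stored assets already match the code, so no work


# Simulate pushing a code change: the same asset, redefined at a new version.
@toy_asset(version="2", deps=["raw_orders"])
def order_total(raw_orders):
    return sum(row["amount"] for row in raw_orders) * 100  # now in cents


third = reconcile()   # only the changed asset is rebuilt
```

In this sketch a run does no work when the stored assets already match their definitions, and a code change causes only the affected asset (and anything downstream of it) to be rebuilt.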
ABOUT THE SPEAKER
Sandy is a software engineer at Elementl, building Dagster. Previously, he led machine learning and data science teams at KeepTruckin and Clover Health. He's a committer on Spark and Hadoop, and co-authored O'Reilly's Advanced Analytics with Spark.
ABOUT DATA COUNCIL:
Data Council (www.datacouncil.ai/) is a community and conference series that provides data professionals with the learning and networking opportunities they need to grow their careers. Make sure to subscribe to our channel for more videos, including DC_THURS, our series of live online interviews with leading data professionals from top open source projects and startups.
FOLLOW DATA COUNCIL:
Twitter: / datacouncilai
LinkedIn: / datacouncil-ai
Eventbrite: www.eventbrite.com/o/data-cou...

Science

Published: 29 Jul 2024

Comments: 5
@fiannafailgalway8446 2 years ago
This was an excellent talk.
@Fat1Dada 1 year ago
Very clear !!!
@Fat1Dada 1 year ago
23:26 "There's some important bathwater that we shouldn't throw out with the baby" xD that's a cute slip-up there, I think most people evolve from babies, therefore they would agree bathwater is the most disposable "asset"
@user-if2kq8nh8m 1 year ago
The audio clipping was rough on this video, nonetheless great presentation!
@BenOgorek 1 year ago
I think I might be sold