Ship data pipelines with extraordinary velocity using Dagster.
Dagster helps data engineers tame complexity. Elevate your data pipelines with software-defined assets, first-class testing, and deep integration with the modern data stack.
Dagster is a cloud-native open-source orchestrator for the whole development lifecycle, with integrated lineage and observability, a declarative programming model, and best-in-class testability.
Came here excited to learn about new features in the latest Dagster version. But it looks like you've decided to widen the feature-gap between the open source offering and the enterprise offering... even though this will be a maintenance burden on your team... causing delays in "backporting" features and bugfixes to the open-source version going forward. Kinda disappointed...
Thanks for the comment @JohnCF. If you go through the enhancements introduced with this Dagster+ launch, you will see that many of them (in fact, all of them except for Dagster Insights) benefit both the open-source and the commercial offerings. The data cataloging capability is a good example of that. From our perspective, these new additions are moving us forward on both the OSS and the Dagster+ roadmaps. In addition, by providing more value to those organizations that adopt Dagster+ we are able to guarantee the longevity and accelerated development of Dagster Open-Source.
@@dagsterio Does that mean what's mentioned at 7:15 about column lineage is available in open-source too? The phrasing definitely sounded like it's only available for Enterprise users...
Is there native support for mapping time-based partitions to static partitions defined like "today", "rest of month", "rest of year", "rest of history"? This is a common setup for Power BI datasets, which can be represented as assets in Dagster. It would be nice to take advantage of auto-materialize policies.
Dagster does not natively support mapping time-based partitions to static partitions like "today," "rest of month," "rest of year," and "rest of history." However, you can achieve similar functionality by defining custom partitioning schemes and using the appropriate partition mappings: StaticPartitionsDefinition for static partitions and TimeWindowPartitionsDefinition for time-based ones.
If the requirement is to get the data from S3 files into a BQ table but perform some validations on those files before inserting into the table, how would we do it with Embedded ELT? We are using Dagster OSS heavily and looking to use embedded-elt for getting data from files, tables, and APIs.
Hey Abhishek! In your case, would you be able to represent the S3 files as source assets first, add asset checks onto those, and run Embedded ELT only if those asset checks pass? Sling currently (afaik) is heavily focused on doing ingestion well, so you can defer to the rest of the Dagster ecosystem (such as asset checks) for validations.
@@AbhishekAgrawal-dv1id we've found that dlt is a powerful framework for ingesting from APIs, and it's definitely mature enough for production settings. I'll also say that neither Sling's nor dlt's integration currently allows for creating asset checks in-flight during ingestion. Instead, have you thought about ingesting the files into a quarantined dataset first using whichever tool you'd like, applying asset checks to that, and then moving that data to your real "analytics-ready" BQ datasets once you've vetted it? This way, you can easily do ad hoc analysis to understand why the data failed quality tests, while keeping it isolated from your production analytics.
Yeah, I am also leaning towards doing something like this. Thanks for this, Tim. Would you suggest using a similar approach to pull data from a different database? We'd still need to run minor validations on the incoming data, though. Would dlt help here at all?
As a person with just 2 years of experience, my mind was blown watching this. I am the only person writing code in my department, so I don't have any seniors to learn from, but I'm leading a data engineering project that deals with terabytes of data: each request is multiple times larger than the server's RAM, and multiple such requests need to be processed in parallel to finish in time. We also have the tiniest possible budget to aggregate 25 to 30 columns and billions of rows every day, and we need to cut down on costs. This was super helpful.
For some teams, definitely, although it can be complementary to dbt docs, because it pulls in some of that data via the dbt integration. It essentially becomes a superset of the documentation.
Many of the enhancements in the 1.7 release benefit all users (Open-source and paid Dagster+ users). In general, the open-source solution gains more capabilities with each release both to support open-source users and to unlock more capabilities in Dagster+ which are built on top of core.
Hi! I had to put it in a different repo to accommodate running multiple code locations without breaking our existing setup for the deep dive projects. The dedicated repo for the data mesh example can be found here! github.com/dagster-io/data-mesh-demo
This is the coolest tech demo I've ever seen. I have wanted for so long to see an end-to-end analytics stack demo, or tutorial, and never found it. You just did it in 15 minutes, using free, open source tools I can run locally on my laptop. Absolutely incredible!
You might find this blog by Sandy interesting: dagster.io/blog/dagster-ml-pipelines. Otherwise, you can listen to the entire podcast featuring Sandy here: datastackshow.com/podcast/machine-learning-pipelines-are-still-data-pipelines-with-sandy-ryza-of-dagster/
I work in a financial institution and there is definitely a need for a reliable and resilient data process. Look forward to finding out more about Dagster. I also agree, no point building something flaky and have it barf 🤢
More specifically for this session: github.com/dagster-io/devrel-project-demos/tree/main/dagster-deep-dives/dagster_deep_dives/resources_and_configurations
I don’t know… this video is one year old, but still uses the legacy DAG syntax from Airflow 1, rather than the TaskFlow API from Airflow 2. So the syntax doesn’t make a difference anymore. Regarding the coupling to environment: Airflow has different executors. The KubernetesPodOperator is not the only way to run on a Kubernetes environment. The rest may or may not be true. Probably there are many things that Dagster does better than Airflow. But I’m disappointed that you would publish such a biased comparison.
All the code for the demos from the deep dives are in this repository ( github.com/dagster-io/devrel-project-demos )! This one in particular is in the partitions directory.
Joining other comments, I'd love to see more step-by-step tutorials and use cases. It took a few videos to grasp the concepts, and this one is a good one to start with. Docs are good, but videos are even better. I would love to see more DuckDB / Dagster and ingestion cases.
Hi @user-hs9lo5gh3r, the most common way to bring up this menu is to select an asset from the global asset lineage, and then in the top right where it says "Materialize selected...", open the dropdown menu and select "Open launchpad". Hope this helps!
I really want to love Dagster, but watching this video reminded me of why I stopped using Dagster for moving data from point A to point B. There are so, so many layers of configuration and plain infrastructure all over the place that kind of just need to be there, and the actual business logic (you know, the valuable part of the code that defines the data product) gets completely buried.
IMO, one of the most confusing and unnecessarily convoluted concepts in Dagster (which is otherwise amazing). E.g., what's with RunConfig having references to `ops` when things then have to be keyed/named by asset name? You totally glossed over the global config item (e.g. an S3 bucket that is common to everyone); then you have to use an awkward resource that doesn't really do anything other than hold some fields (ahem, config). I really wish this would get cleaned up.
Hey @Amapramaadhy, what you’re expressing is totally valid. The concepts of Assets, Ops, and Jobs and how to compose them can be a bit convoluted - this has become more noticeable as our APIs evolve. We’re aware of this, and it’s on our roadmap to improve. Thanks for taking the time to respond and sharing your thoughts.
No doubt that every new powerful framework takes some investment up front to learn. Have you explored Dagster University? courses.dagster.io/courses/dagster-essentials
In terms of debugging, being able to run Dagster in debug mode in VS Code, set breakpoints, and inspect variables is a game changer. Here is how to set it up: github.com/dagster-io/dagster/issues/17859#issuecomment-1805916514
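Along the lines of the linked issue, a `launch.json` configuration like the one below should get breakpoints working; this is a sketch, and `my_definitions.py` is a placeholder for your own definitions file:

```json
{
  "version": "0.2.0",
  "configurations": [
    {
      "name": "Debug dagster dev",
      "type": "debugpy",
      "request": "launch",
      "module": "dagster",
      "args": ["dev", "-f", "my_definitions.py"],
      "justMyCode": false
    }
  ]
}
```

With `justMyCode` set to `false`, you can also step into Dagster's own framework code, not just your assets.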
Awesome 👏🏽. Really nice and succinct description of an otherwise tricky feature. Hopefully a future video can cover advanced use cases of how to wire up sensors with partition definitions so that we can programmatically launch/backfill etc. Thanks again for the great content.