
Manage your data pipelines with Dagster | Software defined assets | IO Managers | Updated project 

BI Insights Inc
14K subscribers
6K views

Published: Oct 4, 2024

Comments: 17
@BiInsightsInc 1 year ago
Link to previous video on Dagster: ru-vid.com/video/%D0%B2%D0%B8%D0%B4%D0%B5%D0%BE-t8QADtYdWEI.html&t
ETL with Python: ru-vid.com/video/%D0%B2%D0%B8%D0%B4%D0%B5%D0%BE-dfouoh9QdUw.html&t
@MrMal0w 1 year ago
Love it! Dagster is my favorite tool for data orchestration and your video is very well built 🎉 need more on this topic :)
@jeanguerrapty 1 year ago
Hi @BiInsightsInc, thank you very much for posting this awesome content. Could you please create an ETL video or series that works with these tools and MongoDB?
@BiInsightsInc 1 year ago
I will try and add the IO Manager for MongoDB.
@Sebastian-xw4mp 5 months ago
@BiInsightsInc, between 03:05 and 05:39 the requirements.txt magically appears in your etl folder. Makes it hard to follow along with your video...
@BiInsightsInc 5 months ago
You can clone the repo; that way you will have all the requirements, then follow along. All links are in the description. Here is the link to the repo: github.com/hnawaz007/pythondataanalysis/tree/main/dagster-project/etl
@akmalhafiz7830 11 months ago
Thanks, this is helpful. However, I do have a question: say I want to build an ELT pipeline and ingest an entire database into a data warehouse. Is it better to separate the tables into multiple data assets and ingest them one by one, or just use one data asset?
@BiInsightsInc 11 months ago
It’s better to split each table into its own asset: each source table should have an asset, and then you stage this data. After this step it depends on your data modeling strategy how you want to model the data. A sketch of this split is below.
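A minimal sketch of that one-asset-per-table split, assuming a SQLAlchemy source connection and an IO manager registered under a hypothetical "db_io_manager" key (table and column names are illustrative):

from dagster import asset
from sqlalchemy import create_engine
import pandas as pd

# Hypothetical source connection; adjust to your database.
source_engine = create_engine("postgresql://user:pass@localhost:5432/source_db")

@asset(io_manager_key="db_io_manager")
def customers() -> pd.DataFrame:
    # One asset per source table: extract customers as-is.
    return pd.read_sql("SELECT * FROM customers", source_engine)

@asset(io_manager_key="db_io_manager")
def orders() -> pd.DataFrame:
    # The second source table gets its own asset.
    return pd.read_sql("SELECT * FROM orders", source_engine)

@asset(io_manager_key="db_io_manager")
def stg_orders(orders: pd.DataFrame, customers: pd.DataFrame) -> pd.DataFrame:
    # Downstream staging asset that models the raw tables.
    return orders.merge(customers, on="customer_id", how="left")

Dagster infers the dependency graph from the parameter names, so each source table can be materialized and retried independently.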
@akmalhafiz7830 11 months ago
@@BiInsightsInc thank you for the input
@MrMal0w 1 year ago
Question: to implement an incremental-load IO manager we need to pass the ‘append’ arg instead of ‘replace’ to SQLAlchemy. Is it possible to send this parameter directly from the asset?
@BiInsightsInc 1 year ago
It is possible. I have seen an example of this on Stack Overflow, but it requires a little more configuration; link below. Another idea would be to have two versions of the IO Manager: one for incremental loads (append) and a second one for truncate-and-load (replace). stackoverflow.com/questions/76173666/how-to-implement-io-manager-that-have-a-parameter-at-asset-level
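A minimal sketch of the metadata-driven variant, assuming a pandas-based IO manager that writes via to_sql; the "write_mode" metadata key is an assumption for illustration, not a Dagster built-in:

from dagster import asset, ConfigurableIOManager, InputContext, OutputContext
from sqlalchemy import create_engine
import pandas as pd

class PostgresDFIOManager(ConfigurableIOManager):
    connection_string: str

    def handle_output(self, context: OutputContext, obj: pd.DataFrame) -> None:
        # Read a per-asset write mode from the asset's definition
        # metadata, defaulting to a full replace.
        mode = (context.metadata or {}).get("write_mode", "replace")
        engine = create_engine(self.connection_string)
        obj.to_sql(context.asset_key.path[-1], engine, if_exists=mode, index=False)

    def load_input(self, context: InputContext) -> pd.DataFrame:
        engine = create_engine(self.connection_string)
        return pd.read_sql_table(context.asset_key.path[-1], engine)

@asset(metadata={"write_mode": "append"})  # incremental load for this asset only
def daily_events() -> pd.DataFrame:
    return pd.DataFrame({"event": ["a", "b"]})

Assets without the metadata key keep the truncate-and-load behavior, so one IO manager covers both cases.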
@MrMal0w 1 year ago
@@BiInsightsInc thanks a lot, I will check it :)
@henrikvaher697 1 year ago
This is great, I've had similar issues. I want to query an API and APPEND the retrieved data to the existing asset.
@whalesalad 1 year ago
A popular practice with BigQuery is to process data in stages where each stage is effectively a table. So you might have a raw table that takes all the raw data in, and then a pivot or aggregation process that would take the data from table A and write it to table B. I am trying to wrap my head around how to do this correctly with Dagster. The data would always live inside of BQ, never coming out into these python functions. Is there a best practice for this sort of thing? Effectively there is no IO, it is all remote, and Dagster would just be orchestrating the commands. Is this possible?
@BiInsightsInc 1 year ago
I think this is a standard ELT approach if you are building a data mart or database using SQL. dbt would be perfect for this use case: your data lives in your database, and you can transform it with SQL using dbt. You can have raw sources, build intermediate tables for transformations, and final dims and facts for analytics. Dagster can orchestrate the whole process ad hoc or on a schedule.
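If the SQL should run entirely inside BigQuery without dbt, a Dagster asset can also just issue the statements and return nothing, so no IO manager is involved. A rough sketch using the google-cloud-bigquery client (dataset and table names are hypothetical):

from dagster import asset
from google.cloud import bigquery

@asset
def raw_events() -> None:
    # Stage raw data inside BigQuery; nothing leaves the warehouse.
    client = bigquery.Client()
    client.query(
        "CREATE OR REPLACE TABLE analytics.raw_events AS "
        "SELECT * FROM landing.events"
    ).result()

@asset(deps=[raw_events])
def daily_pivot() -> None:
    # Aggregate table B from table A, still fully inside BigQuery.
    client = bigquery.Client()
    client.query(
        "CREATE OR REPLACE TABLE analytics.daily_pivot AS "
        "SELECT event_date, COUNT(*) AS events "
        "FROM analytics.raw_events GROUP BY event_date"
    ).result()

Dagster only orchestrates and records the runs; the heavy lifting stays in BigQuery, which matches the "no IO, all remote" setup described above.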
@zamanganji1262 1 year ago
If we need to process multiple .sav files, convert them into multiple CSV files, and do some modifications on them, how can we accomplish this using Dagster?
@BiInsightsInc 1 year ago
I saw your comment on the reference data ingestion video. You can borrow the code for ingesting multiple files from there. You can easily convert the Python functions to an "op" and/or "asset" with the help of Dagster decorators. I have covered how to convert a Python script to an "op" in this video: ru-vid.com/video/%D0%B2%D0%B8%D0%B4%D0%B5%D0%BE-t8QADtYdWEI.html&t
Code to convert sav files:
import pandas as pd
df = pd.read_spss("input_file.sav")
df.to_csv("output_file.csv", index=False)
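For the multiple-files part of the question, a hedged extension of the same snippet (folder names are illustrative; pd.read_spss requires the pyreadstat package):

from pathlib import Path
import pandas as pd

# Convert every .sav file in a folder to a CSV alongside it.
for sav_path in Path("input_folder").glob("*.sav"):
    df = pd.read_spss(sav_path)
    # Apply any modifications to df here before writing.
    df.to_csv(sav_path.with_suffix(".csv"), index=False)

Each iteration could also be wrapped in an op or asset so Dagster tracks every file conversion.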