
dbt and Python: Better Together

Databricks
12K views

Drew Banin is the co-founder of dbt Labs and one of the maintainers of dbt Core, the open source standard in data modeling and transformation. In this talk, he will demonstrate an approach to unifying SQL and Python workloads under a single dbt execution graph, illustrating the powerful, flexible nature of dbt running on Databricks.
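As a concrete illustration of what "unifying SQL and Python workloads under a single dbt execution graph" looks like in practice, here is a minimal sketch of a dbt Python model. In dbt, a Python model is a `.py` file in your models directory that defines a `model(dbt, session)` function; on Databricks, `dbt.ref(...)` returns the upstream model as a Spark DataFrame and the returned DataFrame is what dbt materializes. The model and column names (`stg_orders`, `status`) are assumptions for the example, not from the talk.

```python
# Hypothetical dbt Python model, e.g. models/completed_orders.py.
# dbt injects the `dbt` context object and a SparkSession (`session`);
# "stg_orders" is an assumed upstream SQL model used for illustration.

def model(dbt, session):
    # Python models must materialize as a table (or incremental).
    dbt.config(materialized="table")

    # ref() wires this model into the same dependency graph as SQL models.
    orders = dbt.ref("stg_orders")

    # Any PySpark transformation is allowed; the returned DataFrame
    # is what dbt writes out as this model's table.
    return orders.where("status = 'completed'")
```

Because `dbt.ref()` is used the same way as `{{ ref() }}` in a SQL model, `dbt run` schedules this Python model after `stg_orders` with no extra orchestration.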
Connect with us:
Website: databricks.com
Facebook: / databricksinc
Twitter: / databricks
LinkedIn: / data. .
Instagram: / databricksinc

Science

Published: 18 Jul 2022

Comments: 7
@guilhermegonzalez5501 · 7 months ago
Amazing presentation!
@kebincui · 9 months ago
Awesome presentation 👍
@vai5tac336 · 1 year ago
amazing!
@paulfunigga · 1 year ago
I still don't understand what dbt is.
@MrTeslaX-vi2qn · 10 months ago
Same here... not sure what it is or what its purpose is?
@samjebaraj6895 · 10 months ago
@MrTeslaX-vi2qn Looks like it's a completely SQL-based ETL framework for building data marts/warehouses.
@ravishmahajan9314 · 5 months ago
So basically it's the T in ETL. You transform data using SQL in dbt. But your question may be: SO WHAT? I can transform using SQL Server as well. 😂

The answer is that dbt is your transformation friend. It has no database of its own; it uses its host database. For example, if you are transforming a BigQuery warehouse, the transformed data lives in that warehouse only. What dbt does is STRUCTURE THE TRANSFORMATION PROCESS. That means you can version control it like git, create dependency models between different views and tables, and merge them into a kind of data pipeline. You can visualize how a table has been transformed and what steps were taken to transform it.

It can also be used to set up automatic pipelines, so you don't need to worry if more data gets into the warehouse: it will incrementally refresh and apply the same transformation steps. It automatically creates an acyclic graph. Think of how you use multiple CTEs (common table expressions) in SQL to solve a very complex query: each CTE transforms, groups by, and reads from multiple tables, and they are then joined to get the final output. The problem is that if you write a query with 50 CTEs, you will eventually get confused. dbt helps by automatically creating a graph that shows how all the CTEs work together to produce the output. Also, with one click you can generate documentation. You can create three environments (development > test > production) in dbt to build complex SQL pipelines.

Hope that clears it up. PS: I am also learning, so I may be wrong somewhere in my understanding.
Up next
Dive Deeper into Data Engineering on Databricks
34:36
How to Build Incremental Models | dbt tutorial
10:51
Orchestration Made Easy with Databricks Workflows
35:16
Seven use cases for dbt
20:02
9K views
Data + AI Summit Keynote, Thursday Part 5 - DuckDB
9:57