
Building Robust Data Pipelines for Modern Data Engineering | End to End Data Engineering Project 

CodeWithYu
17K subscribers
26K views

In this video, we set up an end-to-end data engineering project using Apache Spark, Azure Databricks, and dbt (Data Build Tool), with Azure as our cloud provider. The project walks through data ingestion into the lakehouse, data integration with Azure Data Factory (ADF), and data transformation with Databricks and dbt.
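Below is a minimal PySpark sketch (an editor's illustration, not taken from the video) of the kind of bronze-to-silver step this pipeline performs on Databricks; the mount points, file names, and column handling are hypothetical assumptions.

from pyspark.sql import SparkSession
from pyspark.sql import functions as F

spark = SparkSession.builder.appName("medallion-demo").getOrCreate()

# Read the raw Parquet that the ADF pipeline landed in the bronze layer
# (path and file name are hypothetical).
df = spark.read.parquet("/mnt/bronze/sales/SalesLT.Customer.parquet")

# Typical silver-layer cleanup: normalize column names and stamp the load time.
df_silver = (
    df.toDF(*[c.lower() for c in df.columns])
      .withColumn("ingested_at", F.current_timestamp())
)

# Persist as Delta (available on Databricks) so downstream dbt models can build on it.
df_silver.write.format("delta").mode("overwrite").save("/mnt/silver/sales/customer")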
Timestamps:
0:00 Introduction
0:49 System Architecture
3:01 Creating resource groups on Azure
5:02 Setting up the medallion architecture storage account
8:46 Setting up Azure Data Factory
10:18 Azure Key Vault setup for secrets
14:19 Azure database with automatic data population
25:32 Azure Data Factory pipeline orchestration
47:00 Setting up Databricks
49:50 Azure Databricks Secret Scope and Key Vault
54:33 Verifying Databricks - Key Vault - Secret Scope Integration
1:06:00 Azure Data Factory - Databricks Integration
1:21:19 DBT Setup
1:24:15 DBT Configuration with Azure Databricks
1:32:12 DBT Snapshots with Azure Databricks and ADLS Gen2
1:45:06 DBT Data Marts with Azure Databricks and ADLS Gen2
1:55:00 DBT Documentation
1:58:58 Outro
Resources:
Medium Article: / robust-data-pipelines-...
Full Code: github.com/airscholar/modern-...
If you find our content valuable, support us by joining our channel membership, where you'll get exclusive access to behind-the-scenes content, Q&A sessions, and much more!
/ @codewithyu
💬 Join the Conversation:
We love hearing from you! Share your thoughts, questions, or experiences related to data engineering or this project in the comments below. Don't forget to like, subscribe, and hit the bell icon to stay updated with our latest content.
Tags:
Big Data, Data Engineering, Apache Spark, Databricks, DBT, Azure, Cloud Computing, Data Analytics, ETL, Data Warehouse, Technology, Analytics, Machine Learning, Data Science
Hashtags:
#BigData, #DataEngineering, #ApacheSpark, #Databricks, #DBT, #Azure, #CloudComputing, #DataAnalytics, #ETL, #DataWarehouse, #TechTalk, #MachineLearning, #DataScience, #BigDataAnalytics
🙏 Thank You for Watching!
Remember to subscribe and hit the bell icon for notifications. Stay curious and keep exploring the fascinating world of data engineering!

Science

Published: 6 Jul 2024

Comments: 57
@CodeWithYu · 6 months ago
Spark your curiosity and 'data-fy' your feed - hit LIKE, SUBSCRIBE, and ring the bell. Join our byte-sized revolution in data engineering!💡🚀
@vemedia5850 · 6 months ago
So happy I found this!! It's brilliant! You are a fantastic teacher.
@CodeWithYu · 6 months ago
Glad it was helpful! Don't forget to like and subscribe!
@abhijeetprakash5262 · 6 months ago
@@CodeWithYu, and also to press the bell button!
@dotproduct763 · 6 months ago
Awesome content, and very instructive and educational. Thanks a lot, sir.
@CodeWithYu · 6 months ago
You're welcome!
@rasmusandreasson1548 · 6 months ago
You are the best! Keep up the good work!
@CodeWithYu · 6 months ago
Thank you, will do!
@nadiiar75 · 6 months ago
🤗 thank you for your hard work, we appreciate it 🙏
@CodeWithYu · 6 months ago
You're welcome! 😀
A month ago
Awesome work!
@wiss1998 · 6 months ago
Thank you for your hard work, you are the best!
@CodeWithYu · 6 months ago
My pleasure! :D
@prajaktaingole3304 · 19 days ago
Hey, excellent content... keep posting!
@lucaslira5 · 6 months ago
Thank you so much!
@CodeWithYu · 6 months ago
You're welcome! :D
@soundbeans · 5 months ago
Great videos, man. Do you have any end-to-end projects involving Snowflake? I see Snowflake a lot in job specifications and would like to get up to speed on it.
@workhardforyourfamily4826 · 6 months ago
I love your content bro
@CodeWithYu · 6 months ago
Thank you! :D
@RafaVeraDataEng · 3 months ago
Hi! Would it make sense here to use Terraform on Azure as an option to deploy dbt on Databricks?
@deede20 · 5 months ago
Awesome content, thank you so much! I've never worked with dbt before; just curious, what is the advantage of using dbt alongside Databricks when Databricks itself is a compute engine?
@hiryu4091 · 3 months ago
That is something I'm really curious about as well!
@ericlaw7588 · 6 months ago
THANK YU !!
@CodeWithYu · 6 months ago
You're welcome! :D
@ragegodoverpowered8669 · 6 months ago
Hey, just out of curiosity: how much did the ADLS cost, and overall, how much of the $200 credit did it use?
@thsstphok7937 · 4 months ago
Hey CodeWithYu, if you were starting from scratch and aiming to secure a data engineering job as quickly as possible, what would you do? a) Pursue a data analyst position. b) Pursue a software engineering role. c) Explore alternative routes. d) Consider freelancing, etc. e) Any other plan you have. By the way, loving your videos!
@petersandovalmoreno5213 · 5 months ago
How should a client consume this data?
@SanjaySingh-bm4it · 3 months ago
Hi, I am getting a SQL "failed to connect" error even after ticking "Allow Azure services". Can you please help?
@muneebafzal4694 · 4 months ago
Thanks, brother. I am new to Azure data engineering and I have a question: do data engineers need to write those SQL and YML files?
@CodeWithYu · 4 months ago
Not 100%; it largely depends on the kind of project you're working on. I'd say as a professional DE it's safe to say yes, but with some exceptions 😀
@alerthz7737 · 4 months ago
I'm struggling with "dbt snapshot": mine shows "Found 2 models, 4 tests, 9 sources, 0 exposures, 0 metrics, 535 macros, 0 groups, 0 semantic models". How do I fix it to show 7 snapshots like in the video?
@TheMapleSight · 5 days ago
What is the estimated cost if I wanted to do this project on my own without the Azure Free Trial?
@jaswanth333 · 6 months ago
Hey Yusuf, I am getting this error while ADF is running the notebook: "AnalysisException: [UNABLE_TO_INFER_SCHEMA] Unable to infer schema for Parquet. It must be specified manually." PS: I tested manually by loading one of the Parquet table files in bronze and it works, but I'm unable to do it dynamically.
@CodeWithYu · 6 months ago
That's strange! I'm not quite sure what could've caused this error, but I'd suggest you retrace your steps from the point of creating the notebook in ADF onward; that should probably fix it. Please let me know if this works for you.
@jaswanth333 · 6 months ago
@@CodeWithYu Sure, will do. I was able to successfully recreate the project except for the ADF Databricks tables' dynamic generation part.
@rohitkumar-nk6sd · 6 months ago
@@jaswanth333 Can you tell me, have you resolved this issue? If yes, please explain how.
@odezuligboemmanuel6249 · 4 months ago
@@rohitkumar-nk6sd I used this and it worked. Still the same code, but this is cleaner:

df = spark.read.parquet(f'/mnt/bronze/{file_Name}/{table_schema}.{table_name}.parquet')

# Create the database if it doesn't exist
spark.sql(f"CREATE DATABASE IF NOT EXISTS {table_schema}")

# Save the DataFrame as a table
df.write.mode('overwrite').saveAsTable(f"{table_schema}.{table_name}")
@7arantino734 · 3 months ago
@@rohitkumar-nk6sd Hi, in my case it was the date format: the fileName in blob storage was yyyy-MM-dd. I just changed it in the Databricks component and it worked.
@pianikalje2758 · 23 days ago
Awesome content! However, can you work in a Databricks notebook instead of Python? It is very difficult to understand, as I am not into Python.
@Cherupakstmt · 6 months ago
Is it possible to do the same with AWS as well?
@akj3344 · 6 months ago
Yeah, I prefer AWS as well.
@CodeWithYu · 6 months ago
Yeah, definitely possible; it just takes a different shape is all.
@Cherupakstmt · 6 months ago
@@CodeWithYu Hoping you can find time to make a similar tutorial with AWS. Thanks!
@CodeWithYu · 6 months ago
You can decide this in the polls. Cheers, Yusuf.
@wiss1998 · 6 months ago
like like like
@wiss1998 · 6 months ago
The question is: how do we make this pipeline robust?
@CodeWithYu · 6 months ago
Well, you could add more things like tests, CI/CD, etc.
@wiss1998 · 6 months ago
Waiting for dbt and Snowflake!
@CodeWithYu · 6 months ago
Absolutely! Watch out for it!
@cogliostro97 · 4 months ago
Awesome stuff, thanks!
@Remmy1314 · 2 months ago
Thank you for the hard work, please keep it up.
@CodeWithYu · 2 months ago
Thanks Remmy! Will do!
@prajaktaingole3304 · 19 days ago
Can you share your Instagram and LinkedIn IDs?