The Realities Of Airflow - The Mistakes New Data Engineers Make Using Apache Airflow 

Seattle Data Guy
96K subscribers
15K views

Airflow remains a popular choice when it comes to open-source orchestration tools.
When I surveyed people about a year ago, it was the most popular open-source solution, and to this day my video “Should You Use Airflow” drives a lot of conversations with prospective clients.
Now, I do want to say that there are plenty of organizations using Azure Data Factory and Informatica, and there are plenty of competitors knocking on Airflow's door.
But for now, Airflow is like the PHP of the data world; people can talk poorly about it, but it continues to be heavily relied upon.
As I said, Airflow is often the reason I get brought into projects, so I have seen many different ways that teams deploy it.
Some scaled, others didn’t.
Thus, I wanted to take a moment and discuss some ways I have seen Airflow deployed in the past and the challenges people faced as they deployed their code.
If your team is looking to deploy Airflow, or needs help setting up Airflow, then set up a consultation here - calendly.com/ben-rogojan/cons...
Also, if you'd like to learn about an alternative to Airflow, you can check out Mage.ai (bit.ly/41h6Pjy).
This video isn't sponsored by them, but I am an advisor for Mage.ai.
0:00 - Intro
1:44 - Mistake #1 Putting The DAG Folder In The Same Repo As The Webserver
4:58 - Mistake #2 Not Using All The Features Airflow Offers
8:43 - Mistake #3 Not Thinking About Scale
Looking for an alternative to Airflow? Check out this article!
dataengineeringcentral.substa...
If you enjoyed this video, check out some of my other top videos.
Top Courses To Become A Data Engineer In 2022
• Top Courses To Become ...
What Is The Modern Data Stack - Intro To Data Infrastructure Part 1
• What Is The Modern Dat...
If you would like to learn more about data engineering, then check out Google's GCP certificate
bit.ly/3NQVn7V
If you'd like to read up on my updates about the data field, then you can sign up for our newsletter here.
seattledataguy.substack.com/
Or check out my blog
www.theseattledataguy.com/
And if you want to support the channel, then you can become a paid member of my newsletter
seattledataguy.substack.com/s...
Tags: Data engineering projects, Data engineer project ideas, data project sources, data analytics project sources, data project portfolio
_____________________________________________________________
Subscribe: / @seattledataguy
_____________________________________________________________
About me:
I have spent my career focused on all forms of data. I have focused on developing algorithms to detect fraud, reduce patient readmission and redesign insurance provider policy to help reduce the overall cost of healthcare. I have also helped develop analytics for marketing and IT operations in order to optimize limited resources such as employees and budget. I privately consult on data science and engineering problems both solo as well as with a company called Acheron Analytics. I have experience both working hands-on with technical problems as well as helping leadership teams develop strategies to maximize their data.
*I do participate in affiliate programs; if a link has an "*" by it, I may receive a small portion of the proceeds at no extra cost to you.

Entertainment

Published: 29 Jul 2024

Comments: 13
@SeattleDataGuy 8 months ago
If you guys want to learn more about data engineering, then sign up for my newsletter here seattledataguy.substack.com/ or join the discord here discord.gg/2yRJq7Eg3k
@make725daily1 9 months ago
This video sets an exceptional benchmark! -- "Value the journey, for it shapes your path towards unprecedented accomplishments."
@SeattleDataGuy 5 months ago
Glad you found it helpful!
@MarcusJFloyd 6 months ago
Thanks, awesome video. I scaled by deploying Airflow on Kubernetes and using the Kubernetes executor so that jobs continue to run during deploys. We do have the problem of the DAGs being in the same repo as our image, so running the Kubernetes executor was a happy medium. I plan to move the DAGs to another repo and use a git-sync sidecar container to pull in DAG updates at a scheduled interval.
@rnzqt 9 months ago
Hey Ben, I've watched loads of your videos, and in one of your older ones (or a comment afterwards) you mentioned Udacity being a good resource for aspiring professionals, but wished they had a data engineer nanodegree. Now that they do have (a couple of platform-specific) DE nanodegrees, is it something you have looked at? My company is potentially willing to fund a course for me, as I'm hoping to move into the database side from help desk support. Wondering if that would be a good course to get into. Thanks, Olie
@paul_devos 9 months ago
I have so much trauma from trying to deploy Airflow 3 separate times at 3 different orgs prior to the "Managed Airflow" era (AWS, Astronomer) that I can't even watch this video. Ultimately, I prefer to work in organizations that are generally smaller and more intimate, with greater ownership of their own orchestration locally, except when they have data sets that are agreed to be mission critical at the organizational level. Those data sets then move to the "hub", where a data-mesh-like governance system may also take them on in a "hub and spoke" kind of setup.
@neuronqro 9 months ago
...how about looking at more "modern" alternatives to Airflow? Dagster, Prefect etc. What do you think about their deployment?
@jerbear97 9 months ago
Literally been trying to deploy Airflow for the past 3 days
@channuangadi7504 9 months ago
I have literally been trying to install Airflow for 5 days
@SeattleDataGuy 5 months ago
Have you got it deployed yet? Hahah, it really can be challenging
@looklook6075 4 months ago
@@SeattleDataGuy lol, same here. Why do so many companies use it? Airflow's design is horrible. It looks like it was designed by a bunch of engineers who do not know anything about UI. Something better must come soon.
@romank7944 21 days ago
Did you manage to install it, or did you end up using another cron orchestrator?
@yigidovic 2 days ago
@@looklook6075 Could not agree more!