Scaling Out Airflow 

Astronomer

Airflow is purpose-built for high-scale workloads and high availability on a distributed platform. Since the release of Airflow 2.0, there are even more tools and features to ensure that Airflow can be scaled to accommodate high-throughput, data-intensive workloads. In this webinar, Alex Kennedy will discuss the process of scaling out Airflow using the Celery and Kubernetes executors, including the parameters that need to be tuned when adding nodes to Airflow and the thought process behind deciding when it's a good idea to scale Airflow horizontally or vertically. Consistent, aggregated logging is key when scaling Airflow, so we will also briefly discuss best practices for logging on a distributed Airflow platform, as well as the pitfalls that many Airflow users run into when designing and building one.
Key Takeaways:
- With the right infrastructure and architecture, Airflow is capable of massive scale! Getting there will require patience and experimentation, but the latest versions of Airflow make this process as painless as possible.
- Airflow’s CeleryExecutor and KubernetesExecutor are designed for scalable workloads.
- There are key parameters in your Airflow configuration which will need to be carefully tuned in order to allow Airflow to scale smoothly and provide minimal latency between tasks.
- Scaling with Celery is as easy as adding a node to your cluster, and providing the correct configuration and Airflow files to that node.
- Aggregated and consistent logging is crucial for being able to debug the scaled Airflow platform.
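
The takeaways above do not name the specific parameters, so the following is only a minimal sketch of what tuning a Celery worker node might look like. It assumes the standard `AIRFLOW__SECTION__KEY` environment-variable convention and a handful of settings from the Airflow configuration reference (`parallelism`, `worker_autoscale`, `remote_logging`). The helper names `airflow_env` and `celery_scaling_env` and all numeric values are hypothetical, chosen for illustration rather than taken from the webinar.

```python
"""Sketch: render Airflow settings as AIRFLOW__SECTION__KEY environment variables."""


def airflow_env(section: str, key: str, value: object) -> tuple[str, str]:
    """Return the environment variable name and value for `[section] key`."""
    return f"AIRFLOW__{section.upper()}__{key.upper()}", str(value)


def celery_scaling_env(max_procs: int, min_procs: int, parallelism: int) -> dict[str, str]:
    """Build an illustrative set of scaling-related settings for one node.

    The values passed in are placeholders; suitable numbers depend on the
    CPU and memory of each worker and on the workload.
    """
    if min_procs > max_procs:
        raise ValueError("min_procs must not exceed max_procs")
    settings = [
        ("core", "executor", "CeleryExecutor"),
        # Upper bound on task instances running at once per scheduler.
        ("core", "parallelism", parallelism),
        # Celery autoscaling is written as "max,min" worker processes.
        ("celery", "worker_autoscale", f"{max_procs},{min_procs}"),
        # Remote logging also needs a log folder and connection configured;
        # those are deployment-specific and left out of this sketch.
        ("logging", "remote_logging", True),
    ]
    return dict(airflow_env(*setting) for setting in settings)


if __name__ == "__main__":
    for name, value in celery_scaling_env(max_procs=16, min_procs=4, parallelism=64).items():
        print(f"export {name}={value}")
```

Every node in the cluster needs the same configuration and DAG files, so generating the settings from one function is one way to keep workers consistent; a shared `airflow.cfg` or a container image would serve the same purpose.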

Published: 30 Jul 2024
Comments: 11
@christianfernandez5717 3 months ago
Great video. Would also be interested in a webinar regarding scaling the Airflow database since I'm having some difficulties of my own with that.
@Astronomer 3 months ago
Noted, thanks for the suggestion! If it's helpful, you can check out our guide on the metadata database: docs.astronomer.io/learn/airflow-database. Using a managed service like Astro is also one way many companies avoid scaling issues with Airflow.
@risebyliftingothers 1 year ago
Awesome 👍
@Astronomer 1 year ago
Thanks!
@felipegermany 2 years ago
What's the name of the logging system shown in the video @32:14 ?
@risebyliftingothers 1 year ago
Elasticsearch with Kibana
@Astronomer 1 year ago
Thanks!
@pedroandrade9736 1 year ago
Is it not recommended to use **worker_autoscale**?
@Astronomer 1 year ago
Nope, definitely use worker_autoscale!
@mohitkeshwani456 2 years ago
Please make an Airflow tutorial... 😐
@Astronomer 2 years ago
Hey - you can find many tutorials and guides on our website www.astronomer.io/guides/