
Building Open Data Lakes: Debezium, Apache Kafka, Hudi, Spark, and Hive on AWS 

Gary Stafford

In this video demonstration, we will build a simple open data lake on AWS using a combination of open-source software, including Debezium for change data capture (CDC), Apache Kafka, Kafka Connect, Apache Hive, Apache Spark, and Apache Hudi and Hudi's DeltaStreamer.
All open-source files on GitHub: github.com/gar....
This video represents my own viewpoints and not those of my employer, Amazon Web Services (AWS). All product names, logos, and brands are the property of their respective owners.
📣 Please subscribe to my YouTube channel for future videos.

Published: Sep 6, 2024

Comments: 10
@abhisheklaller1250 2 years ago
It's really amazing to see how things are evolving in terms of ACID properties in big data. Nice Demo!! :)
@investorfriends 2 years ago
Could you please make a demo for AWS MySQL > Debezium > Google BigQuery?
@huilingxie4829 1 year ago
Do I need to install Kafka or something to run these?
@user-qw5on7pt1v 2 years ago
Does DeltaStreamer support syncing all RDS tables under ONE database into different Hudi tables on AWS S3?
@GaryStafford 2 years ago
Check out HoodieMultiTableDeltaStreamer and/or the Kafka Connect Sink for Hudi.
@write2sandhu 2 years ago
@GaryStafford Great video. I was wondering whether it would be possible to use DeltaStreamer for complex transformations, joins, etc. I might be wrong, but it appears that DeltaStreamer cannot be used inside a Spark streaming job and that it's limited to being a CLI tool.
@GaryStafford 2 years ago
@write2sandhu hudi.apache.org/docs/hoodie_deltastreamer/: Support for plugging in transformations. See "--transformer-class".
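As a rough illustration of the "--transformer-class" option mentioned above, here is a minimal Python sketch that only assembles the configuration text for Hudi's SQL-based transformer. It assumes the class name and property key used by Hudi 0.x releases (newer releases may use different key prefixes). The SQL statement, the `nationality` column, and the properties path are hypothetical and do not come from the video. A join against another table would presumably require that table to be visible to the Spark session, which this sketch does not cover.

```python
from typing import List

# Assumed names: the transformer class and property key below are those used by
# Hudi 0.x DeltaStreamer; the SQL and the column names are purely hypothetical.
TRANSFORMER_CLASS = "org.apache.hudi.utilities.transform.SqlQueryBasedTransformer"
SQL_PROPERTY = "hoodie.deltastreamer.transformer.sql"


def transformer_property(sql: str) -> str:
    """Return one properties-file line that sets the transformer SQL."""
    # Collapse whitespace so the statement fits on a single properties line.
    one_line = " ".join(sql.split())
    return f"{SQL_PROPERTY}={one_line}"


def transformer_args(props_path: str) -> List[str]:
    """Return the extra DeltaStreamer arguments that enable the transformer."""
    return ["--transformer-class", TRANSFORMER_CLASS, "--props", props_path]


# <SRC> is the placeholder the SQL transformer replaces with the incoming batch.
EXAMPLE_SQL = """
    SELECT s.*, upper(s.nationality) AS nationality_code
    FROM <SRC> s
"""

if __name__ == "__main__":
    print(transformer_property(EXAMPLE_SQL))
    # The bucket name is left as a placeholder on purpose.
    print(" ".join(transformer_args("s3://<bucket>/hudi/transformer.properties")))
```

The printed property line would go into the file passed via `--props`, and the printed arguments would be appended to the existing DeltaStreamer `spark-submit` command.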
@sanjayasahoo2598 9 months ago
Complex pipeline
@arslaneqbal3864 2 years ago
Hello, I just want to understand this value: hoodie.deltastreamer.schemaprovider.source.schema.file=s3:///hudi/moma.public.artists-value.avsc. What should be in this file? If it is a schema, can you please explain how to create that file manually just by looking at our tables?
@GaryStafford 2 years ago
hudi.apache.org/docs/hoodie_deltastreamer/#file-based-schema-provider
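To make the question above concrete: an `.avsc` file is an Avro schema written as JSON, and the file-based schema provider reads it from the configured path. The sketch below, in Python, shows one way such a file could be written by hand from a table definition. The column names and types are hypothetical (they are not taken from the video's database), and the sketch assumes the Kafka messages carry flat rows; if the topic holds the full Debezium change-event envelope, the real schema would look different.

```python
import json

def avro_schema(record_name, namespace, columns):
    """Build an Avro record schema from (name, avro_type, nullable) tuples."""
    fields = []
    for name, avro_type, nullable in columns:
        if nullable:
            # Nullable columns become a union with "null" and default to null.
            fields.append({"name": name, "type": ["null", avro_type], "default": None})
        else:
            fields.append({"name": name, "type": avro_type})
    return {
        "type": "record",
        "name": record_name,
        "namespace": namespace,
        "fields": fields,
    }

# Hypothetical columns for an "artists" table; adjust to the real table.
ARTIST_COLUMNS = [
    ("artist_id", "int", False),
    ("name", "string", True),
    ("nationality", "string", True),
    ("birth_year", "int", True),
]

schema = avro_schema("artists", "moma.public", ARTIST_COLUMNS)
schema_json = json.dumps(schema, indent=2)

if __name__ == "__main__":
    # Save this output as an .avsc file and upload it to the configured path.
    print(schema_json)
```

Mapping database types to Avro types (for example, integer columns to `int` or `long`, text columns to `string`) is a manual judgment call in this approach, so it is worth comparing the result against a sample message from the topic.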