
Apache Kafka and ksqlDB in Action: Let's Build a Streaming Data Pipeline! 

Robin Moffatt

Have you ever thought that you needed to be a programmer to do stream processing and build streaming data pipelines? Think again! Apache Kafka is a distributed, scalable, and fault-tolerant streaming platform, providing low-latency pub-sub messaging coupled with native storage and stream processing capabilities. Integrating Kafka with RDBMS, NoSQL, and object stores is simple with Kafka Connect, which is part of Apache Kafka. ksqlDB is the event streaming database for Apache Kafka, and makes it possible to build stream processing applications at scale, written using a familiar SQL interface.
In this talk, we’ll explain the architectural reasoning for Apache Kafka and the benefits of real-time integration, and we’ll build a streaming data pipeline using nothing but our bare hands, Kafka Connect, and ksqlDB.
Gasp as we filter events in real-time! Be amazed at how we can enrich streams of data with data from RDBMS! Be astonished at the power of streaming aggregates for anomaly detection!
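
For a taste of what the talk builds, here's a minimal ksqlDB sketch of those three steps - filter, enrich, aggregate. All topic, stream, and column names below are illustrative, not taken from the demo:

```sql
-- Declare a stream over an existing Kafka topic (hypothetical names).
CREATE STREAM ratings (user_id INT KEY, stars INT, channel VARCHAR)
  WITH (KAFKA_TOPIC='ratings', VALUE_FORMAT='JSON');

-- Filter events in real time.
CREATE STREAM poor_ratings AS
  SELECT * FROM ratings WHERE stars < 3 EMIT CHANGES;

-- A table fed from an RDBMS via Kafka Connect (e.g. a customers table).
CREATE TABLE customers (id INT PRIMARY KEY, full_name VARCHAR)
  WITH (KAFKA_TOPIC='customers', VALUE_FORMAT='JSON');

-- Enrich the stream with data from the table.
CREATE STREAM poor_ratings_enriched AS
  SELECT r.user_id, c.full_name, r.stars
  FROM poor_ratings r
  JOIN customers c ON r.user_id = c.id
  EMIT CHANGES;

-- Streaming aggregate, e.g. as a basis for anomaly detection.
CREATE TABLE ratings_per_user AS
  SELECT user_id, COUNT(*) AS rating_count
  FROM ratings
  WINDOW TUMBLING (SIZE 5 MINUTES)
  GROUP BY user_id
  EMIT CHANGES;
```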
--
🎓 Resources
👾 Try it out for yourself: github.com/confluentinc/demo-...
📔 Slides: talks.rmoff.net/tkl4a1/apache-...
🎥 Introduction to ksqlDB: rmoff.dev/ksqldb-introduction
🎥 Streaming data from Kafka to Elasticsearch: rmoff.dev/kafka-elasticsearch...
--
☁️ Confluent Cloud ☁️
Confluent Cloud is a managed Apache Kafka and Confluent Platform service. It scales to zero and lets you get started with Apache Kafka at the click of a mouse. You can sign up at confluent.cloud/signup?... and use code 60DEVADV for $60 towards your bill (small print: www.confluent.io/confluent-cl...)

Science

Published: 7 Sep 2020

Comments: 24
@maxtudor4239 3 years ago
Thank you so much. I have seen a lot of videos & books already, and this is the first time I understand and see all the strength and ease of Kafka. Great work!!!
@rmoff 3 years ago
Thanks, glad it helped!
@abdulelahaljeffery6234 3 years ago
This is really an awesome walkthrough. Thank you!
@rmoff 3 years ago
Glad it was helpful!
@farchanjo 3 years ago
Thanks for all the videos!!
@rmoff 3 years ago
Glad you like them :)
@maddi1154 3 years ago
Wonderful session, thanks friend.
@rmoff 3 years ago
Thanks :D
@funsoiyaju614 3 years ago
Thanks for sharing this.
@eddydouyere 3 years ago
Thanks a lot. It's crystal clear.
@rmoff 3 years ago
Glad it helped :)
@deabook 3 years ago
Thank you so much.
@rmoff 3 years ago
You're welcome!
@ashchedrin 3 years ago
Great talk and walk-through. I am very new to the Confluent platform, and ksqlDB seems to be a great thing. I have one question about Kafka Connect (following your example of looking up user details from MySQL): how big can that remote MySQL table be, and does it matter whether the join happens on the PK? My sense is that it does not matter that much; am I correct?
@rmoff 3 years ago
You can join on other fields, but you would need to re-key the data in the Kafka topic first for the join to succeed; you can do this easily enough with ksqlDB though. In terms of size, ksqlDB can in general scale horizontally, but for specifics I'd recommend testing it yourself. Also, head to the #ksqldb channel on cnfl.io/slack to discuss further.
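(For readers following along, the re-keying step might look something like this sketch; the stream and column names are hypothetical:)

```sql
-- Repartition a stream so its key matches the join column; ksqlDB writes
-- the re-keyed records to a new topic, which a table or join can then use.
CREATE STREAM users_keyed_by_email
  WITH (KAFKA_TOPIC='users_keyed_by_email') AS
  SELECT * FROM users
  PARTITION BY email
  EMIT CHANGES;
```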
@mbesida 3 years ago
@@rmoff A follow-up question about joining: when you create a table in ksqlDB (which is just another topic from Kafka's perspective), does ksqlDB handle indexes somehow? Is there a full-fledged DB engine behind the scenes? And how does it fetch data from the real topic when a stream with a join is flowing and each rating needs to be enriched with the user's name fetched from somewhere (my guess is it doesn't do random access on the real Kafka topic)? Or does the table create a completely new structure that is handled only by ksqlDB?
@shibilpm9873 11 months ago
Why is everyone using Confluent Kafka this and that? I want to do this in production, and Confluent Kafka is not open source. Can anyone suggest an article or video to refer to? I want to load a CSV or JSON file into Kafka as a table.
@rbb1485 2 years ago
What happens to the streams or tables if the ksqlDB or Kafka Connect cluster crashes? If I restart the Docker container where I'm running the ksqlDB streams or Kafka Connect, will the streams start from where they left off? Have you had instances where you had too many streams and half of them crashed? How do you recover?
@rmoff 2 years ago
Please post this over at forum.confluent.io/ and I will try to answer it there.
@janga8717 3 years ago
I am using JDBC connectors and receive `Key format: ¯\_(ツ)_/¯ - no data processed`, although I have set `"key.converter": "org.apache.kafka.connect.storage.StringConverter"` in my connector. I do see the full stream, with a null key: `rowtime: 2021/05/31 08:33:38.411 Z, key: , value: {"id": 10, ...`. Do you have an idea of what could have gone wrong, or what typical issues come up at this point?
@rmoff 3 years ago
Hi, the best place to ask this is forum.confluent.io/ :)
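(A note for readers hitting the same symptom: one common cause - not necessarily the one here - is that the JDBC source connector writes no message key at all, so a key converter alone has nothing to serialise. A frequently used remedy is a pair of single message transforms that copy a value field into the key. A sketch, with hypothetical connection details:)

```sql
-- Hypothetical JDBC source that promotes the numeric 'id' column to the
-- message key using the stock ValueToKey + ExtractField transforms.
CREATE SOURCE CONNECTOR jdbc_source WITH (
  'connector.class'             = 'io.confluent.connect.jdbc.JdbcSourceConnector',
  'connection.url'              = 'jdbc:mysql://mysql:3306/demo',  -- hypothetical
  'connection.user'             = 'connect_user',                  -- hypothetical
  'connection.password'         = 'connect_pw',                    -- hypothetical
  'mode'                        = 'incrementing',
  'incrementing.column.name'    = 'id',
  'topic.prefix'                = 'db-',
  'transforms'                  = 'createKey,extractKey',
  'transforms.createKey.type'   = 'org.apache.kafka.connect.transforms.ValueToKey',
  'transforms.createKey.fields' = 'id',
  'transforms.extractKey.type'  = 'org.apache.kafka.connect.transforms.ExtractField$Key',
  'transforms.extractKey.field' = 'id',
  -- IntegerConverter suits an INT key; StringConverter would serialise it via toString()
  'key.converter'               = 'org.apache.kafka.connect.converters.IntegerConverter'
);
```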
@janga8717 3 years ago
I do have some questions: Kafka uses key-value storage for messages and has some great features for data persistence, but usual data streams should not be stored forever, right? I guess Kafka has an internal cleanup policy for removing old data, especially when topics reach a maximum physical size. How does ksqlDB handle that kind of cleanup policy (if it exists)? Since we are using it for database purposes, the data should be available for a lifetime. So my question: what is ksqlDB? Is it a Kafka topic always consumed from the earliest offset? Is it comparable to a Redis key-value store, to a MongoDB document store, to SQL databases, ...?
@rmoff 3 years ago
Great question. Kafka stores data based on the retention policy, which can be configured per topic. You can retain data based on size, duration - or indeed forever (which is totally valid; see www.confluent.io/blog/publishing-apache-kafka-new-york-times/ and www.confluent.io/blog/okay-store-data-apache-kafka/). It basically comes down to the use case for the data. You say "data streams should not be stored forever", but it depends on what that data is - there are plenty of examples where you *would* keep that data forever. There are also compacted topics, in which the latest value for each key is retained forever, whilst earlier values of the key are removed. As for ksqlDB itself, it is built on Kafka topics, so the same principles apply. If you have more questions, head over to forum.confluent.io/
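(To make the two retention mechanisms concrete: topic retention and compaction are plain Kafka topic configuration (retention.ms, cleanup.policy), while for windowed aggregates ksqlDB can additionally bound its own state with a RETENTION clause. A hedged sketch with hypothetical names:)

```sql
-- Hypothetical windowed aggregate: ksqlDB retains each hourly window's
-- state for 7 days, after which it becomes eligible for cleanup. The
-- underlying Kafka topics still follow their own retention.ms and
-- cleanup.policy settings.
CREATE TABLE ratings_per_hour AS
  SELECT user_id, COUNT(*) AS ratings_count
  FROM ratings
  WINDOW TUMBLING (SIZE 1 HOUR, RETENTION 7 DAYS)
  GROUP BY user_id
  EMIT CHANGES;
```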
@janga8717 3 years ago
@@rmoff Thanks a lot for your support