
An introduction to ksqlDB 

Robin Moffatt

You've got streams of data that you want to process and store? You've got events from which you'd like to derive state or build aggregates? And you want to do all of this in a scalable and fault-tolerant manner? It's just as well that Kafka and ksqlDB exist!
This talk will be built around a live demonstration of the concepts and capabilities of ksqlDB. We'll see how you can apply transformations to a stream of events from one Kafka topic to another. We'll use ksqlDB connectors to bring in data from other systems and use this to join and enrich streams, and we'll serve the results up directly to an application, without even needing an external data store.
Attendees will learn:
- How to process streams of events
- The semantics of streams and tables, and of push and pull queries (sketched below)
- How to use the ksqlDB API to get state directly from the materialised store
- What makes ksqlDB elastically scalable and fault-tolerant.
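
For illustration, here is a minimal sketch of these ideas in ksqlDB's SQL. The topic, stream, and column names (orders, amount, and so on) are hypothetical and not taken from the talk itself:

    -- Declare a stream over an existing Kafka topic
    CREATE STREAM orders (order_id VARCHAR KEY, item VARCHAR, amount DOUBLE)
      WITH (KAFKA_TOPIC='orders', VALUE_FORMAT='JSON');

    -- Transform events from one topic into another
    CREATE STREAM big_orders AS
      SELECT * FROM orders WHERE amount > 100;

    -- Derive state: a table materialised from the stream of events
    CREATE TABLE orders_by_item AS
      SELECT item, COUNT(*) AS order_count
      FROM orders
      GROUP BY item;

    -- Push query: subscribe to updates as they happen
    SELECT item, order_count FROM orders_by_item EMIT CHANGES;

    -- Pull query: fetch the current state for one key from the materialised store
    SELECT order_count FROM orders_by_item WHERE item = 'widget';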
ℹ️This talk was presented at the Kraków Apache Kafka® Meetup, kindly hosted by VirtusLab. My thanks to them for the recording as well.
---
📌Slides: rmoff.dev/ksqldb-slides
👾Code: rmoff.dev/ksqldb-demo
📚Free Kafka eBooks: rmoff.dev/3n6
--
☁️ Confluent Cloud ☁️
Confluent Cloud is a managed Apache Kafka and Confluent Platform service. It scales to zero and lets you get started with Apache Kafka at the click of a mouse. You can sign up at www.confluent.io/confluent-cl... and use code RMOFF200 for $200 towards your bill (small print: www.confluent.io/confluent-cl...)
--
Other links that I mentioned during the talk:
☁️Confluent Cloud: rmoff.dev/1yj
📌From Zero to Hero with Kafka Connect rmoff.dev/berlin19-kafka-connect
📌The Changing Face of ETL: Event-Driven Architectures for Data Engineers talks.rmoff.net/Jn6rgo/the-ch...
📌No More Silos: Integrating Databases and Apache Kafka rmoff.dev/ksny19-no-more-silos
For community support and discussion check out:
✉️Mailing list: groups.google.com/forum/#!for...
🗣️Slack group: cnfl.io/slack
---

Published: 13 May 2020

Comments: 17
@vinitsunita · 2 years ago
Awesome explanation
@rmoff · 2 years ago
Glad you liked it :)
@RaduToev · 3 years ago
Hey, thank you for the presentation. Out of curiosity, does Kafka + ksqlDB kind of replicate what EventStoreDB is doing? If I understand correctly, one could also use ksqlDB as a materialized view, but I'm not sure how the two technologies compare. Do you see them overlapping, or do you think they complement each other? Thanks again
@meditating010 · 2 years ago
Usually ksqlDB can be used as a datastore shaped to be 'fit for use' in some way; the event-driven data in Kafka is essentially a time-series store that can be used to restore state.
@doganaysahin9770 · 3 years ago
Hi Robin, it was a great presentation, thank you. I have a question about table grouping with a window. I group by the people who sent a request at least 3 times in 5 minutes, then I insert 3 values. The topic the table writes to is then consumed by a Kafka Streams consumer. When I insert 3 of the same person, I expect just one value, but my consumer gets 3 events. Is this a bug? Or can I consume a single event after inserting 3 of the same person within 5 minutes?
@rmoff · 3 years ago
Pretty sure you're talking about the concept of punctuation, which is supported in Kafka Streams but not yet in ksqlDB. Head to cnfl.io/slack and the #ksqldb channel to ask this question and confirm your understanding, though.
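
For reference, a hedged sketch of the kind of windowed aggregate described above (stream and column names are hypothetical). Because each arriving event updates the window's running count, the output topic receives one update per input event rather than a single final result, which is the behaviour that punctuation/suppression in Kafka Streams addresses:

    -- Hypothetical: people who sent at least 3 requests within 5 minutes.
    -- Each input event updates the running count, so downstream consumers
    -- see an update per event, not one final value per window.
    CREATE TABLE frequent_requesters AS
      SELECT person, COUNT(*) AS request_count
      FROM requests
      WINDOW TUMBLING (SIZE 5 MINUTES)
      GROUP BY person
      HAVING COUNT(*) >= 3;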
@t529615 · 3 years ago
Great presentation 👍 You mention joining streams with streams. Is it possible to "merge" two or more streams together based on a common key or field? Let's say I want all events for a given person from several streams, while still keeping the "order" and timestamps. In short, the example Greg Young uses for his Event Store and temporal queries? I want to see all messages from Bill in all chatrooms and handle those events as one stream.
@rmoff · 3 years ago
You can merge streams either as a UNION (in effect, using INSERT INTO), or you can do a stream-stream join on a common key. In terms of strict ordering and timestamps, head over to cnfl.io/slack and the #ksqldb channel to ask there :)
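
A minimal sketch of the UNION-style merge, assuming two schema-compatible streams (all names here are hypothetical):

    -- Create the merged stream from the first source
    CREATE STREAM all_messages AS
      SELECT user_id, message FROM chatroom_a;

    -- INSERT INTO appends a second stream's rows into the same target
    INSERT INTO all_messages
      SELECT user_id, message FROM chatroom_b;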
@mehrdadmasoumi2443 · 3 years ago
Hi Robin Moffatt, thank you for your answers on Stack Overflow. I use MySQL and Debezium to create a Kafka topic. When I create a stream in ksqlDB, the data is returned as null. Please help me.
@rmoff · 3 years ago
Hi, I think we were chatting on StackOverflow too :) Best is to continue the conversation there, or head to forum.confluent.io/ to discuss further.
@mehrdadmasoumi2443 · 3 years ago
@rmoff I solved the issue, using the Debezium extractor to remove the after/before keys.
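
For anyone hitting the same null-values issue: a hedged sketch of what that fix typically looks like, using Debezium's ExtractNewRecordState single message transform to unwrap the before/after change-event envelope so that ksqlDB sees flat rows. The connector and database details below are hypothetical:

    CREATE SOURCE CONNECTOR mysql_source WITH (
      'connector.class'        = 'io.debezium.connector.mysql.MySqlConnector',
      'database.hostname'      = 'mysql',
      'database.port'          = '3306',
      'database.user'          = 'debezium',
      'database.password'      = 'dbz',
      'database.server.name'   = 'mysql01',
      'table.whitelist'        = 'demo.customers',
      'database.history.kafka.bootstrap.servers' = 'kafka:9092',
      'database.history.kafka.topic'             = 'dbhistory.demo',
      -- The fix: unwrap the Debezium envelope so rows are flat, not before/after
      'transforms'             = 'unwrap',
      'transforms.unwrap.type' = 'io.debezium.transforms.ExtractNewRecordState'
    );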
@vinitsunita · 2 years ago
We are exploring ksqlDB for our use case, and we are planning to create streams on the fly. These streams will be wrappers over a simple SELECT query with a WHERE clause. The number of streams could reach around 10k. Is it advisable to create 10k streams in ksqlDB, and can it cause performance issues?
@rmoff · 2 years ago
That's quite a broad question. ksqlDB is a scalable system by its distributed nature. For more specifics I'd suggest you post at forum.confluent.io/ with more detail, and people should be able to help you out there.
@vinitsunita · 2 years ago
@rmoff Thanks, will check over there
@priyankan53 · 3 years ago
I posted this in the Confluent forum but didn't get any response, so I'm posting it here. I am trying to consume a row from a stream using a SELECT query, where the row has certain properties along with the insertion time as a column. I need to consume the row 30 minutes after its insertion time. I tried using both tables and streams, but I noticed that the SELECT query applies its conditions at insertion time, and since the condition does not match at insertion time, the row never gets consumed. My expectation is that the inserted row would be consumed 30 minutes after the insertion time. Could anyone suggest whether it is possible to write a query for this scenario? The queries I am using for the stream/table:

    insert into my_stream values (1, 'some data', 'some message', 1625745622700); -- 1625745622700 = timestamp in milliseconds at insertion time
    select * from my_stream where rowtime > inserted_time_ms + 200000 emit changes;
@shantanupanvalkar875 · 3 years ago
Was that beer he's drinking?
@rmoff · 3 years ago
Yep 🍻