
Integrating Oracle and Kafka 

Robin Moffatt
4.1K subscribers · 22K views

For all its quirks and licence fees, we all love Oracle for what it does. But sometimes we want to get the data out to use elsewhere. Maybe we want to build analytics on it; perhaps we want to drive applications with it; sometimes we might even want to move it to another, non-Oracle, database. Can you imagine that! 😱
With Apache Kafka as our scalable, distributed event streaming platform, we can ingest data from Oracle as a stream of events. We can use Kafka to transform and enrich the events if we want to, even joining them to data from other sources. We can stream the resulting events to target systems, as well as use them to create event-driven microservices.
This talk will show some basics of Kafka and then dive into ingesting data from Oracle into Kafka, applying stream processing with ksqlDB, and then pushing that data to systems including PostgreSQL as well as back into Oracle itself.
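To make that concrete, here is a minimal sketch of the first step of such a pipeline, expressed as a ksqlDB CREATE SOURCE CONNECTOR statement (an editor's illustration: the connection details, credentials, table, and topic names are assumptions, not the exact demo config):

```sql
-- Ingest an Oracle table into Kafka with the JDBC source connector
-- (query-based CDC). Every name and credential below is illustrative.
CREATE SOURCE CONNECTOR ORACLE_SOURCE WITH (
  'connector.class'       = 'io.confluent.connect.jdbc.JdbcSourceConnector',
  'connection.url'        = 'jdbc:oracle:thin:@oracle:1521/ORCLPDB1',
  'connection.user'       = 'connect_user',
  'connection.password'   = 'connect_pw',
  'mode'                  = 'timestamp',
  'timestamp.column.name' = 'UPDATE_TS',
  'table.whitelist'       = 'ORDERS',
  'topic.prefix'          = 'oracle-'
);
```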
🗣️ As presented at ACEs @ Home meetup on 15th June 2020
📔 Slides and resources: talks.rmoff.net/ixPL5r/integr...
--
ℹ️ Table of contents:
1:34 What is Kafka? (see also talks.rmoff.net/Q3AoWZ/kafka-...)
10:00 What are the reasons for integrating Oracle with Kafka?
14:41 Kafka Connect (see also talks.rmoff.net/DQkDj3/from-z...)
17:50 The two types of Change Data Capture (CDC)
19:40 Live demo - Oracle into Kafka
24:30 Live demo - Difference between CDC methods illustrated
28:40 Live demo - Streaming data from Kafka to another database (Postgres)
32:59 Live demo - ksqlDB
37:19 Live demo - Joining a stream of events to a table in ksqlDB (see the sketch after this list)
40:14 Live demo - Building aggregates in ksqlDB
41:24 Live demo - Creating a sink connector from ksqlDB to Postgres
44:04 Live demo - ksqlDB stream/table duality, push and pull queries
46:29 Live demo - Key/Value lookup against state in ksqlDB using REST API
47:44 CDC recap, how to choose which to use
49:29 ksqlDB overview
52:50 Summary & useful links
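For a hedged taste of what the ksqlDB demos above (37:19, 40:14, 44:04, 46:29) look like, here is a sketch; the stream, table, and column names are illustrative, not the exact objects used in the demo:

```sql
-- 37:19 join a stream of order events to a customer lookup table
CREATE STREAM ORDERS_ENRICHED AS
  SELECT O.ORDER_ID, O.AMOUNT, C.CUSTOMER_NAME
  FROM ORDERS O
  LEFT JOIN CUSTOMERS C ON O.CUSTOMER_ID = C.CUSTOMER_ID;

-- 40:14 build an aggregate, materialised as a table
CREATE TABLE SALES_BY_CUSTOMER AS
  SELECT CUSTOMER_ID, SUM(AMOUNT) AS TOTAL_AMOUNT
  FROM ORDERS
  GROUP BY CUSTOMER_ID;

-- 44:04 / 46:29 pull query: key/value lookup against the table's state,
-- also available over ksqlDB's REST API
SELECT TOTAL_AMOUNT FROM SALES_BY_CUSTOMER WHERE CUSTOMER_ID = 42;
```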
--
☁️ Confluent Cloud ☁️
Confluent Cloud is a managed Apache Kafka and Confluent Platform service. It scales to zero and lets you get started with Apache Kafka at the click of a mouse. You can sign up at confluent.cloud/signup?... and use code 60DEVADV for $60 towards your bill (small print: www.confluent.io/confluent-cl...)

Science

Published: 5 Jul 2024

Comments: 32
@rmoff · 3 years ago
🚨 Since this video was published Confluent have released their Oracle CDC connector - see this blog for more details: www.confluent.io/blog/introducing-confluent-oracle-cdc-connector/?.devx_ch.rmoff_LAoepZTapMM&
@user-ld8op1lb1p · 1 month ago
So Amazing
@ricardolourival5391 · 2 years ago
Congratulations on the presentation.
@mousakanou9895 · 3 years ago
Great presentation 👍👍
@rmoff · 3 years ago
Thanks, glad you liked it!
@vishwasma · 4 years ago
That was a great demo. Thank you for doing this.
@rmoff · 3 years ago
Glad you liked it!
@vigneshwarreddy3368 · 3 years ago
Hi! Thanks for the good explanation. When I use the JDBC source connector for Oracle, the column names come through wrapped in double quotes, which causes an insert error at Postgres due to a column mismatch (the tables are already created in Postgres). Can we avoid wrapping the column names in double quotes? Do we need to set any other configuration parameter to avoid this? Thanks for your help in advance.
@rmoff · 3 years ago
A good place to ask this is at forum.confluent.io/
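For readers hitting the same quoting issue, one setting worth checking (an editor's suggestion, not something covered in the video) is the JDBC connector's quote.sql.identifiers option; a hedged sketch with illustrative connection details:

```sql
-- quote.sql.identifiers controls whether the JDBC sink wraps table and
-- column names in quotes when it builds SQL; 'never' may avoid the
-- case-sensitive column mismatch described above. Names illustrative.
CREATE SINK CONNECTOR PG_SINK_UNQUOTED WITH (
  'connector.class'       = 'io.confluent.connect.jdbc.JdbcSinkConnector',
  'connection.url'        = 'jdbc:postgresql://postgres:5432/demo',
  'topics'                = 'oracle-ORDERS',
  'quote.sql.identifiers' = 'never'
);
```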
@twelvetonineshift9 · 3 years ago
Does this work with Oracle Cloud as well? We are using Informatica CDC today, but our source Oracle system is now moving to the cloud, and Informatica CDC does not seem to work with Oracle Cloud. How about Confluent Kafka?
@bhomiktakhar8226 · 3 years ago
Hi Rmoff, when you say that query-based CDC will only pull the records at the first poll and the last poll and not the in-between ones, does that mean that for an Oracle table with high throughput (crores of records per day), the JDBC source connector (query-based) might not pull all records into Kafka? I am facing this problem: records get missed during the day, with no other major configuration difference in the source connector. Also, when the connector polls (say, timestamp-based), it does something like SELECT * FROM table WHERE timestampcol = last_poll_time, so how does it lose records while polling? Anyway, it's a great video.
@rmoff · 3 years ago
Yes, exactly that, which is why log-based CDC is better in many situations. I cover this also here: rmoff.dev/no-more-silos
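To make the failure mode concrete, this is roughly what the JDBC source connector issues in timestamp mode (an editor's approximation, not the connector's exact SQL):

```sql
-- Each poll fetches rows whose timestamp advanced since the last poll:
SELECT *
FROM   ORDERS
WHERE  UPDATE_TS > :last_poll_ts
  AND  UPDATE_TS <= :current_ts;
-- A row updated several times between two polls comes back once, with
-- only its latest values; the intermediate versions are never captured.
-- Log-based CDC reads every change from the redo log, so nothing is lost.
```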
@mousakanou9895 · 3 years ago
Do you have all the databases installed in Docker?
@rmoff · 3 years ago
Yes - github.com/oracle/docker-images/blob/master/OracleDatabase/SingleInstance/README.md#building-oracle-database-docker-install-images
@ppsps5728 · 3 years ago
Hi rmoff, since a savepoint creates an Oracle SCN, it gets translated as a CSN in OGG. Do we have a way to filter out savepoint events from the Kafka handler? Again, thanks a lot for this great demo.
@rmoff · 3 years ago
Hi, I've not worked with the Kafka handler in OGG much so I don't know the answer, sorry. If it's a message on a topic you could always filter it out post-ingest with Kafka Streams or ksqlDB.
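A hedged sketch of that post-ingest filtering idea (the topic, field, and value names are hypothetical, and as the follow-up below notes, it only helps if some field actually distinguishes savepoint events):

```sql
-- Declare a stream over the OGG handler's topic, then keep only the
-- events you want. All names here are hypothetical.
CREATE STREAM OGG_RAW (OP_TYPE VARCHAR, PAYLOAD VARCHAR)
  WITH (KAFKA_TOPIC='ogg-events', VALUE_FORMAT='JSON');

CREATE STREAM OGG_FILTERED AS
  SELECT *
  FROM OGG_RAW
  WHERE OP_TYPE != 'SAVEPOINT';
```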
@ppsps5728 · 3 years ago
@rmoff Thanks a lot. Even the OGG Kafka Connect adapter has this, as it's an abstraction on top of the OGG Java add-ons. It's really tough to differentiate a savepoint transaction from a committed transaction in a Kafka topic, as both of them look similar from the op_type. I'll try raising an SR with Oracle 😊 (I saw that Debezium already fixed this issue for MySQL).
@agustincalderon5473 · 2 years ago
Do you have any sample without using containers?
@rmoff · 2 years ago
Most of what I do is in Docker as it's just easier for creating and sharing demos. If you've got a particular question about an element of it that you need help with outside of a container then feel free to head over to forum.confluent.io/ and ask there :)
@arada123 · 3 years ago
Hi, thanks a lot for your effort on this. I tried to do the same but got stuck on the Oracle Docker image. I could get the Docker image from Docker Hub after logging in, and start everything else from your docker-compose.yml file, but the Oracle container did not seem to work. I tried to build the Docker image for Oracle as you pointed out, but I wanted to do it on an AWS EC2 instance and got stuck there, as you cannot wget the installation file because it requires authentication. Can you point me to a solution here? How did you create your Docker image?
@rmoff · 3 years ago
Hi, I built my Docker image for the Oracle database per instructions here github.com/oracle/docker-images/blob/master/OracleDatabase/SingleInstance/README.md#building-oracle-database-docker-install-images
@arada123 · 3 years ago
@rmoff Hi, thanks for your response. Now I can connect Oracle to Kafka, but I face two main problems: 1) the Oracle DB table has more than a million entries, and when I use bulk mode I get only 800k entries in Kafka; with timestamp mode I get only ca. 200k. 2) The increment ID in the Oracle DB table is a string like "AB1234" and cannot be used. Can this somehow be cast to an integer? Do you have any suggestions for these cases? Thanks a lot for the videos and documentation you are providing. They helped me so much to get started on this topic. Keep it up. Your presentation was so clear and so helpful.
@rmoff · 3 years ago
@@arada123 Hi, the best place to ask this is on: → Slack group: cnfl.io/slack or → Mailing list: groups.google.com/forum/#!forum/confluent-platform
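For the "AB1234" part of the question above, one avenue worth exploring (an editor's sketch, not advice from the video): the JDBC source connector accepts a custom query, so a numeric column can be derived in Oracle SQL and used for incrementing mode. All names are illustrative, and note the connector appends its own WHERE clause, so the query must not contain one:

```sql
-- Hypothetical mapping: strip the fixed two-letter prefix ('AB1234' -> 1234)
-- in the source query and let the connector increment on the result.
CREATE SOURCE CONNECTOR ORACLE_SOURCE_NUMKEY WITH (
  'connector.class' = 'io.confluent.connect.jdbc.JdbcSourceConnector',
  'connection.url'  = 'jdbc:oracle:thin:@oracle:1521/ORCLPDB1',
  'mode'            = 'incrementing',
  'query'           = 'SELECT T.*, TO_NUMBER(SUBSTR(ID, 3)) AS ID_NUM FROM ORDERS T',
  'incrementing.column.name' = 'ID_NUM',
  'topic.prefix'    = 'oracle-orders'
);
```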
@arada123 · 3 years ago
Hi rmoff, thanks for your input. I managed to set up an Oracle Kafka connector, but I get the following error when I try to import a big table: "The message is 1320916 bytes when serialized which is larger than 1048576, which is the value of the max.request.size configuration." I have been struggling to set this max.request.size the whole day but never managed. Where can I set this value? I am not using Docker and have Confluent 5.5.1. Thanks in advance.
@rmoff · 3 years ago
Hi, the best place to ask this is on: → Slack group: cnfl.io/slack or → Mailing list: groups.google.com/forum/#!forum/confluent-platform
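For anyone else hitting that error: max.request.size is a Kafka producer setting, so for a source connector it can be set worker-wide (producer.max.request.size in the Connect worker properties) or per connector, as sketched below (an editor's suggestion; the per-connector override needs connector.client.config.override.policy=All in the worker config, and all names are illustrative):

```sql
-- Raise max.request.size above the largest serialized message
-- (the error above reports 1320916 bytes, so 2 MB is comfortable).
CREATE SOURCE CONNECTOR ORACLE_SOURCE_BIGROWS WITH (
  'connector.class' = 'io.confluent.connect.jdbc.JdbcSourceConnector',
  'connection.url'  = 'jdbc:oracle:thin:@oracle:1521/ORCLPDB1',
  'mode'            = 'bulk',
  'table.whitelist' = 'BIG_TABLE',
  'topic.prefix'    = 'oracle-',
  'producer.override.max.request.size' = '2097152'
);
```

If individual messages also exceed the broker's message.max.bytes (default around 1 MB), that broker setting has to go up too.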
@alfairawan6018 · 3 years ago
Hi @rmoff, I have a question: I use Avro to sink to my Oracle DB, but I don't know how to specify the database schema. I tried putting it in table.name.format but it failed. Any suggestions to solve this? Btw, great demo 👍 Thanks in advance.
@rmoff · 3 years ago
Hi, the best place to ask this is on: → Slack group: cnfl.io/slack or → Mailing list: groups.google.com/forum/#!forum/confluent-platform
@alfairawan6018 · 3 years ago
@rmoff Thank you for the fast reply and the group link.
@Algoritmik · 3 years ago
10:40 Kafka does NOT use the push model. Actually, the S3 sink pulls the data from Kafka, as every Kafka consumer does.
@rmoff · 3 years ago
I think we're both right ;-) The Kafka Connect worker uses the Consumer API under the covers to consume (pull) data from Kafka, and then pushes the data to S3.
@meryplays8952 · 1 year ago
Full of information, but the speaker spoke too fast.
@rmoff · 1 year ago
You can adjust the playback speed on YouTube ;-) (but glad you like the content, thanks)