Hello! As a newbie data engineer, I've found your videos incredibly helpful. The way you explain concepts makes it easy for me to grasp and apply them in my work. Thank you for sharing your knowledge and helping me on my learning journey! Looking forward to your next videos.
Hi, are the connector name and topic name always the same? Can you name your topic something else? It would be helpful to have multiple topics for one connector. Thanks in advance.
Hello! Thank you for the amazing content, which briefly explains data streaming with CDC. I have a quick question about the location in the container where Debezium stores all the configuration made when setting up a connector. I am asking so I know how to persist a connector for later use even after the container stops. Thanks
You can save the connector configs in the /kafka/config folder and attach it in the yml file. This way you can persist the connector settings and start the connector via the CLI. Connector settings are only removed when you destroy and re-create the Debezium container; on a stop/restart the connector settings are persisted.
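As a minimal sketch, the attachment in the yml file could look like this (the image tag and host path are my assumptions, not from the video):

services:
  debezium:
    image: debezium/connect:2.5   # assumed image tag
    volumes:
      # mount a host folder over /kafka/config so connector settings survive container stops
      - ./kafka/config:/kafka/config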
How do you handle pipeline disruptions? Can you provide some insights on the points below? 1. There seems to be a known limitation with PostgreSQL: transactions that have already been read by a CDC replication task can't be reprocessed, even when the task is restarted from an old LSN value. 2. It also appears the task can't be moved between replication servers without coordinating with the PostgreSQL DBA to update the pg_hba.conf file. Can we create a script to overcome this, or are there better alternatives?
Hey Chald244, you can use the function below to read the logical replication data. The peek function behaves just like the pg_logical_slot_get_binary_changes() function, except that changes are returned but not consumed; that is, they will be returned again on future calls. You may also want to look at which plugin you are using to create the replication slot, as it determines what sort of values are returned when querying the replication data. The default is `pgoutput`; you may want to create the logical replication slot with `test_decoding` to get text data back, otherwise you may need to decode the WAL data before you can use it. Here is the documentation for the replication functions: www.postgresql.org/docs/9.4/functions-admin.html
pg_logical_slot_peek_binary_changes(slot_name name, upto_lsn pg_lsn, upto_nchanges int, VARIADIC options text[])
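As a quick sketch, peeking at changes with a test_decoding slot could look like this (the slot name is just an example):

-- create a logical replication slot using the test_decoding plugin
SELECT * FROM pg_create_logical_replication_slot('peek_demo_slot', 'test_decoding');

-- peek at pending changes without consuming them (NULLs mean no LSN or row limit)
SELECT * FROM pg_logical_slot_peek_changes('peek_demo_slot', NULL, NULL);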
Hello! Just found your amazing channel and I'm enjoying it a lot. I have a question about this topic. I reproduced your setup and it works just fine for inserts and updates, but I noticed that on delete no message is produced to the Kafka topic. Any tips on how to fix this? In any case, thank you for your content!
Hi, newbie here. I am encountering the error ModuleNotFoundError: No module named 'kafka.vendor.six.moves' when I try to run something via Jupyter. Any suggestions on how to fix this?
Hey there, you can enable "key.converter.schemas.enable" in the connector. This will include the schema-level changes and give you additional fields called "before" and "after", which contain the state of a row before and after an event. This way you can track updates and deletes in the database.
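For example, you could add these to the connector config (the value-side property is my assumption; only the key-side one is mentioned above):

key.converter.schemas.enable=true
value.converter.schemas.enable=true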
The connector continuously captures row-level changes that insert, update, and delete database content and that were committed to a PostgreSQL database. You can modify the connector to include a few more properties to get the deleted flag in the key/value pair. Below are the relevant properties. Here is the link to the Debezium docs: debezium.io/documentation/reference/stable/transformations/event-flattening.html
transforms.unwrap.drop.tombstones=false
transforms.unwrap.delete.handling.mode=rewrite
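Those two properties belong to the event-flattening transform, so a sketch of the full set in the connector config would also declare the transform itself (per the linked docs):

transforms=unwrap
transforms.unwrap.type=io.debezium.transforms.ExtractNewRecordState
transforms.unwrap.drop.tombstones=false
transforms.unwrap.delete.handling.mode=rewrite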
Hi, I love your videos. I have been trying this project for months now, but I still get "connection to my IP address refused". How can I solve this problem? I have been stuck here for months.
Try to connect to this Postgres database outside of this project and make sure you are able to connect. Here are a few steps to remedy Postgres connection issues. In the Postgres install directory, locate and open postgresql.conf and add this line:
listen_addresses = '*'
Then open the file named pg_hba.conf and add this line:
host all all 0.0.0.0/0 md5
Now restart your PostgreSQL server and try again.
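To test the connection from outside the project, a quick check could look like this (host, port, and user are assumptions based on a default local setup):

psql -h localhost -p 5432 -U postgres -d postgres -c "SELECT 1;"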
You can check the AWS stack and see if there is a service that supports CDC with DynamoDB. Amazon DynamoDB is a NoSQL database and may not offer the same capabilities as a traditional SQL database. This streaming stack depends on PostgreSQL's built-in replication process.
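If you go the AWS route, DynamoDB Streams is the usual change-capture mechanism there. A rough sketch of reading it with boto3 (the stream ARN is a placeholder):

import boto3

# rough sketch: read item-level change records from DynamoDB Streams
streams = boto3.client("dynamodbstreams")
stream_arn = "arn:aws:dynamodb:region:account:table/my-table/stream/label"  # placeholder ARN

shards = streams.describe_stream(StreamArn=stream_arn)["StreamDescription"]["Shards"]
iterator = streams.get_shard_iterator(
    StreamArn=stream_arn,
    ShardId=shards[0]["ShardId"],
    ShardIteratorType="TRIM_HORIZON",
)["ShardIterator"]

# each record carries an eventName (INSERT/MODIFY/REMOVE) and, depending on the
# stream view type, old/new item images, loosely analogous to Debezium's before/after
for record in streams.get_records(ShardIterator=iterator)["Records"]:
    print(record["eventName"], record["dynamodb"])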
This means you are missing the environment variable "PGPASS". You can remove it and provide the password in the script to resolve it. Another option is to define this environment variable, and it should work as expected.
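For example, a minimal sketch of handling this in the script, assuming it connects with psycopg2 (the connection details are placeholders):

import os
import psycopg2

# read the password from the PGPASS environment variable, or fall back to a literal (placeholder) value
password = os.environ.get("PGPASS", "replace-me")

conn = psycopg2.connect(
    host="localhost",   # placeholder connection details
    dbname="postgres",
    user="postgres",
    password=password,
)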