
Counting Youtube views/ad impressions exactly once 

Vinayak Sangar
7K subscribers
4.5K views

Published: 8 Sep 2024

Comments: 10
@deepshikhazutshi6997 · 9 months ago
For exactly-once semantics, we can also follow a two-phase commit (2PC) approach. The idea is to maintain a checkpoint at each of stages 1, 2 and 3; this acts as the pre-commit phase of 2PC. Each checkpoint records the read offset of the Kafka topic and is unique per offset. Only once the pre-commits of all three stages succeed is the final commit made, indicating success. If any stage fails, we roll back that stage (or all three stages, depending on the scenario at hand). Approach: keep a separate checkpoint table that tracks the status of each stage, with the checkpoint id as its primary key; the Cassandra table carries the checkpoint id as a foreign key. If any stage fails, that single stage (or all stages) for that checkpoint id can be rolled back from Cassandra. The checkpoint id also serves as an idempotency key and helps deduplicate records, since we can UPSERT records using it.
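A minimal in-memory sketch of the checkpoint-table idea in this comment. All names here (`CheckpointTable`, the stage labels, the `counts` dict standing in for Cassandra) are illustrative, not from the video:

```python
# Sketch: checkpoint table tracking per-stage pre-commits; the checkpoint id
# doubles as an idempotency key so a replayed final commit is a no-op.

STAGES = ("stage1", "stage2", "stage3")

class CheckpointTable:
    def __init__(self):
        self.rows = {}        # checkpoint_id -> set of pre-committed stages
        self.applied = set()  # checkpoint ids whose final commit already ran

    def precommit(self, checkpoint_id, stage):
        self.rows.setdefault(checkpoint_id, set()).add(stage)

    def rollback(self, checkpoint_id):
        # Drop all pre-commit state for this checkpoint id.
        self.rows.pop(checkpoint_id, None)

    def commit(self, checkpoint_id, counts, video_id):
        """Final commit: allowed only once all three stages pre-committed;
        UPSERT-style dedup on the checkpoint id prevents double counting."""
        if self.rows.get(checkpoint_id) != set(STAGES):
            return False
        if checkpoint_id not in self.applied:
            counts[video_id] = counts.get(video_id, 0) + 1
            self.applied.add(checkpoint_id)
        return True
```

A replayed `commit` with the same checkpoint id succeeds but does not increment the count again, which is the dedup property the comment is after.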
@mach1ontwowheels632 · 1 year ago
This is not how Flink works. A streaming application is supposed to update state in Flink using its state primitives, which is similar to updating local memory, and only periodically flush to Cassandra (which is an expensive operation).
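The pattern this comment describes (local keyed state, periodic flush to the external store) can be sketched without Flink's actual API; everything below is an illustrative stand-in, not Flink code:

```python
# Sketch: accumulate counts in local keyed state and push to the sink
# (standing in for Cassandra) only every `flush_every` events.

class ViewCounter:
    def __init__(self, sink, flush_every=100):
        self.state = {}            # keyed state: video_id -> pending count
        self.sink = sink           # stand-in for the external store
        self.flush_every = flush_every
        self.seen = 0

    def process(self, video_id):
        self.state[video_id] = self.state.get(video_id, 0) + 1
        self.seen += 1
        if self.seen % self.flush_every == 0:
            self.flush()

    def flush(self):
        # One expensive external write per key per interval, not per event.
        for vid, n in self.state.items():
            self.sink[vid] = self.sink.get(vid, 0) + n
        self.state.clear()
```

In real Flink the flush would typically be tied to checkpoints rather than an event counter, but the cost argument is the same.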
@user-te8mg8rs8c · 2 months ago
This was a great share, thanks! One thing I want to point out: in reality the delivery probability only asymptotically approaches 1, because "retry" is always set to 3 :)
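The comment's point can be checked with a one-liner: with per-attempt success probability p and a fixed retry limit, the overall delivery probability is 1 − (1 − p)^attempts, which approaches but never reaches 1:

```python
# Sketch: overall delivery probability with one initial attempt plus `retries`.

def delivery_probability(p, retries=3):
    attempts = 1 + retries
    return 1 - (1 - p) ** attempts
```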
@cschandragiri · 1 year ago
Exactly-once counting is a hard problem. Optimistic locking using CAS, or pessimistic locking, doesn't scale well. Also, Kafka producers might fail before getting an ack from the brokers and could resend messages. We need idempotent producers and consumers, with transactions enabled on Kafka itself.
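The settings this comment refers to can be written down as producer/consumer configuration. A hedged sketch using librdkafka-style config keys (as used by confluent-kafka); the broker address, group id and transactional id are placeholders, and no broker is contacted here:

```python
# Sketch: config for an idempotent, transactional Kafka producer and a
# consumer that only reads committed transactional messages.

producer_config = {
    "bootstrap.servers": "localhost:9092",  # placeholder broker
    "enable.idempotence": True,             # broker dedupes retried sends
    "acks": "all",                          # required for idempotence
    "transactional.id": "view-counter-1",   # enables transactions
}

consumer_config = {
    "bootstrap.servers": "localhost:9092",
    "group.id": "view-counters",            # placeholder group
    "isolation.level": "read_committed",    # skip aborted/uncommitted messages
    "enable.auto.commit": False,            # commit offsets inside the txn
}
```

With this setup the consumer's offset commit can be included in the producer's transaction, which is the standard route to end-to-end exactly-once on Kafka.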
@meditationdanny701 · 1 year ago
Yes, I agree that consumers with transactional properties will help solve the "count exactly once" problem effectively. The time overhead a transactional system adds can be safely ignored, since the whole system is async in nature, so we assume we will eventually have the correct data. As for idempotence on the producer side: it can be solved by assigning a UUID to each event and storing it in a DB. Before sending an event to Kafka, the producer checks the DB; if the UUID already exists, it skips that event, otherwise it sends it to Kafka. Note that the whole operation happens within a single transaction, so if any failure occurs the UUID can be rolled back from the DB, ensuring the message is either sent exactly once or skipped entirely.
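The producer-side dedup this comment describes can be sketched in a few lines. The names here (`seen_uuids`, `outbox`) are illustrative stand-ins for the dedup table and the Kafka topic:

```python
import uuid

# Sketch: check-then-send with the UUID insert and the produce treated as one
# unit; on failure the UUID is rolled back so the event can be retried.

seen_uuids = set()   # stand-in for the UUID table in the DB
outbox = []          # stand-in for the Kafka topic

def send_once(event_id, payload):
    """Send the event at most once; a replay with the same UUID is skipped."""
    if event_id in seen_uuids:
        return False                  # duplicate: skip entirely
    try:
        seen_uuids.add(event_id)      # "insert UUID" step of the transaction
        outbox.append(payload)        # "produce to Kafka" step
        return True
    except Exception:
        seen_uuids.discard(event_id)  # roll back the UUID on failure
        raise

# In practice the caller would generate the id per event, e.g. str(uuid.uuid4()).
```

Note this only dedupes producer retries; Kafka's own idempotent/transactional producer (previous comment) handles broker-level resends without an extra table.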
@freezefrancis · 1 year ago
Isn't the offset unique only per partition within a Kafka topic? So in this case wouldn't it be incorrect to log just the offset? Are we maintaining a database per consumer?
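This question's point can be made concrete: since offsets restart at 0 in every partition, a dedup log has to key on (topic, partition, offset), not the offset alone. A minimal in-memory sketch (not from the video):

```python
# Sketch: dedup on the composite key; the same offset in two partitions
# is two distinct records.

processed = set()

def mark_processed(topic, partition, offset):
    key = (topic, partition, offset)
    if key in processed:
        return False      # already handled, skip
    processed.add(key)
    return True
```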
@manishsakariya4595 · 6 months ago
We can use a simple queue and ack only when all stages are done.
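A sketch of this comment's idea: peek the message, run every stage, and remove (ack) it only after all stages succeed, so a failure leaves it in place for redelivery. Note this gives at-least-once, not exactly-once, without dedup downstream; the names are illustrative:

```python
import collections

# Sketch: ack-after-all-stages over a plain deque.

queue = collections.deque()

def process_next(stages):
    if not queue:
        return None
    event = queue[0]          # peek, do not remove yet
    try:
        for stage in stages:
            stage(event)
    except Exception:
        return False          # no ack: the event stays for retry
    queue.popleft()           # ack only once every stage succeeded
    return True
```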
@LeoLeo-nx5gi · 2 years ago
Hi, it was really amazing!
@kapilrules · 10 months ago
Nice explanation!! Thank you.
@szyulian · 4 months ago
Watched. --