
Apache Flink - A Must-Have For Your Streams | Systems Design Interview 0 to 1 With Ex-Google SWE 

Jordan has no life
35K subscribers
14K views

Again, go to IHOP, crazy calories per dollar
To be clear, the reason these snapshots work is that every snapshot on a given node captures state derived only from the messages before the snapshot barrier on that queue. Because Flink can create multiple copies of state, the consumer can keep going when it has received some, but not all, of its barriers. Ultimately the checkpointed state only contains the messages up to the barriers, so all of the consumers can replay messages starting from the barrier after a failure. Hope this makes sense.
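
To make the barrier mechanics above concrete, here is a minimal toy sketch in plain Python (not Flink's actual code or API; the queue contents and names are invented for illustration): a consumer with two input queues stops reading an input once that input's barrier arrives, snapshots its state once both barriers have arrived, and only then resumes.

from collections import deque

BARRIER = "BARRIER-1"

def run_consumer(queue_a, queue_b):
    """Toy aligned-barrier snapshot over two input queues."""
    state = []
    seen = {"a": False, "b": False}

    def pull(queue, source):
        msg = queue.popleft()
        if msg == BARRIER:
            seen[source] = True    # stop reading this input until both barriers arrive
        else:
            state.append(msg)      # pre-barrier message: fold it into state

    # Phase 1: process until the barrier has shown up on both inputs.
    while not (seen["a"] and seen["b"]):
        if queue_a and not seen["a"]:
            pull(queue_a, "a")
        if queue_b and not seen["b"]:
            pull(queue_b, "b")

    snapshot = list(state)         # derived only from pre-barrier messages

    # Phase 2: resume normal processing of whatever is left on either input.
    for queue in (queue_a, queue_b):
        while queue:
            state.append(queue.popleft())
    return snapshot, state

snap, final_state = run_consumer(deque(["a1", BARRIER, "a2"]),
                                 deque(["b1", "b2", BARRIER]))
print(snap)         # ['a1', 'b1', 'b2']  -- what gets checkpointed
print(final_state)  # ['a1', 'b1', 'b2', 'a2']  -- processing continues after the snapshot

The point matches the description above: the snapshot contains only pre-barrier messages, so replaying everything after the barriers cannot double-apply anything to the checkpointed state.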

Science

Published: Jun 1, 2024

Comments: 35
@grim.reaper 9 months ago
I rewatch your videos all the time because your explanations are really helpful!! Do you recommend any resources for reading, and for trying these out in code?
@jordanhasnolife5163 9 months ago
Honestly, for anything that doesn't make sense I'd just look up the official docs. As for coding, good question - it's open source, so you're always welcome to check out the code.
@maxvettel7337 9 months ago
By the way, Flink was in the video about the Web Crawler. It's sad that I can't use streams at work.
@dibll 5 months ago
Jordan, from 5:30 onwards I was not able to understand how Flink provides exactly once processing of messages. You used the terms checkpoints and snapshots - are they the same thing? If not, what's the difference, and when do we take one over the other? Also, when we save the state of the consumers, are we saving all the events that are in the memory of that consumer? And why do we need to replay the queue once we restore the state from S3 - if we replay, wouldn't it process all the messages one more time? I think my confusion is with the word "replay": does it mean to send all the messages which have not been acknowledged by the consumer yet, or something else? Could you please explain in layman's terms once more? Thanks in advance!
@jordanhasnolife5163 5 months ago
Hey! Basically I use the terms checkpoint and snapshot interchangeably. The idea is that checkpoints are occasional, so each one only captures the state up to that point, and some events may have been processed after one checkpoint but before the next. When we restart, we need to replay the events after our checkpoint barrier (even if they had already been handled before the failure) so that we can rebuild the state in our consumers that had not been saved in our checkpoint.
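
As a rough illustration of that reply (plain Python, not the Flink API; the log contents, offsets, and the process function are invented for the example): a checkpoint stores the operator state together with the barrier's position in the durable source, so recovery restores that state and re-reads the source from that position.

log = ["a", "b", "a", "c", "a", "b"]   # durable source, e.g. a Kafka partition

def process(state, event):
    state[event] = state.get(event, 0) + 1

# Run up to the barrier at offset 3 and take a checkpoint.
state = {}
for event in log[:3]:
    process(state, event)
checkpoint = {"offset": 3, "state": dict(state)}   # {'a': 2, 'b': 1}

# Keep processing, then crash before the next checkpoint: this work is lost.
for event in log[3:5]:
    process(state, event)
del state                                          # simulate the failure

# Recovery: restore the checkpointed state and replay from the saved offset.
state = dict(checkpoint["state"])
for event in log[checkpoint["offset"]:]:
    process(state, event)

print(state)   # {'a': 3, 'b': 2, 'c': 1} -- same as processing the log exactly once
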
@mocksahil 4 months ago
@jordanhasnolife5163 Does this mean that if we're running a counter at the end, we risk extra counts for those post-barrier items that may already have been processed once? Does this mean we need to incorporate a UUID or something to double-check in the consumer?
@jordanhasnolife5163 4 months ago
@mocksahil If the counter is within Flink, then no, these will be processed exactly once! However, if your counter is in an external database and you're incrementing it, you may hit that multiple times! In which case you've hit the nail on the head: using a UUID as some method of maintaining idempotence can stop us from double counting.
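
A minimal sketch of that UUID idea, assuming the producer attaches a unique ID to each event and the external store tracks which IDs it has already applied (plain Python; IdempotentCounter and its fields are hypothetical, not a Flink or database API):

import uuid

class IdempotentCounter:
    """Stand-in for an external database that remembers applied event IDs."""
    def __init__(self):
        self.count = 0
        self.applied_ids = set()   # in a real system: a table/column in the DB

    def increment(self, event_id):
        if event_id in self.applied_ids:
            return                 # duplicate delivery after a replay: no-op
        self.applied_ids.add(event_id)
        self.count += 1

counter = IdempotentCounter()
event_id = str(uuid.uuid4())       # ID attached to the event when it is produced

counter.increment(event_id)        # first delivery
counter.increment(event_id)        # redelivery after a checkpoint replay
print(counter.count)               # 1, not 2
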
@grim.reaper 9 months ago
Was waiting for this after watching the last video 🥹
@AntonioMac3301 9 months ago
Yo, you should make some playlists and group the videos based on topic - I think all the stuff on streams is pretty fire, so it'd be cool to have it all in one place, same with batch processing.
@jordanhasnolife5163 9 months ago
I may do this, but for now they're all under one systems design playlist, in order, so all the batch/stream processing stuff is near one another.
@michaelencasa 3 months ago
Hey, great content. I'm new to these scalable systems, so maybe I'm misunderstanding: when messages need to be replayed after a Flink consumer crashes, do those messages come from Kafka's log of already processed messages? If yes, can that process of Flink coming back online, reading its snapshot, and replaying all messages after the most recent set of barriers be automated, or will that likely be a manual process? And I'm still waiting on my foot pics. 😢❤
@jordanhasnolife5163 3 months ago
Haha yep that's an automatic process
@tarunnurat 3 months ago
Hey Jordan, I'm having difficulty understanding the "exactly once" aspect here. Say some messages are processed by a consumer right after a checkpoint, and then the consumer goes down. When the consumer comes back up, it would be initialised from the snapshot at the last checkpoint, and it would reprocess the messages that come right after the checkpoint, right?
@jordanhasnolife5163 3 months ago
Yes, but it would be reinitialized with the state that it had at the checkpoint, so reprocessing these messages would lead to state identical to what it would have had before. Basically, exactly once processing doesn't mean that each message is only processed exactly once. It just means that the state generated in Flink will be as if each message had only been processed once. If for whatever reason Flink is going to do something that isn't idempotent (like sending an email), that can happen multiple times.
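
A tiny sketch of that distinction, with a hypothetical send_email standing in for any non-idempotent external action (plain Python, not Flink code): after a replay, the rebuilt state looks exactly-once, but the side effect fires twice.

emails_sent = []

def send_email(user):
    emails_sent.append(user)       # non-idempotent external action

def process(state, user):
    state[user] = state.get(user, 0) + 1
    send_email(user)

state = {}
process(state, "alice")            # processed once, then the node crashes
                                   # before the next checkpoint

state = {}                         # restore from the last checkpoint (taken
                                   # before "alice" arrived, so it is empty)
process(state, "alice")            # the message is replayed

print(state)        # {'alice': 1} -- state is as if "alice" was seen once
print(emails_sent)  # ['alice', 'alice'] -- the side effect happened twice
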
@yaswanth7931 1 month ago
@jordanhasnolife5163 Suppose the above case happens for consumer 2 in the example from the video. Then the messages will be duplicated to consumer 3, right? Since consumer 2 will process those messages again and put them in the queue again for consumer 3. Then how can we guarantee that each message affects the state only once? Am I missing something here? Is it that if one consumer goes down, then all the consumers go back to their previous checkpoint?
@jordanhasnolife5163 1 month ago
@yaswanth7931 The state that gets restored to each consumer is what was present when the snapshot barrier reached that consumer. Hence, we can ensure that all messages before the snapshot barrier have been completely processed, and we can restart by processing all messages after the snapshot barrier.
@gtdytdcthutq933 3 days ago
So even when one node crashes, we restore ALL consumers from the last snapshot?
@jordanhasnolife5163 22 hours ago
@gtdytdcthutq933 Double-check to confirm, but I believe if we want the whole "exactly once" processing, then yes.
@kaqqao 1 month ago
I was really expecting you'd mention Kafka Streams and contrast that with Flink 😶 I'm struggling to understand when I'd need which. Maybe a topic for a future video?
@jordanhasnolife5163 1 month ago
Perhaps so! As far as I know they're basically the same, but if I can find a stark difference I'd be happy to make a video!
@kaqqao 1 month ago
You're such a legend for replying to comments! Thank you! 🙏
@levyshi 1 month ago
In the video you mentioned that Flink is for the consumers; does that mean a Flink node would sit between a broker and the consumer?
@jordanhasnolife5163 1 month ago
Nope, Flink is the consumer.
@nithinsastrytellapuri291 2 months ago
Hi Jordan, since this is a distributed system and each consumer is passing messages to the other, does Apache Flink use the Chandy-Lamport algorithm to take distributed snapshots?
@jordanhasnolife5163 2 months ago
I believe it does!!
@shibhamalik1274 2 months ago
Hi @jordan, does Spark cache data, or does it do a fresh fetch of data on every run of its pipeline? I know there is a cache API we can use, but if we don't use it, does Spark fetch the same data from the DB (Spark SQL) every time we run the pipeline? If not, how is it so efficient and fast without a cache?
@jordanhasnolife5163 2 months ago
Hi! Are you talking about Spark or Spark Streaming?
@tunepa4418 9 months ago
Not really a big deal, but I have noticed that your mouth is always out of sync with your voice in most of your videos.
@jordanhasnolife5163 9 months ago
Ack
@wexwexexort 6 months ago
He works async.
@jordanhasnolife5163 6 months ago
@wexwexexort lol
@mahanteshambali 6 months ago
@wexwexexort I wonder how he handles deduplication.
@ganesansanthanam-5896 1 day ago
Please accept me as your mentee
@jordanhasnolife5163 1 day ago
I'm sorry man, I'm pretty pressed for time these days; perhaps you could find one amongst my other gigachad viewers, or go asking on Blind/LinkedIn.