For my next trick I'll show you all how to build a job scheduler.
00:18 Introduction
01:22 Functional Requirements
02:33 Capacity Estimates
03:21 API Design
04:09 Database Schema
05:18 Architectural Overview
04:43 DB Schema: [jobId, s3Url, status, retryTimestamp], where status is an ENUM (NOT_STARTED / CLAIMED / STARTED / DONE).
07:00 Querying the DB; ACID compliance. Indexing should be done on the timestamp. Query: select the tasks that are NOT_STARTED where timestamp < current_time.
08:50 Failures during a job run: MQ failure, node failure.
09:44 New query: select the tasks that are NOT_STARTED where timestamp < current_time, AND the tasks that are STARTED where timestamp + enqueuing time + heartbeat interval < current_time.
10:46 Messaging queue choice.
12:14 Claim service / DB + ZooKeeper. ZooKeeper is there to check whether a node is down or not; then we can write in the metadata DB that it's a retryable error.
14:54 A node dies, comes back up, and tries the job again = 2 nodes trying the same job. Distributed lock.
Ending note: how to schedule jobs at a fixed rate (WEEKLY / MONTHLY). The task-runner service itself writes to the DB the next time the task should run; e.g. for a BI-WEEKLY schedule, it adds the next time the job has to run.
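The schema at 04:43 and the query at 09:44 could be sketched roughly like this. Column names, status values, and the two timeout constants are my assumptions for illustration, not exactly what appears in the video:

```python
import sqlite3

# In-memory stand-in for the metadata DB described at 04:43.
conn = sqlite3.connect(":memory:")
conn.execute("""
    CREATE TABLE jobs (
        job_id   TEXT PRIMARY KEY,
        s3_url   TEXT,
        status   TEXT CHECK (status IN ('NOT_STARTED','CLAIMED','STARTED','DONE')),
        retry_ts REAL  -- epoch seconds when the job should (re)run
    )
""")
# Index on the timestamp so the poll is a range scan, not a full table scan.
conn.execute("CREATE INDEX idx_jobs_retry_ts ON jobs (retry_ts)")

ENQUEUE_GRACE = 60      # assumed time budget to get through the queue
HEARTBEAT_TIMEOUT = 30  # assumed max gap between worker heartbeats

def due_jobs(now):
    # The 09:44 query: NOT_STARTED jobs that are due, plus STARTED jobs
    # whose (start time + enqueue grace + heartbeat timeout) has elapsed,
    # i.e. jobs whose worker we presume dead.
    return conn.execute(
        """SELECT job_id FROM jobs
           WHERE (status = 'NOT_STARTED' AND retry_ts < ?)
              OR (status = 'STARTED' AND retry_ts + ? + ? < ?)""",
        (now, ENQUEUE_GRACE, HEARTBEAT_TIMEOUT, now),
    ).fetchall()
```

The second branch of the OR is what lets the scheduler pick up jobs whose node died mid-run.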
18:25 The whole flow:
1. The client uploads a job -> it goes to S3 and gets stored in the DB with its schedule.
2. The enqueue service (1 machine) polls the DB every minute for all due jobs, using the query mentioned at 9:44.
3. It batches the jobs and sends them to the MQ.
4. The MQ delivers them to multiple workers, which send heartbeats to ZooKeeper. (ZooKeeper was used for distributed locking of running jobs.)
5. The worker updates the STATUS depending on whether the job completed or not.
I have one question that's not addressed though, @Jordan has no life: what if the worker completes the job but fails right before updating the job's STATUS to COMPLETED in the DB?
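Steps 2-4 of that flow could be sketched like this. The in-memory dict stands in for the metadata DB and `queue.Queue` for the MQ; all names here are my assumptions for illustration:

```python
import queue

# Stand-ins for the metadata DB and the message queue.
mq = queue.Queue()
db = {}  # job_id -> {"status": ..., "run_at": ...}

def enqueue_due_jobs(now):
    # Step 2: poll for due jobs.
    due = [j for j, row in db.items()
           if row["status"] == "NOT_STARTED" and row["run_at"] < now]
    # Mark them CLAIMED first so a second poll won't re-send them.
    for job_id in due:
        db[job_id]["status"] = "CLAIMED"
    # Step 3: send the whole batch to the MQ.
    for job_id in due:
        mq.put(job_id)
    return len(due)
```

Note that flipping the status before enqueuing is exactly what makes the one-machine poll loop safe against double-sending between ticks.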
Why does ZooKeeper have no outward arrow? Shouldn't it be notifying the DB of task status changes, i.e. updating the task status to complete / not complete, etc.?
This video on the job scheduler is by far the best I've come across on YouTube. Thank you for creating it! I have a question though: it seems a lot of effort goes into ensuring "exactly-once" semantics here, by doing retries and having ZooKeeper as well as the claim service. Would that work be eased a bit if we used Kafka? My understanding is that Kafka has better support for "exactly-once" and also uses ZooKeeper internally.
How does the job claim service communicate with ZooKeeper? Does it poll ZK once in a while, get all the running jobs' statuses, and then update our JobStatusTable?
Hi Jordan, how often will job schedules be polled from the DB: every second, every minute? Do we also need to define an SLA for picking a job up from the table?
Hey Jordan! I didn't get why we need a lock here. If we enqueue a task into SQS, only one consumer will pick it up anyway (I think SQS takes care of concurrency here), and for the duration of the execution we can hide the task in the queue. Also, what happens to a task in the queue? Does the worker remove it from the queue, or make it invisible for the duration of execution?
Locks are important because tasks may be put in the queue again if the system thinks they failed to execute (e.g. a timeout is exceeded). Yes, once a task is removed from the queue it won't be delivered again; however, as I mentioned, it could be re-enqueued if we mistakenly think that it has failed.
@@jordanhasnolife5163 Oh, that makes sense! And what about a task in the queue? A task can take very long to execute, so I assume making it invisible in the queue is not really an option? Does the executor remove it from the queue? In which case, what if it dies, and who re-queues the task?
Hey @jordan, thanks for the great video on scheduler design. I have a small query: what will happen if we run multiple consumers for the service that polls data from the DB and pushes it to the queue? For scalability we may need to run multiple consumers, and there is a probability that jobs will get duplicated in the queue.
If our database uses transactions we wouldn't have to worry about this: each consumer could just mark a row as "being uploaded to the queue" before attempting to upload it, and other consumers won't touch it once that happens.
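That claim-before-enqueue idea can be done with a single conditional UPDATE, where the affected row count tells a consumer whether it won the claim. A minimal sketch (table and status names are my assumptions):

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE jobs (job_id TEXT PRIMARY KEY, status TEXT)")
conn.execute("INSERT INTO jobs VALUES ('j1', 'NOT_STARTED')")

def try_claim(job_id):
    # The WHERE clause makes this a compare-and-set: only one concurrent
    # consumer can flip NOT_STARTED -> ENQUEUING for a given row.
    cur = conn.execute(
        "UPDATE jobs SET status = 'ENQUEUING' "
        "WHERE job_id = ? AND status = 'NOT_STARTED'",
        (job_id,),
    )
    conn.commit()
    return cur.rowcount == 1
```

The first consumer to run this gets `True` and enqueues the job; every other consumer gets `False` and skips it, so nothing is duplicated in the queue.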
I think the data schema can be better. We can have a jobs table which contains jobId, name, cron expression, etc. There would also be another table, job_executions, which maintains a record of every execution of a job.
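The two-table split being suggested might look something like this; the column names are illustrative assumptions, not a spec:

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    -- One row per job definition.
    CREATE TABLE jobs (
        job_id    TEXT PRIMARY KEY,
        name      TEXT,
        cron_expr TEXT            -- e.g. '0 0 * * MON' for weekly
    );
    -- One row per run of a job; this is where status lives.
    CREATE TABLE job_executions (
        execution_id TEXT PRIMARY KEY,
        job_id       TEXT REFERENCES jobs(job_id),
        scheduled_at REAL,
        status       TEXT         -- NOT_STARTED / STARTED / DONE / FAILED
    );
""")
```

The nice property is that the recurring job keeps one stable ID while every run gets its own row, which also answers "how do clients get the status of previous runs" later in this thread.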
Great video! Just a small question: when a consumer node dies (stops sending heartbeats), how do we mark the job status as failed? Is ZooKeeper holding info about which consumer node is running which job ID?
How would we differentiate whether the job timed out or is just taking long to execute? How can we prevent it from running twice, or even indefinitely? Would it make more sense to use a log-based queue and let it take care of retries?
To be honest, the challenging part of distributed computing is that you can never truly know. Networks aren't perfect, so nothing is certain; in theory, jobs can complete years later. But as long as you set a reasonable timeout and make your jobs idempotent, it's OK! Using a log-based queue is totally fine too, but it would still have to use timeouts somewhere.
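The "make your jobs idempotent" advice above can be sketched as deduplication on a per-run ID: record each completed run and skip re-execution when a retry delivers the same run twice. The run-ID scheme and in-memory set are assumptions for illustration (in practice you'd use a unique key in the DB):

```python
# Completed run IDs; in a real system this would be a unique-keyed
# table in the database, not process memory.
completed = set()

def run_once(run_id, task):
    if run_id in completed:
        return "skipped"   # duplicate delivery after a timeout/retry
    task()                 # do the actual work
    completed.add(run_id)
    return "ran"
```

With this in place, a job accidentally re-enqueued after a false timeout executes its side effects only once, which is why the timeout only needs to be "reasonable" rather than perfect.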
If we maintain some state in the database, like submitted, queued, running, success, and failed, we don't need any distributed lock on a job. Your enqueuing service would only poll for jobs whose state is submitted, running for too long, or failed, and all of it can be done at the serializable isolation level in MySQL, since we opted for it in the first place.
While I agree that this ought to work the majority of the time, ACID properties aren't enough because our SQL database could go down, and unless it has strong consistency (not recommended for performance/network-partition-tolerance reasons), a claimed job may not appear claimed in the database replicas. Ultimately, we will need some sort of consensus here.
@@jordanhasnolife5163 Agreed on the database-could-go-down part, but this is where many master-slave systems (HBase, for example) use consensus to elect the right master, and hence we get strong consistency. Theoretically, both of our solutions have to use consensus anyway; it's just that you have a separate distributed lock service. Got it. By the way, your videos are great. Way to go!
What if these jobs had different priorities and we had to change the priority of a job at any point? (I'm mainly concerned about the priority changing while the job is in the queue.) For longer-running jobs, staying in the old priority queue might not be an option.
I'm a bit confused when you say "the queue" here. We could index our SQL table by priority, or we could shard into multiple tables by priority. Once a job is in the queue, it's more or less going to be run; perhaps you could do some weird type of in-memory heap, but that seems a bit extra.
Thanks for the excellent video. I have a couple of questions. How are job IDs created? Are they globally unique? When a recurring job gets another entry in the metadata DB, does it get a different ID? How does a client get the status of recurring jobs? Should there be a different DB to store the statuses of previous runs?
Yeah I think just creating a particular job run with a UUID is fine. Somebody else in the comments here suggested using a "JobExecutions" table which tracks the status of completed jobs as opposed to scheduled ones, I think that would work nicely here.
6:10 What is it about the message queue that doesn't allow us to get any information about a job other than 'run' or 'not run'? Admittedly my knowledge of message queues is kind of shaky, but couldn't we configure a log-based message broker to give us info other than 'run' or 'not run'? Also, if you want another video idea, a system design of a DoorDash/Grubhub-type app would be pretty cool!
I'm a bit confused about what you mean here; we're just placing the jobs themselves in the message queues. We keep track of the status of each job in a database so that we can request the status from a variety of other components. Sure, a message broker knows which jobs were sent to consumers, but that doesn't mean they were run successfully, and the message broker has no way of knowing this. As for the DoorDash point, I'd just check out my design of Uber; they're basically the same :)
Hey Jordan, can you please make a video on collaborative editing tools like CoderPad, Google Docs, or Google Sheets? Actually, I guess CoderPad would be a superset of Google Docs, so you could choose CoderPad over Google Docs when designing. Thanks!
Thank you for the content. One question: what if we want to schedule jobs based on a job's resource consumption requirements and the availability of resources on workers? How would you change your design?
I think that the message broker itself could maintain some internal state (or have consumers go through a proxy) which keeps track of how many jobs each consumer has run and perhaps their hardware capabilities (maybe stored in ZooKeeper). Essentially a load balancer lol.
Thanks for the great video! If ZooKeeper stops receiving heartbeats, "we can go ahead and update the metadata DB". I'm curious: who would update the metadata DB? Is it a) ZooKeeper that goes ahead and updates the metadata DB? If so, is it feasible, given ZooKeeper's capabilities, for us to add such custom logic? Or b) ZooKeeper performs failover, where it creates another worker node and has it restart the job? Also, since ZooKeeper helps the claim service acquire distributed locks using fencing tokens, why do we still need the ACID properties of a SQL DB? Why should we not use NoSQL for the metadata DB?
@@jordanhasnolife5163 Any example design or literature that shows this design (polling ZooKeeper for outages and implementing custom logic with failover)? I believe this is very critical, and if left unaddressed it leaves fault tolerance unsolved.
Looks like Apache Curator has some "recipes" that can be used when persistent nodes fail, which could be used here. Also, Curator can be used as a client with ZooKeeper to acquire distributed locks.
@@niranjhankantharaj6161 I'll do a better job addressing this in the remake. You have many options, though; for example, a cron job on the status table that eventually sets a job's status back to "not started" after a certain amount of time in which the job has yet to be completed. It's certainly not trivial, but it's not overly complex either.
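That cron-job idea could be sketched as a periodic sweep that resets stale STARTED rows. The table layout and the staleness timeout are assumptions for illustration:

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE jobs (job_id TEXT PRIMARY KEY, status TEXT, started_at REAL)")

STALE_AFTER = 300  # seconds a job may stay STARTED before we presume failure

def sweep(now):
    # Reset jobs that have been STARTED too long back to NOT_STARTED,
    # so the enqueuing service's normal poll picks them up again.
    cur = conn.execute(
        "UPDATE jobs SET status = 'NOT_STARTED', started_at = NULL "
        "WHERE status = 'STARTED' AND started_at < ?",
        (now - STALE_AFTER,),
    )
    conn.commit()
    return cur.rowcount
```

Run on a timer, this closes the failure case where a worker dies without ever updating the job's status; the only cost is that the timeout must exceed the longest legitimate run time (or the jobs must be idempotent).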
Hey Jordan, great video as always. I have a couple of questions: - Instead of using an enqueuing service which polls jobs every minute, could we instead just add an event stream on the DB and do filtering within the stream, where we only look at the jobs that need to be run? - I'm not sure I got the argument for using an in-memory queue; could you add more context on why we decided to do that instead of a log-based queue?
1) We could, but that's effectively just polling and I think it defeats the purpose of using the stream. 2) We don't care about the order in which jobs are run and want to maximize throughput, so an in-memory queue with many consumers is more useful to us than a log-based queue with a single consumer per partition.
"We don't care about the order of the jobs and we want an in-memory broker, so let's pick Kafka." What? What a strange statement in an otherwise interesting video.
How crass is this man? Do such people pass the "Googleyness" round and get into Google? Do people really like to work with people of such questionable character?
@@utkarshgupta2909 I can't speak to exact TPS, but I think a good rule of thumb for using a queue is when something that is being uploaded needs to be sent to multiple places, or there is a lot of processing that eventually has to be done on it.