5: Netflix + YouTube | Systems Design Interview Questions With Ex-Google SWE

Jordan has no life

39K subscribers

18K views

Please reach out on LinkedIn for the super secret only fans version of JHNL

Science

Published: Dec 15, 2023

Comments: 114

@laserbam 6 months ago

Thanks for doing this series! A few days ago, I signed my L5 offer at Google, so your system design videos (and slide decks) came in clutch

@jordanhasnolife5163 6 months ago

Hell yes dude, extremely proud of you, keep killing it!!

@idiot7leon 2 months ago

Brief Outline
00:01:04 Problem Requirements
00:01:46 Capacity Estimates
00:02:52 Video Streaming Intro
00:04:00 Video Chunking
00:05:40 Chunking Advantages
00:07:09 Database Tables - Subscribers
00:09:39 Database Tables - User Videos, Users, Video Comments
00:11:33 Database Tables - Video Chunks
00:12:45 Database Choices
00:14:45 Video Uploads
00:15:57 Video Uploading - Broker
00:16:46 Video Uploading - Broker
00:18:51 Video Uploading - Chunks
00:20:27 Video Uploading - Chunk Storage
00:22:32 Video Uploading - Aggregation
00:26:41 Video Uploading - Streaming Datamodels
00:28:37 Video Uploading - Flink
00:31:15 Video Uploading - Flink Continued
00:33:53 Video Uploading - Search
00:34:59 Search Index - Partitioning
00:37:17 Search Index - Partitioning Continued
00:38:57 Search Index Uploads
00:40:21 Final Diagram - Netflix/YouTube
Thanks, Jordan~

@allenxxx184 3 months ago

Your channel deserves at least 1M subscribers. The most high-quality system design videos!!!

@jordanhasnolife5163 3 months ago

Best designs, best ass

@MithunSasidharan1989 6 months ago

Thank you for continuing to do this. It's a goldmine for engineers preparing for interviews : )

@wensongliu5058 1 month ago

Much appreciation to you, Jordan. This video covers so many detailed components and processes going back and forth. I've already watched this video many times and it's really helpful!

@muven2776 5 days ago

This is a great video, f***ing indeed! Got an instant high and confidence going through these videos. To understand Jordan's videos, my suggestion is to go through Jordan's System Design 2.0 playlist first:
0. DB fundamentals (videos 0 to 15)
1. Replication (videos 16 to 24)
2. Stream processing and Flink (videos 42 to 45)
After understanding the videos above, the system design videos are a cakewalk. Note down the terms you come across, like ZooKeeper and Elasticsearch, go back to the 2.0 playlist, and then come back to this series. Note down the technical terms he mentions, like "split brain", "read repair", and "anti-entropy". Keep using these terms in the interview to show that you know what a distributed system is :D

@jordanhasnolife5163 5 days ago

Nice! I like it!

@sauravsingh5663 3 months ago

This is exactly what I was looking for. Love how you uncover the right level of detail where it is necessary. Great work !!

@dosya6601 17 days ago

@vkchgc 15 days ago

You really are doing the best system design videos I've ever seen! Keep up the great work

@rahulnath9655 6 months ago

This one is so dense and detailed, thanks man. I feel like I really understand these systems now.

@jordanhasnolife5163 6 months ago

Woooo! Thanks Rahul!

@ky747r0 3 months ago

42:36 Jordan man, it's been a long way... from your super wobbly handwriting in the 1st concepts video to this super beautiful amazing handwriting. And as always, quality content!!!

@jordanhasnolife5163 3 months ago

Lmao making big moves out here

@Luzkan 4 months ago

Congrats on 21k, Jordan! It's the 5th video for me so far and I'm amazed every single time by the details you manage to delve into. How long on average do you think about the whole system before starting the video itself (let's say without refining it into something presentable, just thought-mapping it out)?

14:39 / 41:40 - (In my design channel_id is the same thing as user_id) I'm wondering why you suggest sharding on channel_id + video_id, rather than just video_id? I don't see how keeping comments from a given user's (channel's) other videos close together is helpful. 🤔

24:49 - What happens if RabbitMQ dies after a successful upload to S3 and just after the messages with metadata have been put on the queue (I know there is an option for durable queues and persistent messages, but is that the way to go)?

Btw, do you know how Discord handled causal dependencies (relationships between messages, like message-to-message replies) with Cassandra?

@jordanhasnolife5163 4 months ago

Hey! I'm basically remaking all of these videos right now, so I don't have to think about them for too long. I mainly just re-watch my old video on it and then try to decide if what I did last time was stupid haha.

14:39 - yup, typo on my part, nice catch.

24:49 - Ideally we would have multiple replicas of RabbitMQ so that if the leader dies the follower can take over and we can proceed as normal.

I do not know the answer regarding Discord! Maybe version vectors, maybe they always write to the same leader for a given parent comment ID, maybe quorums! I'd have to look into it.

@kword1337 6 months ago

Thanks for another banger dude! For complicated stuff like video aggregation, are you getting your ideas from white papers? That level of design seems beyond Designing Data-Intensive Applications.

@jordanhasnolife5163 6 months ago

Well I don't feel like DDIA is ever super opinionated on how to design things in particular. That being said, real time aggregation using stream processing seems to be something used across many systems and it also handles pretty much all failure scenarios for us, hence the reason I keep abusing it haha

@Randomguu 6 months ago

Wonderful series, cannot stop watching. Just one question on something that is bugging me. I heard this suggestion in a few of the other videos as well: how do you decide that a SQL DB will be better because we have a read-heavy system? I understand the B-tree vs LSM tree point, but NoSQL scales better and hence will have less lock contention on a single SQL node (even if we have master-slave replication for reads, it still scales poorly, no?). I think LSM vs B-tree is merely a theoretical discussion rather than having practical application here.

@jordanhasnolife5163 6 months ago

You say "NoSQL" scales better - what makes you say this? That's really only the case when we're running a bunch of distributed joins, which we aren't doing in any of these reads

@xRuneGunx 5 months ago

At 41:31 you mentioned that using Cassandra increases write throughput. However, doesn't Cassandra use a leaderless replication model, such that write availability is increased? I was under the impression that multi-leader replication increases write throughput due to its nature of processing events in parallel. Can you clear up my confusion? Thanks for the video

@jordanhasnolife5163 5 months ago

Yes sorry and good catch here. Cassandra can be run in multiple different configurations: one with quorum consistency, and another where writes just need to hit one node. I'm mainly referring to the latter, which is effectively multi leader replication.
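
Editor's sketch: the two configurations mentioned above can be compared with the usual quorum overlap rule, where N replicas with write quorum W and read quorum R guarantee that a read sees the latest write only when R + W > N. The function name and the replica counts below are assumptions chosen for illustration, not settings from the video.

```python
def quorum_overlaps(n: int, w: int, r: int) -> bool:
    """Return True when every read set must intersect every write set."""
    if not (1 <= w <= n and 1 <= r <= n):
        raise ValueError("w and r must be between 1 and n")
    return r + w > n


# Quorum reads and writes on three replicas: 2 + 2 > 3, so the sets overlap.
quorum_mode = quorum_overlaps(n=3, w=2, r=2)

# Writes acknowledged by a single node: 1 + 1 <= 3, so stale reads are possible,
# which is the trade made for higher write throughput.
single_node_mode = quorum_overlaps(n=3, w=1, r=1)
```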

@siddharthgupta6162 5 months ago

Thanks for the video, Jordan. Awesome content as always. Is there any difference between streaming vs chunking? I read somewhere that streaming is an error-prone process so one should prefer chunking over it - but there was no explanation on it. Any thoughts on this?

@jordanhasnolife5163 5 months ago

Yeah to tell you the truth no clue - sounds like some guy spewing some bs as per usual with 99% of systems design videos lol

@siddharthgupta6162 5 months ago

@jordanhasnolife5163 lol sounds about right

@dmitrigekhtman1082 5 months ago

The upload and processing pipeline could include lots of different jobs with complicated interdependencies, with the S3 upload stage as one of the first steps. Possibly, a general-purpose workflow orchestration framework (something like Temporal, maybe?) could help coordinate all of it.

@jordanhasnolife5163 5 months ago

Agreed, and I imagine that IRL they do probably have something like this!

@user-of5je3sp9n 5 months ago

You should do a video on workflow orchestration :D

@indraneelghosh6607 5 months ago

Hi Jordan. I had a few questions related to the video upload flow. Could you please explain why you chose RabbitMQ over Kafka for uploading the metadata? Also, there may be times when there is a spike in the number of videos being uploaded, particularly in the case of a YouTube-like system. I would expect video uploading on YouTube to have a rather irregular traffic pattern compared to a streaming platform like Netflix. Any ideas on how to tackle these spikes without manual intervention?

@jordanhasnolife5163 5 months ago

To be honest, I do think that the uploading on YouTube would be more regular than you think. You've got people in every timezone. But yeah, I guess the way you'd do it is just have your consumers that are doing the encoding be part of some Hadoop cluster that also is performing other work in the meantime, and as more jobs come in for uploads you can kill whatever jobs those nodes are currently doing and use them for uploads. For your first question, RabbitMQ is going to allow me to use a fan out design such that I don't need a bunch of different partitions (one per consumer) as I would with Kafka. I don't care about message ordering at all here, so a fan out is fine.
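
Editor's sketch: a rough in-memory model of the trade-off described above. It is not RabbitMQ or Kafka client code. It assumes a shared queue hands the next job to whichever worker is idle, and that a partitioned log pins every message with the same key (here, a hypothetical video ID) to one consumer. Both function names and the job costs are made up for illustration.

```python
import heapq
import zlib


def shared_queue_finish_time(job_costs, workers):
    """Any idle worker takes the next job, so ordering is not preserved."""
    free_at = [0] * workers  # time at which each worker next becomes idle
    heapq.heapify(free_at)
    for cost in job_costs:
        start = heapq.heappop(free_at)  # earliest available worker
        heapq.heappush(free_at, start + cost)
    return max(free_at)


def partitioned_finish_time(jobs, workers):
    """Each (key, cost) job is pinned to one worker by hashing its key."""
    busy = [0] * workers
    for key, cost in jobs:
        busy[zlib.crc32(key.encode()) % workers] += cost
    return max(busy)


# Four encoding jobs of equal cost that all belong to the same video.
costs = [5, 5, 5, 5]
shared = shared_queue_finish_time(costs, workers=2)
pinned = partitioned_finish_time([("video-1", c) for c in costs], workers=2)
```

With these inputs the shared queue spreads the four jobs over both workers, while the keyed partitioning sends all of them to a single worker. When ordering per key matters, the second behaviour is the point; here ordering is explicitly not needed.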

@meenalgoyal8933 3 months ago

Hey Jordan, I am wondering how the design might change for an audio streaming service like Spotify. I think a lot might remain the same as YouTube, but there are 2 major things:
1. Do you think we need to break the audio file into chunks? Sure, we can benefit from parallel uploading and getting one chunk at a time for streaming, but audio files are lighter than video.
2. What kind of processing might be required for each audio file chunk?

@jordanhasnolife5163 3 months ago

Hey! I think 99% of it is probably going to be the same. You'd probably have different bit rates for streaming the audio if you have a worse connection, which is the processing involved. Maybe you wouldn't need chunking since as you mentioned the files are much smaller in size.

@jieguo6666 8 days ago

Hey Jordan! Thanks for the video! If we use DDB we can use a GSI, so it seems we don't need CDC. I'm curious, is Cassandra + CDC better than DDB, or is it a personal preference thing?

@jordanhasnolife5163 8 days ago

If it's an eventually consistent global secondary index then I'd say personal preference. If it needs a two phase commit to stay completely consistent with the primary that seems like a pretty big difference then

@nirajvora9314 6 months ago

Don't stop making videos bro. Your content is unique and effective.

@jordanhasnolife5163 6 months ago

Not planning on it brother

@user-wj1wy6ph5q 6 months ago

Awesome design 🙇

@ariali2067 3 months ago

Sorry, the same question catches me again and again. Is the search index a new table, or a secondary index on the existing user video table? I had already convinced myself that it's a secondary index on top of existing tables, but in this video it seems that we are creating a new table (with some denormalized data from the user video table). If that is the case, why do we need (user id, video id) as the partition key here? Why can't we use the term as the partition key, so that for a given term search all the results are on the same node for faster reads? This really bothered me. I'd really appreciate it if you could help clear up my confusion here, thanks again!

@jordanhasnolife5163 3 months ago

1) new table 2) too much data for a given term typically, imagine for "Donald Trump"
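
Editor's sketch: a minimal in-memory model of a document-partitioned search index, where each shard indexes only its own videos and a query is sent to every shard and merged. The class name, the whitespace tokenizer, and hashing on the video ID alone (rather than user ID plus video ID) are simplifying assumptions.

```python
import zlib
from collections import defaultdict


class ShardedSearchIndex:
    """Each shard holds an inverted index over only the videos routed to it."""

    def __init__(self, num_shards):
        self.shards = [defaultdict(set) for _ in range(num_shards)]

    def _shard_for(self, video_id):
        return zlib.crc32(video_id.encode()) % len(self.shards)

    def add_video(self, video_id, title):
        shard = self.shards[self._shard_for(video_id)]
        for term in title.lower().split():
            shard[term].add(video_id)

    def search(self, term):
        # Scatter-gather: ask every shard, then merge the partial results.
        results = set()
        for shard in self.shards:
            results |= shard.get(term.lower(), set())
        return results


index = ShardedSearchIndex(num_shards=4)
index.add_video("v1", "Donald Trump speech")
index.add_video("v2", "Trump rally")
index.add_video("v3", "cat video")
matches = index.search("trump")
```

A popular term is spread across all shards this way, at the cost of every query touching every shard. Partitioning by term would keep each query on one node but put the whole posting list for a hot term on that node.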

@alberdgdj1 6 months ago

Hi Jordan, thanks for your videos they are of a huge value. I wonder if you could do a video about calculating BigO complexity with some exercises, that would be really helpful. Thanks mate!

@jordanhasnolife5163 6 months ago

I appreciate that! I can do this, however it realistically would be a while before I get to it, just due to the fact that I'm mainly trying to focus on systems design. That being said, there are many good resources on the internet for how to calculate this type of thing!

@ravi72munde 5 months ago

For processing chunks, would it be possible to use Kafka + Spark, so that each Spark job handles a single video but processes its chunks on multiple workers, and at the end marks the job completed when all chunks are processed? That would make keeping state for the video's chunks redundant.

@jordanhasnolife5163 5 months ago

A couple of concerns here that you'd have to address: 1) how do we know when to trigger the spark job? 2) You're triggering a lot of spark jobs haha In practice, this may work! I think we'd have to try it out.

@ravi72munde 5 months ago

Good point! How about using a Kafka queue to queue jobs? Each message would just contain the videoID that has chunks ready to process. A consumer could act as a Spark streaming (master) node. It picks an available message, fetches all the chunk IDs/file URLs for that video, and distributes the chunks to worker nodes. Once all chunks are processed, the master node would know and mark the video as complete. As an advantage, it'll be easy to track which video failed rather than which chunks.

@dinar.mingaliev 6 months ago

Hi Jordan, thank you so much for keeping us educated and sharing your ideas on system design. Short question: don't we also need to add a chunk processor, so that once a user has uploaded a video into temporary S3 or DFS, the service splits it into chunks? And one more question: if we have single-leader replication + partitions in Cassandra, will it work correctly with comment editing? And we also need a service to create a user feed :)

@dinar.mingaliev 6 months ago

Also, I guess insert, update, and delete operations on a single row are atomic, isolated, and durable in Cassandra, and assuming that the same user edits their own comments, there should not be a problem with eventual consistency. What do you think man? :)

@jordanhasnolife5163 6 months ago

Thanks! I had envisioned the user's client breaking the file into chunks. Secondly, I'd agree that edits of comments are no issue if we use single leader replication, but for multi leader replication they definitely could be!

@xiaoyinqi7296 4 months ago

Thanks for the video, Jordan, very impressive. I want to understand the reason for using Flink here. I know Flink is a stream processing tool. I believe we want to confirm whether the transcoding of all the chunks is done. My thought is to use a chunk DB table to mark each chunk's status.

@jordanhasnolife5163 4 months ago

You can definitely use a chunk db. However, note that this means: 1) You need to make an additional network request to the chunk db every time 2) That request can fail, how do you ensure that we eventually write it there?
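
Editor's sketch: the bookkeeping that the stream processor does for upload completion, written as plain Python rather than against the Flink API. It assumes two kinds of events per video, one carrying the expected total and one per finished chunk output, and that events can arrive in any order and be redelivered. The class and method names are hypothetical.

```python
from collections import defaultdict


class UploadTracker:
    """Per-video state: expected output count and the set of finished outputs."""

    def __init__(self):
        self.expected = {}            # video_id -> total number of outputs
        self.done = defaultdict(set)  # video_id -> finished output ids

    def on_total(self, video_id, total):
        self.expected[video_id] = total
        return self._is_complete(video_id)

    def on_chunk_done(self, video_id, output_id):
        # A set makes redelivered completion messages idempotent.
        self.done[video_id].add(output_id)
        return self._is_complete(video_id)

    def _is_complete(self, video_id):
        total = self.expected.get(video_id)
        return total is not None and len(self.done[video_id]) == total


tracker = UploadTracker()
tracker.on_chunk_done("v1", "c1-480p")  # arrives before the total is known
tracker.on_total("v1", 2)
tracker.on_chunk_done("v1", "c1-480p")  # duplicate delivery, ignored
finished = tracker.on_chunk_done("v1", "c1-720p")
```

Keeping this state inside the consumer is what avoids the extra network request and the second failure mode mentioned in the reply above.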

@aforty1 2 months ago

Liked and commented for the algo! Thank you!

@adithyabhat4770 5 months ago

Thanks Jordan!

@isaacneale8421 6 days ago

I like your idea of data locality in a DB for each of the processed chunks, but I don't know if I understand whether it works. When thinking about a single machine (say a personal laptop) reading from disk, the video ought to be stored as contiguously as possible to ensure good data locality and no disk jumping. Makes sense. But when talking about a distributed service, I can't see how this helps. As I understand it, a disk can only be reading from one location at a time. There might be multiple physical hard drives on one machine though.

Anyway, let's say I am watching a YouTube video and I grab Chunk1 from the DB. Great. Chunk2 is next, which I'll request in 5-10 seconds. But what happens if someone else is watching a video partitioned onto the same DB shard, and they request their chunkXYZ? The disk jumps to their spot, then back to mine when I request Chunk2. So it seems like good data locality in a distributed DB can break down quite easily with concurrent requests.

Hopefully, however, most videos are read from the CDN, which would be much faster since its cache is in memory. But that's a lot of expensive memory for caching all videos, so maybe that is partially on disk too, which I guess would have the same problem. Any thoughts? I suppose good data locality doesn't hurt in the case where my sequential reads are not the system's sequential reads. So you might as well try to have good data locality.

@isaacneale8421 6 days ago

Oops. I just rewatched and realized that you had an S3 location in this DDB. The data locality was for range queries to fetch the next X many chunk locations while buffering. This makes a lot of sense.

@jordanhasnolife5163 4 days ago

Yup, range queries is what you're looking for!
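
Editor's sketch: why a sort key of (video ID, chunk index) makes "fetch the next X chunk locations" a single contiguous scan. This is an in-memory stand-in using a sorted list, not a real database client. The class name and the storage URLs are hypothetical.

```python
import bisect


class ChunkTable:
    """Rows sorted by (video_id, chunk_index), each mapping to a storage URL."""

    def __init__(self):
        self._keys = []  # sorted list of (video_id, chunk_index)
        self._urls = {}  # (video_id, chunk_index) -> storage location

    def put(self, video_id, chunk_index, url):
        key = (video_id, chunk_index)
        if key not in self._urls:
            bisect.insort(self._keys, key)
        self._urls[key] = url

    def next_chunks(self, video_id, start_index, count):
        # One seek to the start key, then a sequential scan of neighbours.
        lo = bisect.bisect_left(self._keys, (video_id, start_index))
        out = []
        for key in self._keys[lo:lo + count]:
            if key[0] != video_id:
                break  # ran past the end of this video
            out.append((key[1], self._urls[key]))
        return out


table = ChunkTable()
for i in range(5):
    table.put("v1", i, f"s3://example-bucket/v1/{i}")
table.put("v2", 0, "s3://example-bucket/v2/0")
upcoming = table.next_chunks("v1", start_index=3, count=5)
```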

@9527-ljc 5 months ago

Thanks, this is great content. For entry lvl sde, which part should we focus more in SD interview?

@jordanhasnolife5163 5 months ago

If you're looking for junior roles, I'd honestly just keep grinding leetcode haha. Otherwise, I'd say that the whole video is still relevant. Can't hurt to learn!

@college7290 5 months ago

Real treasure! Thank you. What resources did you use to learn these concepts? I know your knowledge is not out of books, but based on years of hard work and experience. how I can start learning these concepts myself? What can I do to be knowledgeable like you in next 5~10 years?

@jordanhasnolife5163 5 months ago

Just reading haha, I'm nothing special! You'd be surprised how much you can learn by looking at "Uber system design" from reputable sources (their site and not YouTubers)

@saurabhmittal6947 20 days ago

Hey Jordan, I have one question: how is the client able to generate a unique chunk ID and video ID? Here you are showing that the client uploads to S3 and then sends that data to the upload service, but who is assigning unique IDs to all these entities flowing through our system?

@jordanhasnolife5163 18 days ago

The video id can just be some userId + a hash or something. The chunk ID is also basically a hash and just needs to be unique per video id
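
Editor's sketch: one way a client could mint these IDs without a central allocator. The reply only says "userId + a hash or something", so the specific choices below (a random UUID suffix for the video ID, and a SHA-256 over the chunk index plus chunk bytes) are assumptions. Including the index keeps two byte-identical chunks in the same video from colliding.

```python
import hashlib
import uuid


def make_video_id(user_id: str) -> str:
    """User ID plus a random suffix, generated on the client."""
    return f"{user_id}-{uuid.uuid4().hex}"


def make_chunk_id(chunk_index: int, chunk_bytes: bytes) -> str:
    """Deterministic per (index, content), so a retried upload reuses its ID."""
    digest = hashlib.sha256()
    digest.update(str(chunk_index).encode())
    digest.update(b":")
    digest.update(chunk_bytes)
    return digest.hexdigest()


video_id = make_video_id("user42")
first = make_chunk_id(0, b"same bytes")
second = make_chunk_id(1, b"same bytes")
```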

@calvincruzada1016 6 months ago

Awesome

@ankitagarwal4022 2 months ago

@jordanhasnolife5163 Hi Jordan, I have just one question. Your processor transforms the video into a list of transformed videos, and the size of that list depends on the number of encodings * resolutions. Let's say, for example, we have 10 encodings and 4 resolutions. That makes 40. So we have to transform 1 chunk into 40 outputs and upload all 40 to S3. I assume transforming one chunk into another is itself a heavy process. Can you suggest some optimization here, so that if our event processing fails we don't have to transform every chunk from the beginning?

@jordanhasnolife5163 2 months ago

I'm pretty confused what you mean here - each resolution/encoding is processed independently in tandem already, so if one fails the rest do not fail, feel free to elaborate!

@ankitagarwal4022 2 months ago

@jordanhasnolife5163 What I understand about the flow of data:
1. First we upload chunks to S3, let's say (c1, c2, c3, ...).
2. We add chunk details to the broker (RabbitMQ).
3. The processor consumes chunk details from the broker, let's say C1, and puts a list of transformed videos (C1R1E1, C1R1E2, C1R1E3, C1R2E1, C1R2E2, C1R2E3) into S3, considering resolutions (R) = 2 and encodings (E) = 3. The processor also puts the list details into Flink.

@jordanhasnolife5163 2 months ago

@ankitagarwal4022 The only transformation of one chunk to another that we're doing right at the start is creating the list of all of the metadata that we will eventually need to create. So that can all go into RabbitMQ, and once it does we can be fairly confident that the chunk will eventually be created downstream, because it will only get removed from RabbitMQ once the consumer puts the completion message in Kafka
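
Editor's sketch: generating the list of per-chunk transcoding tasks described in this exchange, one message per (resolution, encoding) pair, so that each can be processed and retried independently. The function name, field names, and the concrete resolution and encoding labels are assumptions.

```python
from itertools import product


def transcode_tasks(video_id, chunk_id, resolutions, encodings):
    """One small metadata message per output; no video bytes are touched here."""
    return [
        {
            "video_id": video_id,
            "chunk_id": chunk_id,
            "resolution": resolution,
            "encoding": encoding,
        }
        for resolution, encoding in product(resolutions, encodings)
    ]


# The example from the comment above: 2 resolutions x 3 encodings = 6 tasks.
tasks = transcode_tasks("v1", "c1", ["480p", "720p"], ["h264", "vp9", "av1"])
```

If one of the six tasks fails, only that message is redelivered; the other five outputs are unaffected.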

@vigneshraghuraman 17 days ago

Once the chunks are uploaded by the user to S3, how does the upload service know which chunks to put on RabbitMQ? Is this done via S3 notifications to the upload service?

@jordanhasnolife5163 16 days ago

The client will upload chunks based on which ones are "new". Then they all go into RabbitMQ.

@user-ov6rb6mw8u 3 days ago

qq: is the client responsible for chunking the video and uploading to S3? Or should there be a mechanism to upload the video directly to S3 and have some dedicated backend workers chunk it asynchronously?

@jordanhasnolife5163 1 day ago

Typically you'd want the client doing chunking to avoid having to retry uploading the full video in the event of some failure.

@user-ov6rb6mw8u 1 day ago

Makes sense. I also checked: S3 does provide support for multipart upload of fixed-size chunks, which would be handy here
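
Editor's sketch: client-side chunking with per-chunk retries, so that a failure costs one chunk rather than the whole file. The `upload` callable is a caller-supplied stand-in for whatever actually sends a chunk to storage; it is not an S3 API call. The chunk size and retry count are arbitrary.

```python
def split_into_chunks(data: bytes, chunk_size: int):
    """Split a byte string into fixed-size pieces (the last may be shorter)."""
    return [data[i:i + chunk_size] for i in range(0, len(data), chunk_size)]


def upload_with_retry(chunks, upload, max_attempts=3):
    """Call upload(index, chunk) for each chunk, retrying only the failed one.

    Returns a dict of chunk index -> number of attempts it took.
    """
    attempts = {}
    for index, chunk in enumerate(chunks):
        for attempt in range(1, max_attempts + 1):
            try:
                upload(index, chunk)
                attempts[index] = attempt
                break
            except OSError:
                if attempt == max_attempts:
                    raise  # give up on this chunk after the last attempt
    return attempts


# Simulated transport that fails once for chunk 1, then succeeds.
calls = []

def flaky_upload(index, chunk):
    calls.append(index)
    if index == 1 and calls.count(1) == 1:
        raise OSError("transient network error")

pieces = split_into_chunks(b"abcdefghij", chunk_size=4)
result = upload_with_retry(pieces, flaky_upload)
```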

@truptijoshi2535 1 month ago

Hi Jordan, can CDC have a single point of failure? If yes, how do we avoid it? Also, does CDC add extra latency?

@jordanhasnolife5163 1 month ago

I mean in theory Kafka, but I tend to imply that our Kafka cluster has replicas. CDC does make things slower, but I suppose in the cases where I use it I don't actually care (hence why I use it)

@JulianA-rm4ry 2 months ago

Thank you Jordan

@JulianA-rm4ry 2 months ago

Now i'm only 1/2 screwed

@rakeshvarma8091 2 months ago

You Are Awesome Bro!!

@jordanhasnolife5163 2 months ago

So are you!

@roshankumar0911 6 months ago

I recently cleared my system design round after watching your videos. They're so compact and precise. Thank you for making such videos. Can you please mention your LinkedIn ID?

@jordanhasnolife5163 6 months ago

Glad to hear!! Congrats! www.linkedin.com/in/jordan-epstein-69b017177? If you don't mind, just don't tag me in stuff so that I don't lose my job haha

@roshankumar0911 6 months ago

@jordanhasnolife5163 Sure, thanks :)

@niapuchun 4 months ago

On the slide at 2:10, the last line should say 1 million videos, shouldn't it?

@jordanhasnolife5163 4 months ago

yep typo

@rahulrachh3320 3 months ago

Video timestamp: 10:18

Part 1: For the User Videos table, we can omit the timestamp from the key, since UserId + VideoId make a unique pair; when you fetch a user's videos from the table, you get the timestamps, and then you can sort the videos and display them. Correct me if I am wrong.

Part 2: Also, in the Video Comments table, VideoId will be unique, so why are we using the timestamp along with it? Does this help in getting the output in a sorted manner? Thanks :)

Edit: Added video timestamp

@jordanhasnolife5163 3 months ago

1: Definitely doable, however it is easier to keep things pre-sorted by timestamp in the metadata database so that you don't have to sort them on the fly for each read. 2: You answered your own question :). Having a timestamp for comments allows us to easily fetch comments in a pre sorted order, as we can index those comments on timestamp per video.

@rahulrachh3320 3 months ago

@jordanhasnolife5163 Thank you :) I love this series and System Design 2.0. This got me thinking of starting my own series on System Design topics. Maybe one day for sure :)

@rahulrachh3320 3 months ago

@jordanhasnolife5163 Thanks, got it. This series and System Design 2.0 are gold. I might even start making videos on similar topics sometime soon :)

@jordanhasnolife5163 3 months ago

@rahulrachh3320 Just don't take too many of my viewers away from me it's all I've got ;)

@rahulrachh3320 3 months ago

@jordanhasnolife5163 haha, I'll try not to take the viewers ;)

@weijiachen2850 2 months ago

How does this guy know all these as a junior engineer? He should be promoted to a staff engineer.

@jordanhasnolife5163 2 months ago

Very unclear if I have what it takes for that

@vorandrew 6 months ago

Chunking question... Why would you want to store chunks anywhere except in a cache? Let's say a video is 50 MB, and you want to permanently save transcodes in 3-4 resolutions x 1-2 formats? A petabyte here, a petabyte there, and we are talking about big numbers... If you can always re-create them, there is no need to store transcodes for a video that was last viewed 3 years ago. Cache them with a last-access timeout of 1 week, for example. Maybe you want to store the first chunk permanently for fast access, at most.

@jordanhasnolife5163 6 months ago

Would appreciate if you could elaborate here! While it's true that we could store the entire video file and never deal with any chunks, assuming we originally upload chunks to S3 when first uploading the file we'll always need at least some chunk metadata in our database to load them

@vorandrew 6 months ago

@jordanhasnolife5163 My guess is like this: we receive the file at its original resolution -> chunk it into 2-second pieces -> long-term storage. Transcode the first chunk into 144, 240, 360, 480, etc. resolutions (don't store) -> CDN with expiration = 1 year since last access (just to have a fast start experience). Whenever somebody starts to watch a video, we transcode the necessary resolution on the fly from the original chunks in parallel and store it in the CDN with expiration = 1 week. I'm sure the combined transcode speed will be faster than viewing speed, so viewing will be seamless. Regarding metadata, as you said, during upload we can store all the necessary chunking info in some NoSQL DB
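
Editor's sketch: the last-access expiry policy proposed in this comment, as a small in-memory cache. An entry stays alive as long as it keeps being read, and is rebuilt by a caller-supplied `load` function (standing in for an on-the-fly transcode) once it has gone unread for longer than the timeout. The class name is hypothetical and time is passed in explicitly so the behaviour is easy to follow.

```python
class LastAccessCache:
    """Entries expire ttl_seconds after their most recent read."""

    def __init__(self, ttl_seconds):
        self.ttl = ttl_seconds
        self._entries = {}  # key -> (value, last_access_time)

    def get(self, key, now, load):
        entry = self._entries.get(key)
        if entry is not None and now - entry[1] <= self.ttl:
            value = entry[0]   # hit: reuse the stored transcode
        else:
            value = load(key)  # miss or expired: rebuild from the original
        self._entries[key] = (value, now)  # every read refreshes the timer
        return value


transcodes = []

def transcode(key):
    transcodes.append(key)
    return f"transcoded:{key}"

cache = LastAccessCache(ttl_seconds=10)
cache.get("v1/c1/480p", now=0, load=transcode)   # miss, transcodes
cache.get("v1/c1/480p", now=5, load=transcode)   # hit
cache.get("v1/c1/480p", now=14, load=transcode)  # hit, 9s since last read
cache.get("v1/c1/480p", now=30, load=transcode)  # expired, transcodes again
```

Whether the rebuild on a miss is fast enough for playback is the open question raised in the replies that follow.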

@vorandrew 6 months ago

Thank you for your videos! ❤ After viewing some, I can see your designs tend to give out space like the Fed is printing money 😂

@jordanhasnolife5163 6 months ago

@vorandrew Ah I see what you're saying here, I think it's one of those things that we'd have to actually try out and see if the latencies would be low enough. We do care a lot more about lowering read latencies here, so I wonder if this would work in practice but it's an interesting thought!

@jordanhasnolife5163 6 months ago

@vorandrew Haha yeah - my personal philosophy here is to use as much disk space as needed, we could always optimize for cost saving measures in the future! At least for the interview I don't know how often it would come up, but it's possible!

@zhonglin5985 3 months ago

At ru-vid.com/video/%D0%B2%D0%B8%D0%B4%D0%B5%D0%BE-43bB7oSn190.html, another queue is needed to stream the total chunk count to Flink. This looks a bit redundant to me. Why don't we just include the total chunk count as an extra field of the events that are sent to RabbitMQ?

@jordanhasnolife5163 3 months ago

Totally doable as well, I considered this approach too. I mainly assumed there'd be a lot of other metadata around and didn't wanna bloat the messages.