Much appreciation to you, Jordan. This video covers so many detailed components and processes going back and forth, I already watched this video for many times and it's really helpful!
This is a great video f***ing Indeed! Got instant High & confidence going through this videos. To understand jordan videos my suggestion is to go through Jordans system design 2.0 playlist 0. DB fundamentals (0 to 15 videos) 1. Replication (16 to 24 videos) 2. Stream Processing videos and Flink(42 to 45) After understanding above video - system design videos is like a cake walk. Note down the terms which you come across like zoo keeper, elastic search and go back to 2.0 playlist and comeback to this series. Note down the technical terms which he mentions like. "Split brains" "Read Repairs" "Anti entrophy" Keep using these terms in the interview for showing that you know what is distributed system :D
42:36 Jordan man its been a long way... from your super wobbly handwriting in the 1st concepts video to this super beautiful amazing handwriting. And as always quality content!!!
Congratz on 21k Jordan! Its 5th video for me so far and I'm amazed every single time with the details u are manage to dwell into. For how long on average do you think about the whole system before starting the video itself (lets say without refining it to a someting presentable, just thought mapping it out)? 14:39 / 41:40 - (In my design channel_id is the same thing as user_id) I'm wondering why do you suggest to shard on channel_id + video_id, rather than just video_id? I don't see how having close comments from other videos from a given user (channel) is helpful. 🤔 24:49 - What happens if RabbitMQ dies after successful upload to S3 and just after messages have been put to the que with metadata (i know there is option for durable ques and persistent messages, but is that the way to go)? Btw, do you know how Discord handled the casual dependencies (relationships between messages like msg to msg replies) with Cassandra?
Hey! I'm basically remaking all of these videos right now, so I don't have to think about them for too long. I mainly just re-watch my old video on it and then try to decide if what I did last time was stupid haha. 14:39 - yup, typo on my part nice catch. 24:49 - Ideally we would have multiple replicas of rabbit mq so that if the leader dies the follower can take over and we can proceed as normal. I do not know the answer regarding discord! Maybe version vectors, maybe they always write to the same leader for a given parent comment Id, maybe quorums! I'd have to look into it.
Thanks for another banger dude! For complicated stuff like video aggregation, are you getting your ideas from white papers? Those level of designs seems beyond Designing Data Intensive Applications?
Well I don't feel like DDIA is ever super opinionated on how to design things in particular. That being said, real time aggregation using stream processing seems to be something used across many systems and it also handles pretty much all failure scenarios for us, hence the reason I keep abusing it haha
Wonderful series, cannot stop watching. Just one question on something which is bugging me- I heart this suggestion in a few of the other videos as well, how do you decide that sql db will be better as we have a read heavy system. I understand the btree vs lsm tree point but nosql scales better hence will have less locking on a single contention to a single sql node ( even if we have master slave for reads - still scale poorly no?). I think lsm vs btree is merely theoritical discussion rather than having pratical application here
You say "NoSQL" scales better - what makes you say this? That's really only the case when we're running a bunch of distributed joins, which we aren't doing in any of this reads
In 41:31 you mentioned using Cassandra increases write throughput. However, doesn't Cassandra use a Leaderless replication model such that write availability is increased? I was under the impression that multiple leader replication increases write throughput due to its nature of processing events in parallel. Can you clear up my confusion? Thanks for the video
Yes sorry and good catch here. Cassandra can be run in multiple different configurations: one with quorum consistency, and another where writes just need to hit one node. I'm mainly referring to the latter, which is effectively multi leader replication.
Thanks for the video, Jordan. Awesome content as always. Is there any difference between streaming vs chunking? I read somewhere that streaming is an error-prone process so one should prefer chunking over it - but there was no explanation on it. Any thoughts on this?
The upload and processing pipeline could include lots of different jobs with complicated interdependencies, with the S3 upload stage as one of the first steps. Possibly, a general-purpose workflow orchestration framework (something like Temporal, maybe?) could help coordinate all of it.
Hi Jordan. Had a few questions related to the video upload flow. Could you please explain why you chose RabbitMQ over Kafka while uploading the metadata? Also, there may be times when there may be a spike in the amount of videos being uploaded particularly in the case of a RU-vid-like system. I would expect video uploading on youtube would have a rather irregular traffic pattern as compared to a streaming platform like Netflix. Any ideas on how to tackle these spikes without manual intervention?
To be honest, I do think that the uploading on RU-vid would be more regular than you think. You've got people in every timezone. But yeah, I guess the way you'd do it is just have your consumers that are doing the encoding be part of some hadoop cluster that also is performing other work in the meantime, and as more jobs come in for uploads you can kill whatever jobs those nodes are currently doing and use them for uploads. For your first question, RabbitMQ is going to allow me to use a fan out design such that I don't need a bunch of different partitions (one per consumer) as I would with kafka. I don't care about message ordering at all here, so a fan out is fine.
Hey Jordan, I am wondering how the design might change for audio streaming service like Spotify. I think a lot might remain same as youtube, but 2 major things: 1. Do you think we need to break audio file into chunks? Sure we can benefit from parallel uploading and getting one chunk at a time for streaming but audio files are lighter than video. 2. What kind of processing might be required for each audio file chunk?
Hey! I think 99% of it is probably going to be the same. You'd probably have different bit rates for streaming the audio if you have a worse connection, which is the processing involved. Maybe you wouldn't need chunking since as you mentioned the files are much smaller in size.
Hey Jordan! Thanks for the video! If we use DDB we can use GSI of DDB so we seems don't need CDC. I'm curious is cassandra+CDC better than DDB, or it's a personal preference thing?
If it's an eventually consistent global secondary index then I'd say personal preference. If it needs a two phase commit to stay completely consistent with the primary that seems like a pretty big difference then
Again, sorry same question caught me again and again. Is search index basically building a new table or basically a secondary index to existing user video table? I already convinced myself that it's a secondary index on top of existing tables, but then this video it seems that we are creating a new table (with some denormalized data from user video table) -> if this is the case (create a new table) -> why we need (user id, video id) as partition key here? Why we cannot use term as partition key such that for a given term search all the results are on the same node for faster read speeds? This really bothered me.. really appreciate if you can help clear my confusion here, thanks again!
Hi Jordan, thanks for your videos they are of a huge value. I wonder if you could do a video about calculating BigO complexity with some exercises, that would be really helpful. Thanks mate!
I appreciate that! I can do this, however it realistically would be a while before I get to it, just due to the fact that I'm mainly trying to focus on systems design. That being said, there are many good resources on the internet for how to calculate this type of thing!
For processing chunks, is it possible be to use Kafka + spark so each spark job handles single video but processes it’s chunks on multiple workers and at the end marks the job completed when all chucks are processed. Making keeping of state of the video’s chunks redundant.
A couple of concerns here that you'd have to address: 1) how do we know when to trigger the spark job? 2) You're triggering a lot of spark jobs haha In practice, this may work! I think we'd have to try it out.
Good point! How about if you could use Kafka queue to queue jobs. Message would just contains the videoID which has chunks ready to process. A consumer could act as a spark streaming(master) node. Picks available message, fetches all the chunks_ids/fileurls for that video and distributes chunks to worker nodes. Once all chunks are processed the master node would know and mark the video as complete. As an advantage it’ll be easy to track which video failed rather than chunks.
Hi Jordan, thank you so much for for keeping us educated and sharing your ideas in system design. Short question: dont we also need to add chunk processor, once a user uploaded a video into temporary S3 or DFS, the service splits it into chunks. And meanwhile one more question: if we have single leader replication + partitions in Cassandra, will it work with comment editing right? And also we need a service to create a user feed :)
also I guess for insert, updated and delete operation on a single row are atomic, isolated and durable in Cassandra and assuming that the same user edits its comments - there should not be a problem with eventual consistency. what do you think man? :)
Thanks! I had envisioned the user's client breaking the file into chunks. Secondly, I'd agree that edits of comments are no issue if we use single leader replication, but for multi leader replication they definitely could be!
Thanks for the video, Jordan, very impressive. want to understand the reason using Flink here, I know Flink is a streaming processing tool. I believe we want to confirm if the transcoding of all the chunks is done. my thought is to use chunk db table to mark each chunk's status.
You can definitely use a chunk db. However, note that this means: 1) You need to make an additional network request to the chunk db every time 2) That request can fail, how do you ensure that we eventually write it there?
I like your idea of data locality in a DB for each of the processed chunks. But I don’t know if I understand if it works. When thinking about a single machine (say a personal laptop) reading from disk, the video ought to be stored as continuously as possible to ensure good data locality and no disk jumping. Makes sense. But when talking about a distributed service, I can’t see how this helps. As I understand a disk, it can be only reading in one location at once. There might be multiple physical hard drives on one machine though. Anyway, so let’s say I am watching a youtube video and I grab Chunk1 from the DB. Great. Chunk2 is next, which i’ll request in 5-10 seconds. But what happens if someone else is watching a video partitioned on the same DB shard. And they request their chunkXYZ. The disk jumps to their spot, then back to mine when i request chunk2. So it seems like making the distributed DB have good data locality can break down quite easily with concurrent requests. Hopefully, however, most videos are read from the CDN which would be much faster since it cache is in memory. But that’s a lot of expensive memory for caching all videos, so maybe that is on disk partially too, which I guess would have the same problem. Any thoughts? I suppose good data locality doesn’t hurt in the case where my sequential reads are not the systems sequential reads. So you might as well try to have good data locality.
Oops. I just rewatched and realized that you had an S3 location in this DDB. The data locality was for range queries to fetch the next X many chunk locations while buffering. This makes a lot of sense.
If you're looking for junior roles, I'd honestly just keep grinding leetcode haha. Otherwise, I'd say that the whole video is still relevant. Can't hurt to learn!
Real treasure! Thank you. What resources did you use to learn these concepts? I know your knowledge is not out of books, but based on years of hard work and experience. how I can start learning these concepts myself? What can I do to be knowledgeable like you in next 5~10 years?
Just reading haha, I'm nothing special! You'd be surprised how much you can learn by looking at "Uber system design" from reputable sources (their site and not RU-vidrs)
hey jordan, I have one question.. how is client able to uniquely generate the chunk-id and video-id because here, you are showing that client will be uploading to s3 and then sending that data to upload-service but who is assigning unique-ids to all these entities flowing in our system ?
@jordanhasnolife5163 Hi Jordan, I have just one question, your processor is transforming the video into a list of transforming videos, it will depend on the number of encodings * resolutions. let's say for example we have 10 encodings and 4 resolutions. it will make it 40. So we have to transform on 1 chunk into 40 and upload into 40 into s3. I assume transforming one chunk to another itself a heavy process. Can you suggest some optimization here? if our event processing fails so we don't have to transform every chunk from the beginning.
I'm pretty confused what you mean here - each resolution/encoding is processed independently in tandem already, so if one fails the rest do not fail, feel free to elaborate!
@@jordanhasnolife5163 what I understand about the flow of data 1. first we are uploading chunks is S3, lets say (c1,c2,c3.....) 2. adding chunk details in broker (rabbitmq) 3. The processor consumes chunk details from the broker let's say C1 and puts a list of transformed (C1R1E1, C1R1E2, C1R1E3, C1R2E1,C1R2E2,C1R2E3)video into the S3 considering (resolutions(R) = 2, encoding(E) = 3 ). and processor also put list details into flink.
@@ankitagarwal4022 The only transformation of one chunk to another that we're doing right at the start is creating the list of all of the metadata that we will eventually need to create. So that can all go into rabbit mq, and once it does we can be fairly confident that the chunk will eventually be created downstream because it will only get removed from rabbit mq once the consumer puts the completion message in kafka
once the chunks are uploaded by the user to S3, how does upload service know which chunks to put on the rabbit MQ? is this done via S3 notifications to the upload service?
qq: is client to be responsible for chunking the video and upload to s3? or should there be a merchanism to upload the video directly to s3 and have some dedicated backend wokers to chunk it in async fashion?
I mean in theory kafka, but I tend to imply that our Kafka cluster has replicas. CDC does make things slower, but I suppose in the cases where I use it I don't actually care (hence why I use it)
I recently cleared my system design round after watching ur videos..it's so compact & precise. Thank you for making such videos. Can you please mention your linkedin id ?
Video Timestamp: 10:18 Part-1: For the user Videos table, We can omit timestamp as UserId+VideoId make a unique pair and when you get the videos from the table, you get timeStamp and then you sort them and display the videos for a user who uploads videos. Correct me If I am wrong. Part-2: Also, in the Video Comments table, VideoId will be unique so why are we using timestamp along with this. Does this help in getting output in sorted manner ? Thanks :) Edit: Added Video Timestamp
1: Definitely doable, however it is easier to keep things pre-sorted by timestamp in the metadata database so that you don't have to sort them on the fly for each read. 2: You answered your own question :). Having a timestamp for comments allows us to easily fetch comments in a pre sorted order, as we can index those comments on timestamp per video.
@@jordanhasnolife5163 Thank you :) I love this series and System Design 2.0. This got me thinking of starting my own series on System Design topics. Maybe one day for sure :)
Chunking stuff question... Why would you want to store chunks except in cache? Let's say video is 50Mb, you want save permanently transcoded 3-4 resolutions x 1-2 formats? Petabyte here petabyte there and we are talking about big numbers... If you always can re-create them - no need to store transcodes for video that was last viewed 3 years ago... cache them with last-access timeout set to 1 week for example... Maybe you want to store first chunk for fast access at maximum
Would appreciate if you could elaborate here! While it's true that we could store the entire video file and never deal with any chunks, assuming we originally upload chunks to S3 when first uploading the file we'll always need at least some chunk metadata in our database to load them
@@jordanhasnolife5163 my guess is like this - we are receiving file of original resolution -> chunking it by 2 sec -> long term storage. Transcode first chunk into 144,240,360,480 etc resolutions (don't store) -> CDN expiration = 1Y last access (just to have fast start experience). Whenever somebody starts to watch video we transcode necessary resolution on the fly from original chunks in parallel and store it in CDN expiration = 1 week. I'm sure sum of transcode speed will be faster than viewing speed so we will make viewing seamless Regarding metadata - as you said during upload we can store all necessary chunking stuff in some nosql db
@@vorandrew Ah I see what you're saying here, I think it's one of those things that we'd have to actually try out and see if the latencies would be low enough. We do care a lot more about lowering read latencies here, so I wonder if this would work in practice but it's an interesting thought!
@@vorandrew Haha yeah - my personal philosophy here is to use as much disk space as needed, we could always optimize for cost saving measures in the future! At least for the interview I don't know how often it would come up, but it's possible!
At ru-vid.com/video/%D0%B2%D0%B8%D0%B4%D0%B5%D0%BE-43bB7oSn190.html, another queue is needed to stream total chunk count to Flink. This look a bit redundant to me. Why don't we just include total chunk count as an extra field of events that are sent to RabbitMQ?
Totally doable as well, I considered this approach too. I mainly assumed there'd be a lot of other metadata around and didn't wanna bloat the messages.