I do not recommend Exponent. I signed up and paid for Exponent's system design course, and it is not worth it. It's very generic and the material is very shallow. This video is proof :)
First, make sure to add the feature that reminds users to subscribe to the premium version, endlessly and with no opt-out. Just kidding. Great details, very useful. Thanks.
This design is based on the assumption that each video is a single blob, but the actual YouTube preloads videos partially, and each of their resolutions can vary independently.
A few things discussed are not clear: - Blob is shared. ... that's a little surprising. - Adaptive vs. non-adaptive streaming: bandwidth estimation and the request for the corresponding chunk are done by the client, not the server. - Does sharding by video make better use of resources than sharding by user ID?
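On the client-side point above, here is a minimal sketch of how a player might estimate bandwidth and pick the next chunk's rendition. The bitrate ladder, the smoothing factor `alpha`, and the `safety` margin are all made-up illustrative values, not what any real player uses.

```python
# Hypothetical bitrate ladder: (required kbps, rendition label). Values are illustrative.
LADDER = [(400, "240p"), (1000, "480p"), (2500, "720p"), (5000, "1080p")]


def estimate_throughput_kbps(samples, alpha=0.3):
    """Smooth recent chunk downloads, given as (bytes, seconds), with an EWMA."""
    estimate = None
    for num_bytes, seconds in samples:
        kbps = (num_bytes * 8 / 1000) / seconds
        estimate = kbps if estimate is None else alpha * kbps + (1 - alpha) * estimate
    return estimate


def pick_rendition(throughput_kbps, safety=0.8):
    """Pick the highest rendition that fits in the budget; fall back to the lowest."""
    budget = throughput_kbps * safety
    chosen = LADDER[0][1]
    for bitrate, label in LADDER:
        if bitrate <= budget:
            chosen = label
    return chosen
```

The client would call `pick_rendition(estimate_throughput_kbps(recent_samples))` before requesting each chunk, so the server only has to serve whichever rendition is asked for.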
For streaming uploaded videos that we know will become popular, we cache them in the CDN by default. Since we know these videos will be popular, we serve them from CDNs in various regions, chosen by geography and type of event. We would keep a list of events that are popular in a given region (country), like the Super Bowl, Taylor Swift, or soccer matches, and cache those directly once the video is uploaded. Live event streaming is out of scope for this design. Do let me know if this is a good approach and what would need to change if we went for live-streamed events.
I like your approach of region and event-based caching. It should give improved performance! 💪 For live streaming, we could potentially be looking at implementing real-time transcoding, using low-latency protocols, adding origin shields, developing dynamic scaling capabilities and some security measures.
Why? It is a valid criterion for comparison. They are different, simplex vs. duplex, so it's a valid comparison for choosing one over the other, right? What am I missing?
He talks about sharding the video metadata DB and then says "Corresponding the same thing could happen to the blob storage as well". So let me get this straight, we use S3 for blob storage and then we shard it? What a load of BS!
@@kumarmanish9046 yes, it depends on what the server can support, but technically you can upload at least 1 GB, or you can break the video into small files and then upload them. You can leverage a blob store like S3 here.
@@rajsekhar28 Here WebSockets are not just used to upload the video; he must be trying to say that he would keep the connection alive while uploading videos, I guess.
What happens on a real product like this if, a few years in, you cannot afford to scale up blob storage? Would you tell users that from now on they can only upload a limited number of videos, or impose other limitations, or is this more of a business decision?
Hey DamjanDimitrioski, thanks for the question! While there could be technical solutions to slow down the need to scale, this is ultimately a business decision. If you can't afford to scale, it may likely mean that your business model isn't profitable enough, or you aren't monetising it well. The question then becomes "is it worth it to keep this product up and running?" (considering accounting profits, opportunity cost etc.) Hope this helps!
Got lost towards the end of the discussion, when the question was how to funnel the "Superbowl" live broadcast through this solution. Didn't understand the caching part of it. How reliable will that be? 🥵
The SQL you mentioned is a relational database, right? It's hard to scale a relational database, but not impossible; besides, solutions like AWS Aurora can take care of the scaling for you. Here is an example where I think a relational DB is better than NoSQL. When you want to know who replied to your comments (a join operation), implementing it on a relational DB is simple and elegant: all you need are a user table and a comment table, and a join takes care of the request. But if you choose NoSQL, like MongoDB or DynamoDB, you typically have to store user info (e.g. name) in the comment records, because these stores don't support joins the way SQL does. And you will run into problems when users update their profile: the name saved with each comment also needs to be updated to prevent inconsistency. Clearly that's not a good approach. Of course you can save only the user_id and do another query to mimic the join, but why not use a solution with joins in the first place?
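To make the join argument concrete, here is a small sketch using SQLite's in-memory mode. The table and column names (`users`, `comments`, `author_id`, `parent_id`) are invented for illustration and are not from the video.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE users (id INTEGER PRIMARY KEY, name TEXT);
CREATE TABLE comments (
    id INTEGER PRIMARY KEY,
    author_id INTEGER REFERENCES users(id),
    parent_id INTEGER REFERENCES comments(id),
    body TEXT
);
INSERT INTO users VALUES (1, 'alice'), (2, 'bob');
INSERT INTO comments VALUES (10, 1, NULL, 'original'), (11, 2, 10, 'reply');
""")


def repliers_to(conn, user_id):
    """Names of users who replied to any comment written by user_id."""
    rows = conn.execute(
        """
        SELECT DISTINCT u.name
        FROM comments AS reply
        JOIN comments AS parent ON reply.parent_id = parent.id
        JOIN users AS u ON u.id = reply.author_id
        WHERE parent.author_id = ?
        ORDER BY u.name
        """,
        (user_id,),
    ).fetchall()
    return [name for (name,) in rows]
```

Because the name lives only in `users`, a profile rename is a single-row `UPDATE` and every later call to `repliers_to` sees it; a denormalized comment record would need every copy of the name rewritten.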
@@yuanhengzhao4188 yes, that's one of the pros, but I think it gets difficult once you shard: data that needs to be joined has to live on the same shard, so I'm not sure we can scale that way. In that case NoSQL is easier.
That's the reason the sharding logic was discussed: you need to keep in mind how the recommendation system will use joins between rows, etc. Though the answer is not satisfactory to me, because it can put a lot of load on shards holding users with millions of subs. There are more strategies to explore.
Same thought here. User-based sharding will help in indexing only a particular creator's videos. But geography-based sharding will also help with indexing, and it will be more efficient for building recommender systems.
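A rough way to see the hot-shard concern raised above is to hash the same rows on two different keys and count rows per shard. Everything here (four shards, the synthetic `big_channel` creator owning 900 of 1000 videos) is an assumption for illustration only.

```python
import hashlib
from collections import Counter

NUM_SHARDS = 4  # assumed shard count


def shard_for(key, num_shards=NUM_SHARDS):
    """Map a key to a shard with a stable hash (Python's built-in hash() is salted)."""
    digest = hashlib.sha256(key.encode("utf-8")).digest()
    return int.from_bytes(digest[:8], "big") % num_shards


def shard_load(rows, key_field):
    """Count how many rows land on each shard when sharding on key_field."""
    return Counter(shard_for(row[key_field]) for row in rows)


# Synthetic data: one very large creator plus many small ones.
videos = [
    {"video_id": f"v{i}", "creator_id": "big_channel" if i < 900 else f"u{i}"}
    for i in range(1000)
]
```

`shard_load(videos, "creator_id")` puts all 900 of the big creator's rows on one shard, while `shard_load(videos, "video_id")` spreads them out, at the cost of a creator's videos no longer being co-located for joins.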
One suggestion: please do system design with an experienced architect, not engineering managers who have barely 10 years of experience and haven't mastered the technology as such. People management is easy, but grinding through technology is hard. Please do us this favour and avoid such shallow content.
Hey bro, how do YouTube uploads work if the video is large, like 500 MB or 1 GB? Does the upload happen directly from the frontend to S3, with the link then given to the backend, or is an HTTP call made to the backend, which then waits for the upload?
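One common pattern for large uploads (not confirmed as what YouTube actually does) is to split the file into parts, upload each part independently so a failure only retries that part, and then ask the store to stitch them together. The sketch below uses a made-up `InMemoryBlobStore` as a stand-in; it is not the S3 SDK, and the 5 MiB part size is an assumption.

```python
import hashlib
import io

PART_SIZE = 5 * 1024 * 1024  # assumed part size in bytes


def iter_parts(stream, part_size=PART_SIZE):
    """Yield (part_number, data, checksum) tuples read from a binary stream."""
    part_number = 1
    while True:
        data = stream.read(part_size)
        if not data:
            break
        yield part_number, data, hashlib.sha256(data).hexdigest()
        part_number += 1


class InMemoryBlobStore:
    """Hypothetical stand-in for a blob store; not a real SDK."""

    def __init__(self):
        self.parts = {}

    def put_part(self, upload_id, part_number, data, checksum):
        # Reject a part corrupted in transit so the client can resend just that part.
        if hashlib.sha256(data).hexdigest() != checksum:
            raise ValueError("checksum mismatch")
        self.parts[(upload_id, part_number)] = data

    def complete(self, upload_id):
        numbers = sorted(n for (uid, n) in self.parts if uid == upload_id)
        return b"".join(self.parts[(upload_id, n)] for n in numbers)


def upload(stream, store, upload_id, part_size=PART_SIZE):
    """Upload a stream part by part, then assemble the final object."""
    for number, data, checksum in iter_parts(stream, part_size):
        store.put_part(upload_id, number, data, checksum)
    return store.complete(upload_id)
```

In a real setup the client would typically send the parts straight to the blob store and only tell the backend the resulting object's location, which keeps large payloads off the application servers.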
Video compression is done, and every video is split into three major tasks: video transcoding, audio transcoding, and metadata persistence. For video transcoding, videos are split into multiple chunks and a DAG of tasks is generated; workers then pick up the nodes, running independent ones in parallel and dependent ones in sequence.
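The DAG idea above can be sketched with Python's standard `graphlib` module (Python 3.9+). The task names and their dependencies are hypothetical; in particular, making `persist_metadata` wait for `merge` is an assumption, not something stated in the video.

```python
from graphlib import TopologicalSorter

# Hypothetical task graph: each task maps to the set of tasks it depends on.
dag = {
    "split": set(),
    "transcode_chunk_0": {"split"},
    "transcode_chunk_1": {"split"},
    "transcode_audio": {"split"},
    "merge": {"transcode_chunk_0", "transcode_chunk_1", "transcode_audio"},
    "persist_metadata": {"merge"},
}


def run_in_waves(dag):
    """Group tasks into waves; tasks within a wave have no unmet dependencies."""
    sorter = TopologicalSorter(dag)
    sorter.prepare()
    waves = []
    while sorter.is_active():
        ready = sorted(sorter.get_ready())  # these could be handed to parallel workers
        waves.append(ready)
        sorter.done(*ready)  # mark finished so dependent tasks become ready
    return waves
```

Each inner list from `run_in_waves(dag)` is a set of tasks that workers could process in parallel, and the order of the lists is the sequential part.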