I would treat a 2-person chat as a group; that way you can easily add users to the same chat without converting it into a group. Messenger has a character limit of 20,000, which is 156.25 Kb (at 8 bits per character), and the average message will be far shorter than that, maybe around the 400-character mark. That is 3.125 Kb, so a total average of 585.93 GB per day.
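The arithmetic above is easy to script, which also makes the units explicit. This is a minimal sketch; 1 byte per character and 1.5 billion messages per day (the 500M * 3 figure computed further down the thread) are assumptions, not numbers from this comment. The 156.25 and 3.125 figures only come out when bits are divided by 1024. If 1.5 billion messages per day is the intended volume, the daily total is about 600 GB in decimal units or about 559 GiB in binary units, so the 585.93 GB figure appears to combine the kibibit size with decimal gigabytes.

```python
# Back-of-the-envelope storage estimate for text-only messages.
# Assumptions (not from the video): 1 byte per character (plain ASCII text)
# and 1.5 billion messages per day; metadata and replication are ignored.
BITS_PER_BYTE = 8

def size_in_kibibits(chars: int, bytes_per_char: int = 1) -> float:
    """Message size in kibibits (1 Kib = 1024 bits), the unit behind 156.25 and 3.125."""
    return chars * bytes_per_char * BITS_PER_BYTE / 1024

def daily_storage_bytes(messages_per_day: int, avg_chars: int, bytes_per_char: int = 1) -> int:
    """Raw text bytes written per day."""
    return messages_per_day * avg_chars * bytes_per_char

if __name__ == "__main__":
    print(size_in_kibibits(20_000))  # 156.25 (character limit)
    print(size_in_kibibits(400))     # 3.125 (assumed average message)
    total = daily_storage_bytes(1_500_000_000, 400)
    print(f"{total / 10**9:.1f} GB/day")   # 600.0 (decimal units)
    print(f"{total / 2**30:.1f} GiB/day")  # 558.8 (binary units)
```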
This is a fantastic overview. There are many more questions to answer now, but this gives a great summary of what to expect and how to think about the service and all its parts.
I always say we have to draw out our projects before anything else; planning is the most important aspect of development, along with product specifications. Great video, my friend! It's the first time I've seen one of yours, and I will watch all your other videos. I love system design and want to become a better software developer.
It is right to use a socket for sending messages rather than a stateless HTTP connection, but large payloads may also break the socket connection. Even if we stream or batch heavy payloads, it may be preferable to send messages through a REST API and use the socket only to notify users in real time.
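One way to read this suggestion is as a size threshold: small text goes straight over the socket, while heavy payloads are uploaded through the REST API and the socket carries only a lightweight notification. This is a minimal sketch; the 64 KB threshold is a hypothetical value, and `upload_via_rest` and `socket_send` are in-memory stand-ins, not any real client library.

```python
import json
import uuid

# Hypothetical cutoff; real systems would tune this to their frame/message limits.
MAX_SOCKET_PAYLOAD = 64 * 1024

blob_store: dict[str, bytes] = {}  # stand-in for object storage behind a REST API
socket_outbox: list[str] = []      # stand-in for frames written to a WebSocket

def upload_via_rest(payload: bytes) -> str:
    """Hypothetical REST upload: stores the payload and returns its id."""
    blob_id = str(uuid.uuid4())
    blob_store[blob_id] = payload
    return blob_id

def socket_send(frame: dict) -> None:
    """Hypothetical socket write: serializes a small JSON frame."""
    socket_outbox.append(json.dumps(frame))

def send_message(chat_id: str, payload: bytes) -> None:
    if len(payload) <= MAX_SOCKET_PAYLOAD:
        # Small text message: push the content itself over the socket.
        socket_send({"chat": chat_id, "type": "inline", "body": payload.decode("utf-8")})
    else:
        # Heavy payload: upload over REST, then notify in real time with a reference only.
        blob_id = upload_via_rest(payload)
        socket_send({"chat": chat_id, "type": "ref", "blob_id": blob_id, "size": len(payload)})
```

The recipient's client would then fetch the referenced blob over HTTP when it receives a `ref` frame, so the long-lived socket never carries the heavy bytes.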
Just a few thoughts; I wanted to understand a few points: 1. We could have a cache to hold the connection information for all WebSocket and user connections. 2. Why can't the WebSocket manager be used to hold the WebSocket-to-user connection data for 1:1 communication? 3. For unsent messages, could we have a queue that the service streams from asynchronously?
Fantastic video. I finally understand this topic. One question: why use the Session Service for the messaging service but the WebSocket manager for group messaging? Session info only tells us which service or system the user is connected to, and the WebSocket manager can only tell which WebSocket a user is connected to, but don't we need both for group messaging and single-user messaging? Either way, shouldn't we need the same info for both group and single-user messaging? It would be great if you could help out.
Thank you! If I understand your question correctly: for single-user messaging, the Session Service is typically sufficient because you're dealing with a direct exchange between two users. The Session Service keeps track of who is logged in and facilitates communication between them. For group messaging, while WebSocket managers are indeed used for the real-time communication aspect, you may still need the Session Service to handle authentication, authorization, and potentially other user-specific details. However, once the users are authenticated and authorized, the WebSocket manager can take over and efficiently manage the real-time exchange of messages among multiple users. In short, the Session Service handles user authentication and individual user sessions, while the WebSocket manager handles real-time message exchange, especially for group messaging, where broadcasting messages to multiple users is essential.
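To make the broadcasting point concrete: a group send can look up each member's current server and batch recipients per server, so each server receives one delivery request instead of one per member. This is a minimal sketch; `session_table` is a hypothetical in-memory stand-in for the user-to-server mapping the Session Service keeps, and members with no entry are treated as offline.

```python
from collections import defaultdict

# Hypothetical Session Service state: user id -> server currently holding the user's socket.
session_table = {"alice": "ws-1", "bob": "ws-2", "carol": "ws-1"}

def plan_group_fanout(sender: str, members: list[str]) -> tuple[dict[str, list[str]], list[str]]:
    """Group online recipients by server; collect offline recipients separately."""
    by_server: dict[str, list[str]] = defaultdict(list)
    offline: list[str] = []
    for member in members:
        if member == sender:
            continue  # the sender already has the message locally
        server = session_table.get(member)
        if server is None:
            offline.append(member)  # no live connection: store for later delivery
        else:
            by_server[server].append(member)
    return dict(by_server), offline
```

For example, `plan_group_fanout("alice", ["alice", "bob", "carol", "dave"])` yields one batch for `ws-2` (bob), one for `ws-1` (carol), and marks dave as offline.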
@ByteMonk Yes, well explained. Session Service: an auth service that manages authentication in a load-balanced environment. WebSocket manager: manages WebSocket connections and data transfer. But couldn't that be handled by the messaging service itself?
Please also make videos on the components used in system design, like WebSockets, NoSQL databases, SQL databases, S3, Cassandra, Hadoop, Apache Spark, replicas, clusters, CDNs, Redis cache, etc. We only need the basics, because these come up in every system design.
How can one justify storing that huge volume of data in a relational DB? This can lead to many questions. I would suggest storing user metadata in an RDBMS but messages in a document store like MongoDB, which scales better horizontally.
Thanks for making this video. The glaring issue I see with this design is in the back-of-the-envelope estimation. Any good interviewer is going to stop you as soon as you say 75 PB/day. Unless you're building the next Google Cloud, this amount of storage is completely unreasonable, especially for a chat app. Estimations are not there just to impress the interviewer; they exist to influence the design. The right move would be to say that this amount of storage is likely unreasonable for a system like this, and thus we will not store user messages in the cloud.
Thanks for the comment; what you mentioned is definitely an approach to consider. Most of the MAANG interviewers we know don't even care about those numbers, but it's always considered a good thing when the candidate is thinking along those lines. There are, however, occasions when interviewers dive deep into technical requirements, but that should come from the interviewer, not the candidate. So if a candidate says 75 PB/day, the interviewer can stop them and clarify that they are not looking for WhatsApp/Messenger scale; on the other hand, if the candidate assumes massive storage is not needed, the interviewer can still stop them and clarify that they are looking for high scale. Either way, it's not necessarily a bad thing to have that conversation. It's better, however, to avoid spending too much time here.
What is the difference between the Session Service and the WebSocket manager? Why does the group service use the WebSocket manager to figure out where to direct messages so they reach the right users, while the messaging service gets this info from the Session Service?
Thank you so much for your content. Just a question: if a mobile device is not used for a long time, or your messaging application has been killed by Android, the WebSocket connection will fail. How can we prevent this in a messaging application?
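A killed app cannot hold a socket open, so the usual fallback is push notifications (the FCM discussion further down this thread). For connections that simply drop while the app is alive, one common pattern is to reconnect with capped exponential backoff plus random jitter, so thousands of clients that disconnected at the same moment don't reconnect all at once. This is a minimal sketch; `connect` is any hypothetical callable that opens the socket and raises `OSError` on failure, and the base and cap values are assumptions.

```python
import random
import time
from typing import Callable, Optional, TypeVar

T = TypeVar("T")

def backoff_delays(attempts: int, base: float = 0.5, cap: float = 30.0,
                   rng: Optional[random.Random] = None) -> list[float]:
    """Capped exponential backoff with full jitter: attempt n waits U(0, min(cap, base * 2**n))."""
    rng = rng or random.Random()
    return [rng.uniform(0, min(cap, base * 2 ** n)) for n in range(attempts)]

def reconnect(connect: Callable[[], T], attempts: int = 8,
              sleep: Callable[[float], None] = time.sleep) -> T:
    """Call connect() until it succeeds, sleeping a jittered delay after each failure."""
    for delay in backoff_delays(attempts):
        try:
            return connect()
        except OSError:
            sleep(delay)  # spread reconnects out so clients don't stampede the servers
    raise ConnectionError(f"gave up after {attempts} attempts")
```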
Hey! How are you turning Excalidraw files into such a beautiful presentation? Have you found your own approach? Could you share any resources for creating videos visually similar to yours?
Thank you! It is time-consuming work to maintain the quality of what I want to say and sync it up with the visuals. I combine Photoshop, FCP, and Adobe products.
To implement this in Django, should I treat each service as a Django app, or should I create separate projects for services like the messaging service or the group service?
Thank you! You have a load balancer between the user's machine and the WebSocket handler machine. How does the load balancer handle bidirectional connections like WebSockets?
Great question. As you probably know by now, the difference from a regular HTTP connection is that a WebSocket connection is meant to stay open. With WebSockets, the client sends an HTTP request with an "Upgrade: websocket" header, and the server replies with 101 Switching Protocols, switching the connection to the WebSocket protocol. So an HTTP connection becomes WS, and HTTPS becomes WSS. Modern load balancers can handle this upgrade automatically, and once it happens, messages travel back and forth through a WebSocket tunnel. Here is an example using AWS ALB: aws.amazon.com/blogs/compute/using-websockets-and-load-balancers-part-two/
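The handshake itself is small enough to show. Per RFC 6455, the client sends a random `Sec-WebSocket-Key` with its Upgrade request, and the server proves it understood the request by returning `Sec-WebSocket-Accept`: the Base64 encoding of the SHA-1 hash of the key concatenated with a fixed GUID. The sample key below is the one used in the RFC itself; `switching_protocols_response` is an illustrative helper, not any particular server's API.

```python
import base64
import hashlib

# Fixed GUID defined by RFC 6455 for the opening handshake.
WS_GUID = "258EAFA5-E914-47DA-95CA-C5AB0DC85B11"

def accept_key(sec_websocket_key: str) -> str:
    """Compute the Sec-WebSocket-Accept value for a client's Sec-WebSocket-Key."""
    digest = hashlib.sha1((sec_websocket_key + WS_GUID).encode("ascii")).digest()
    return base64.b64encode(digest).decode("ascii")

def switching_protocols_response(sec_websocket_key: str) -> str:
    """Build the server's 101 reply to a client's Upgrade request."""
    return (
        "HTTP/1.1 101 Switching Protocols\r\n"
        "Upgrade: websocket\r\n"
        "Connection: Upgrade\r\n"
        f"Sec-WebSocket-Accept: {accept_key(sec_websocket_key)}\r\n"
        "\r\n"
    )

print(accept_key("dGhlIHNhbXBsZSBub25jZQ=="))  # s3pPLMBiTxaQ9kYGzzhZRbK+xOo=
```

After this exchange, the same TCP connection carries WebSocket frames in both directions, which is why the load balancer must keep it pinned to one backend.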
Thank you! To your question: it depends on how you are using "messages". Queues are typically used for loose coupling or fan-out, but you can use them here too, especially if you can show that it is operationally cost-effective. Thanks for the suggestion.
Wow, the numbers are definitely off for capacity planning! 1) 500M * 3 = 1.5B, while 500M * 30 is 15B. 2) 1.5B * 50 KB is 75 TB. 3) The average photo is 1-2 MB and videos are definitely bigger, so 50 KB is an underestimate for a message.
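The corrected arithmetic can be checked in a few lines. This sketch uses decimal units (1 KB = 1,000 bytes, 1 TB = 10^12 bytes); 500M users, 3 or 30 messages per user per day, and a 50 KB average message are the figures quoted in this thread, and the result is raw storage before replication or compression.

```python
KB = 10**3
TB = 10**12

def daily_storage_tb(users: int, msgs_per_user: int, avg_msg_bytes: int) -> float:
    """Raw daily storage in decimal terabytes, before replication or compression."""
    return users * msgs_per_user * avg_msg_bytes / TB

print(daily_storage_tb(500_000_000, 3, 50 * KB))   # 75.0  (1.5B messages/day)
print(daily_storage_tb(500_000_000, 30, 50 * KB))  # 750.0 (15B messages/day)
```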
Thank you for noticing and commenting here; it's definitely way off! That will, however, impact the autoscaling. I wouldn't change the system architecture much, except to treat a 2-person chat as a group, as pointed out by another user whose comment I have pinned. This video was posted a while back; we will make sure to cross-check the calculations before posting. Thanks again!🙏
What technology would you recommend for inter-service communication? For example, the message service communicating with the session service: RPC, a private API, or something else?
A message bus. If you use other technologies like HTTP, gRPC, etc., you have to pass requests along in a chained fashion, which can cause huge problems: if any service in the chain is slow or down, the whole request fails. A message bus has natural load balancing and resilience to services being down.
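To illustrate the decoupling argument, here is a toy in-process bus: publishers write to a topic without knowing who consumes it, and events wait in the topic's queue while the consumer is unavailable. This is a minimal sketch built on Python's standard `queue` module; a production system would use a real broker such as Kafka or RabbitMQ, and the topic name is hypothetical.

```python
import queue
from collections import defaultdict
from typing import Callable

class MessageBus:
    """Toy topic-based bus: one FIFO queue per topic."""

    def __init__(self) -> None:
        self._topics: dict[str, queue.Queue] = defaultdict(queue.Queue)

    def publish(self, topic: str, event: dict) -> None:
        # The publisher returns immediately; it never calls the consumer directly.
        self._topics[topic].put(event)

    def drain(self, topic: str, handler: Callable[[dict], None]) -> int:
        """Deliver all pending events on a topic to handler; return how many were delivered."""
        q = self._topics[topic]
        delivered = 0
        while True:
            try:
                event = q.get_nowait()
            except queue.Empty:
                return delivered
            handler(event)
            delivered += 1

bus = MessageBus()
# The messaging service publishes while the (hypothetical) session consumer is down...
bus.publish("session.lookup", {"user": "bob"})
bus.publish("session.lookup", {"user": "carol"})
# ...and the consumer catches up, in order, when it comes back.
seen: list[dict] = []
print(bus.drain("session.lookup", seen.append))  # 2
```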
I am confused about the difference between the Session Service and the WebSocket manager. I thought the Session Service was used to keep track of user-to-server (socket handler) mappings. So why do we need the socket manager?
The Session Service keeps track of which servers users are connected to. A server can have multiple ports for WebSockets and (lightweight) WebSocket handlers, which need to be managed by a WebSocket Manager. For further clarity, please check out my video on WebSockets in the system design playlist linked in the description above. Thanks.
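Read this way, the two components answer different questions at different levels: the Session Service answers "which server is this user on?", and each server's WebSocket Manager answers "which handler on this server holds the user's socket?". This is a minimal in-memory sketch of that two-step lookup; the class names mirror the video's components, but their fields and methods are hypothetical.

```python
from dataclasses import dataclass, field
from typing import Dict, Optional, Tuple

@dataclass
class WebSocketManager:
    """Per-server view: which local handler (port) holds each user's socket."""
    server_id: str
    handlers: Dict[str, int] = field(default_factory=dict)

    def register(self, user: str, port: int) -> None:
        self.handlers[user] = port

    def handler_for(self, user: str) -> Optional[int]:
        return self.handlers.get(user)

@dataclass
class SessionService:
    """Cluster-wide view: which server each user is connected to."""
    user_to_server: Dict[str, str] = field(default_factory=dict)

    def connect(self, user: str, manager: WebSocketManager, port: int) -> None:
        self.user_to_server[user] = manager.server_id
        manager.register(user, port)

def route(user: str, sessions: SessionService,
          managers: Dict[str, WebSocketManager]) -> Optional[Tuple[str, int]]:
    """Two-step lookup: Session Service -> server, then that server's manager -> handler."""
    server = sessions.user_to_server.get(user)
    if server is None:
        return None  # user is offline
    port = managers[server].handler_for(user)
    return None if port is None else (server, port)
```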
Yes, these are not hard numbers, but given that a message can occasionally be a JPG or video file, 50 KB is assumed as the average. Your interviewer may stop and ask you, and you can adjust your capacity estimate accordingly. There are no negative points as long as you are not way out of range; in fact, this can be a healthy discussion with your interviewer if they insist.
I have one query. If a user is offline, the message goes to the relay service and is stored in a queue, to be sent when the user comes online. But let's say the relay service is busy, the user is online, and they receive a new message from a different user. Meanwhile, the relay service processes the message from the queue and sends it to the user. If this happens, message order is not preserved. How do we prevent this?
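One common fix, which the video does not cover, is to assign each message a per-recipient sequence number at the moment the server accepts it, before deciding whether it goes out live or through the relay queue. The client then holds out-of-order arrivals until the gap is filled, whichever path each message took. This is a minimal client-side sketch; `InboxBuffer` and its methods are hypothetical names.

```python
class InboxBuffer:
    """Releases messages to the UI strictly in server-assigned sequence order."""

    def __init__(self, next_seq: int = 1) -> None:
        self.next_seq = next_seq           # next sequence number allowed to be shown
        self.pending: dict[int, str] = {}  # out-of-order arrivals, keyed by seq

    def receive(self, seq: int, text: str) -> list[str]:
        """Accept a message from any path; return the messages now ready to display."""
        if seq < self.next_seq or seq in self.pending:
            return []  # duplicate (e.g. relay retried after live delivery): ignore
        self.pending[seq] = text
        ready = []
        while self.next_seq in self.pending:
            ready.append(self.pending.pop(self.next_seq))
            self.next_seq += 1
        return ready

buf = InboxBuffer()
print(buf.receive(2, "live message"))    # [] (held until seq 1 arrives)
print(buf.receive(1, "queued message"))  # ['queued message', 'live message']
```

A real client would also need a timeout or a re-fetch from the server if a gap never fills, so one lost message cannot block the inbox forever.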
It looks like we only talked about the data schema for the remote side, not the local one? How should we store and retrieve messages locally, and how do we ensure consistency between local and remote data?
Great question! To store and retrieve messages locally, you can use a local database on the client side. The most common options include SQLite, Realm, and Core Data. To ensure consistency between local and remote data, one approach is to implement a synchronization mechanism that keeps the data in sync between the client and server. Here are a few ways you could approach this. Real-time synchronization: use WebSockets or push notifications to notify the client of new messages. This way, as soon as a new message arrives on the server, it can be pushed to the client immediately, which keeps the local data up to date. Periodic synchronization: periodically poll the server for new messages and update the local database accordingly. This can be less efficient than real-time synchronization, but it can be useful if you don't need real-time updates. Conflict resolution: in case of conflicts between local and remote data, you need to define a conflict resolution strategy. For example, if a message was deleted on the server but not on the client, you need to decide which version to keep. One approach is to always favor the server-side data; another is to prompt the user to choose which version to keep. This could be a follow-up question in an interview setting; I wouldn't expect the interviewer to ask for a detailed design, so this would be my response.
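For the local side, here is a minimal sketch using SQLite through Python's built-in `sqlite3` module that combines two of the ideas above: the client remembers the highest server sequence number it has stored, so it can ask the server only for newer messages, and rows from the server always overwrite local ones (the "favor the server-side data" strategy). The table layout, the `seq` column, and the helper names are hypothetical.

```python
import sqlite3
from typing import Iterable, Optional, Tuple

Row = Tuple[str, str, int, Optional[str]]  # (id, chat_id, seq, body)

def open_store(path: str = ":memory:") -> sqlite3.Connection:
    conn = sqlite3.connect(path)
    conn.execute(
        """CREATE TABLE IF NOT EXISTS messages (
               id      TEXT PRIMARY KEY,  -- server-assigned message id
               chat_id TEXT NOT NULL,
               seq     INTEGER NOT NULL,  -- server order within the chat
               body    TEXT               -- NULL = deleted on the server
           )"""
    )
    return conn

def last_seen_seq(conn: sqlite3.Connection, chat_id: str) -> int:
    """Highest server sequence stored locally; the client asks the server for anything newer."""
    (seq,) = conn.execute(
        "SELECT COALESCE(MAX(seq), 0) FROM messages WHERE chat_id = ?", (chat_id,)
    ).fetchone()
    return seq

def apply_server_batch(conn: sqlite3.Connection, rows: Iterable[Row]) -> None:
    """Write rows fetched from the server; on conflict the server's version wins."""
    with conn:  # commit the whole batch atomically
        conn.executemany(
            "INSERT OR REPLACE INTO messages (id, chat_id, seq, body) VALUES (?, ?, ?, ?)",
            rows,
        )
```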
@ByteMonk Thanks for your reply. One more question: do we only store messages on the client? I didn't see a table for storing messages on the remote side in the video.
There are 24 hours in a day, 60 minutes in an hour, and 60 seconds in a minute, so 3,600 is the number of seconds in an hour. We want to know how many messages need to be processed per second, so I am dividing by 24 * 3600 = 86,400 (the total number of seconds in a day).
Considering it's an interview environment, we could have approximated the total number of seconds in a day as 100,000. That way, we avoid unduly taxing ourselves with precise calculations.
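Both divisors side by side, using the 1.5B messages/day figure from earlier in the thread as an assumed input. The rounded 100,000 understates the exact rate by about 14%, which is usually acceptable at this level of estimation.

```python
SECONDS_PER_DAY = 24 * 60 * 60    # 86,400
APPROX_SECONDS_PER_DAY = 100_000  # interview-friendly rounding

def messages_per_second(messages_per_day: int, seconds: int = SECONDS_PER_DAY) -> float:
    """Average (not peak) message rate."""
    return messages_per_day / seconds

daily = 1_500_000_000  # assumed volume: 500M users * 3 messages
print(round(messages_per_second(daily)))                          # 17361
print(round(messages_per_second(daily, APPROX_SECONDS_PER_DAY)))  # 15000
```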
The WebSocket Manager provides a way for the server to push data to the client in real time and enables bidirectional communication between the two endpoints; it is typically used in web applications that require real-time updates or messaging. ZooKeeper, on the other hand, is a distributed coordination service that provides a centralized repository for configuration information and synchronization data across a distributed system. I prefer the WebSocket Manager terminology.
I left it out intentionally. Implementing end-to-end encryption is a complex task, and it's crucial to thoroughly test and review your implementation. Implementing end-to-end encryption in a chat application similar to WhatsApp involves using a cryptographic protocol such as the Signal Protocol, which is known for its strong security and privacy features. To implement the Signal Protocol or a similar algorithm, you'll need to consider: 1. Implementing the cryptographic primitives: you'll need to include libraries or implement cryptographic functions for key generation, key exchange, symmetric encryption, and decryption. 2. Managing user keys: develop a system to generate and securely store the encryption keys for each user. These keys should be protected using strong encryption and access controls. 3. Implementing key exchange. 4. Session management. 5. Encrypting and decrypting messages. Regarding API endpoints, while the Signal Protocol itself doesn't necessarily require additional API endpoints, you may need to develop APIs for user registration, key exchange, and message transmission. These APIs would handle tasks such as user authentication, key storage/retrieval, and message delivery. The specific endpoints required will depend on your application's architecture and requirements.
@ByteMonk That’s a brilliant explanation. I implemented the Signal algorithm without using any libraries for my high school computer science project, but didn’t have time to fully implement it (QR codes to compare identity keys, group messaging). I'm not proud of it, though, because I didn’t even know about transactions back then, so it can be broken easily; it’s not distributed, it was implemented using pure TCP sockets with an API built on top of that, and it was designed to be monolithic.
@m_t_t_ While your high school project may not have been perfect, it's commendable that you tackled such a challenging task. Take this experience as an opportunity to learn and improve your skills in building secure and scalable systems. Recognizing the limitations of your initial implementation is an important step towards improvement. Here are a few suggestions on how you could enhance your implementation: 1. Use secure communication channels: instead of relying solely on raw TCP sockets, consider implementing your solution using secure communication protocols like TLS or HTTPS. 2. Implement secure key exchange: in addition to QR codes, consider incorporating a secure key exchange mechanism such as the Diffie-Hellman key exchange protocol. 3. Group messaging: extend your implementation to support secure group messaging. This involves managing group memberships, handling encryption and decryption for multiple participants, and ensuring secure message distribution within the group. 4. Consider a distributed architecture: to enhance scalability and reliability, consider moving away from a monolithic architecture and exploring a distributed system design. This could involve using technologies like distributed databases, load balancing, and fault-tolerant infrastructure. Check out my full system design playlist in the description and share it with your friends if you found this helpful :)
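To show what the Diffie-Hellman step in point 2 actually computes, here is a deliberately toy version in pure Python: each side combines its own secret exponent with the other side's public value, both arrive at the same shared secret, and that secret is hashed into a symmetric key. The prime (the Mersenne prime 2^127 - 1) and generator 3 are illustrative assumptions chosen for readability, and this is NOT secure; real applications, including the Signal Protocol, use vetted elliptic-curve implementations such as X25519 from an audited library.

```python
import hashlib
import secrets

# TOY PARAMETERS: illustrative only, NOT secure for real use.
P = 2**127 - 1  # a Mersenne prime
G = 3

def keypair() -> tuple[int, int]:
    """Return (private exponent, public value G^private mod P)."""
    private = secrets.randbelow(P - 3) + 2  # secret exponent in [2, P-2]
    public = pow(G, private, P)
    return private, public

def shared_key(my_private: int, their_public: int) -> bytes:
    """Combine my secret with the peer's public value, then hash into a 32-byte key."""
    secret = pow(their_public, my_private, P)
    return hashlib.sha256(secret.to_bytes(16, "big")).digest()

alice_priv, alice_pub = keypair()
bob_priv, bob_pub = keypair()
# Only the public values cross the network; both sides derive the same key.
assert shared_key(alice_priv, bob_pub) == shared_key(bob_priv, alice_pub)
```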
Can someone help me understand the start phase: having WebSocket handlers for each user, and the use of the Session Service? Are all users on separate servers? What is the architecture for the WebSocket connection? Please point me to any other available resources if possible. Thanks!
I don't think he meant every user is on a separate server. There are N users and M servers. Each server can handle one or more WebSocket connections, one per user.
The biggest issue we have is when the app is in a killed state: messages won't be delivered, so we end up saying, let's use FCM for that. May I ask whether there is any way to create our own system like FCM? Let's say we don't trust Google with data handling and want to build our own service. I look forward to your reply.
It is possible to create your own messaging system similar to FCM (Firebase Cloud Messaging) to handle message delivery when the app is in a "kill state." However, building your own messaging system can be complex and resource-intensive, so it's important to carefully consider the trade-offs involved. You would need to set up a reliable and scalable infrastructure to handle the messaging service. This would involve managing servers, databases, network infrastructure, and other components necessary for message storage and delivery. You would also need to implement a message queuing system to handle message delivery: when a message is sent while the app is in a "kill state," it can be stored in a message queue for later retrieval and delivery to the intended recipient.
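The store-and-forward part can be sketched independently of how the app gets woken up: messages for a disconnected user go into a per-user queue, and the backlog is flushed oldest-first when that user's connection comes back. This is a minimal in-memory sketch; a real deployment would back the queue with durable storage, and `deliver` is a hypothetical callback standing in for the socket write.

```python
from collections import defaultdict, deque
from typing import Callable

class StoreAndForward:
    """Holds messages for offline users and flushes them on reconnect."""

    def __init__(self, deliver: Callable[[str, str], None]) -> None:
        self.deliver = deliver  # writes one message to a live connection
        self.online: set[str] = set()
        self.pending: dict[str, deque] = defaultdict(deque)

    def send(self, user: str, message: str) -> None:
        if user in self.online:
            self.deliver(user, message)
        else:
            self.pending[user].append(message)  # hold until the user reconnects

    def on_connect(self, user: str) -> None:
        self.online.add(user)
        backlog = self.pending.pop(user, deque())
        while backlog:
            self.deliver(user, backlog.popleft())  # flush oldest first

    def on_disconnect(self, user: str) -> None:
        self.online.discard(user)
```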
@ByteMonk Thanks for the reply, I am very grateful. We did try that, but the biggest problem we faced was that we were unable to wake the app on Android and iOS.
@Glashutte1900 Waking the app on Android and iOS when it is in a "kill state" can be challenging due to the operating systems' restrictions and limitations. Both platforms have specific guidelines and restrictions on how apps can be woken up or run in the background. On Android, you can use background services to perform certain tasks even when the app is not actively running. However, starting with Android 8.0 (API level 26), there are limits on background execution to improve battery life and performance. Background services may be subject to restrictions and may be terminated by the system if deemed necessary. iOS provides specific background modes that allow apps to perform certain tasks in the background. However, Apple restricts the use of background modes to preserve battery life and protect user privacy. You need to carefully review and comply with Apple's guidelines for background execution.
@ByteMonk You are absolutely right 👍 The only solution seems to be building something similar to FCM, but again, FCM is a Google product, and we don't know what facilities they already have that we cannot access.