This channel is all about programming! From tutorials to coding up prototypes to system design, I upload videos on subjects I know will provide value. I cover system design concepts like horizontal/vertical scaling, load balancers, reverse proxies, caching, and other distributed computing topics. On the database side, I cover MySQL, Postgres, MongoDB, Cassandra, Redshift, QuestDB, TimescaleDB, CockroachDB, etc. I also cover messaging platforms like RabbitMQ and Kafka.
🥹 If you found this helpful, follow me online here:
I thought I finally found a way to store images using the file system, but no, here we are again with the same problem. Why is it so hard to find a single video about this when millions of websites use the file system?
It would certainly be helpful to also understand the scenarios in which each of these would be used. For example, at-least-once might suit a payment service (with idempotency handled appropriately), at-most-once might suit comments on a post, etc.
If the user is located near the boundary of a big cell 9a, the method proposed in the video won't return locations that are near the user but belong to an adjacent big cell (9b, for example). How would you work around this issue?
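A common workaround for the boundary problem described above is to query the user's cell plus all of its neighbors. The sketch below uses a simple integer grid as a stand-in for geohash cells (real geohash neighbor computation is more involved); it only illustrates the "query 9 cells instead of 1" idea.

```python
# Boundary workaround sketch: instead of querying only the user's
# cell, query the cell AND its 8 neighbors so nearby points in
# adjacent cells are not missed. Integer grid used as a simplified
# stand-in for geohash cells (illustration only).

def cell_of(lat, lon, size=1.0):
    """Map a coordinate to an integer grid cell of the given size."""
    return (int(lat // size), int(lon // size))

def cells_to_query(lat, lon, size=1.0):
    """Return the user's cell plus its 8 surrounding neighbors."""
    cx, cy = cell_of(lat, lon, size)
    return [(cx + dx, cy + dy) for dx in (-1, 0, 1) for dy in (-1, 0, 1)]
```

The results from all nine cells are then merged and filtered by exact distance, so points just across a cell boundary still show up.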
Since I'm working in adtech and looking to upgrade our approach to something more modern, I was fortunate to come across your video, and it helped me a lot. My question: what about using ClickHouse instead of Cassandra? Would it work well, or would it lead to any issues?
I link this video to anyone who is starting with Cassandra. You have the clearest and most concise explanations available online. Thank you very much for your work!
In this playlist, I only see real-life examples starting from the second video. How and where can I learn more about the basics you mentioned? Do I need to take another course, or does this set of videos cover that part too?
If the data is still in the Memtable, hasn't yet been flushed to an SSTable, and the server crashes (or the power goes out), we lose that data, right? What happens in those cases? Is there any way to recover the data?
I’ve also wondered about that. But it seems the commit log is on disk, similar to a WAL, so the Memtable would be rebuilt from that. The more interesting questions here are conflict resolution between nodes in the case of conflicting writes, and how the data is replicated between nodes. Classic distributed systems issues, it seems.
@@Sverdiyev If the commit log were on disk, it would contradict the original statement that "Cassandra writes are fast since they are written to in-memory commit logs".
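For what it's worth, Cassandra's commit log does live on disk; writes are fast because the commit log is append-only (sequential I/O), while the Memtable absorbs the random-access work in memory. A minimal single-node sketch of that write-ahead pattern, heavily simplified for illustration:

```python
# Write-ahead sketch: append to an on-disk commit log first (durable,
# sequential), then update the in-memory memtable. After a crash, the
# memtable is rebuilt by replaying the log. Simplified illustration,
# not Cassandra's actual format.
import json
import os

def write(log_path, memtable, key, value):
    # 1. Append the record to the commit log and force it to disk.
    with open(log_path, "a") as f:
        f.write(json.dumps({"k": key, "v": value}) + "\n")
        f.flush()
        os.fsync(f.fileno())
    # 2. Only then update the in-memory memtable.
    memtable[key] = value

def recover(log_path):
    """Rebuild the memtable by replaying the commit log after a crash."""
    memtable = {}
    if os.path.exists(log_path):
        with open(log_path) as f:
            for line in f:
                rec = json.loads(line)
                memtable[rec["k"]] = rec["v"]
    return memtable
```

In real Cassandra, commit log segments are recycled once the corresponding Memtable data has been flushed to SSTables.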
Hey, great video. I have a question about the microservice approach: suppose we make the processing asynchronous, with one service receiving the requests and other services as processors (the ones that communicate with the gateways). 1 - How would we communicate back to the user, since they're expecting the purchase redirect page to finish? 2 - How would we store the data, since each microservice should have its own DB?
So I have two questions. 1. If we are going for the TRIE data structure, then there is no need for the table at 3:23, right? We just build the TRIE and store it in the DB, and from there we can serve the data either from the DB or from the cache, right? 2. Once a week, we can run a cron job that executes a Spark job, which takes whatever is in our logs, updates our TRIE, and writes it back to our DB. Is my understanding correct?
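The weekly rebuild described above can be pictured with a small sketch: count queries from the logs, insert them into a trie with frequency counts, and serve top-k completions from it. This is an illustration only; a production version would be the output of the Spark job and would be serialized into the DB/cache.

```python
# Autocomplete trie sketch: build a prefix trie with frequency counts
# from query logs, then return the top-k completions for a prefix.
# Illustration of the weekly-rebuild idea, not a production design.

def make_node():
    return {"children": {}, "count": 0, "word": None}

def build_trie(queries):
    """Insert each logged query; repeated queries raise the count."""
    root = make_node()
    for q in queries:
        node = root
        for ch in q:
            node = node["children"].setdefault(ch, make_node())
        node["count"] += 1
        node["word"] = q
    return root

def top_suggestions(root, prefix, k=3):
    """Walk to the prefix node, then collect completions by count."""
    node = root
    for ch in prefix:
        if ch not in node["children"]:
            return []
        node = node["children"][ch]
    results = []
    stack = [node]
    while stack:
        n = stack.pop()
        if n["word"] is not None and n["count"] > 0:
            results.append((n["count"], n["word"]))
        stack.extend(n["children"].values())
    return [w for _, w in sorted(results, reverse=True)[:k]]
```

In practice, many designs also precompute the top-k list at each node during the batch job so that serving a suggestion is a single lookup.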
That was an awesome video. I had a similar approach and got it validated. I was wondering if you could also start a code series on building such systems (as demonstrated in the video).
Thank you for watching! I plan on building similar sub-systems, but TBH, building an e2e system like this without an actual use case (and traffic) is not really worth it. I don't think it will add much value either. Thank you for the suggestion though. Appreciate it.
@@irtizahafiz It would still make sense, though, for someone who's just starting. I was thinking we could use a dataset of clickstream logs, create a stream of the logs coming in (simulate a stream through Python), and then build the system.
Thanks for clearly explaining the end-to-end design. Just a couple of questions: 1) Could you explain a little about how the Apache log files get the click information, and how that is real-time? 2) Also, do you have a link to these notes/diagrams? The one in the description doesn't work.
The simplest would be to write a cron job (or something similar) that executes every couple of minutes, reads the log file, and writes new data to Kafka. You can also poll using a continuously running Python program: it would run in, say, a "while" loop and read from the file every couple of minutes to write to Kafka. These are two solutions you can quickly prototype. For more comprehensive solutions, there are dedicated file-watcher daemons you could use.
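The polling approach described above can be sketched as follows: remember how far into the file you've read, re-read only the new bytes on each pass, and hand each new line to a producer. Here `send_to_kafka` is a hypothetical stand-in; with the kafka-python library it would wrap `KafkaProducer.send(topic, value)`.

```python
# Polling sketch: track the byte offset in the log file, read only
# newly appended lines each interval, and forward them to Kafka.
# `send_to_kafka` is a placeholder for your actual producer call.
import time

def read_new_lines(path, offset):
    """Read lines appended since `offset`; return (lines, new_offset)."""
    with open(path, "rb") as f:
        f.seek(offset)
        data = f.read()
    lines = data.decode().splitlines()
    return lines, offset + len(data)

def poll_forever(path, send_to_kafka, interval=120):
    """The 'while loop' polling described above (runs indefinitely)."""
    offset = 0
    while True:
        lines, offset = read_new_lines(path, offset)
        for line in lines:
            send_to_kafka(line)
        time.sleep(interval)
```

One caveat: this assumes the log file is append-only; if it gets rotated, the offset must be reset, which is exactly the bookkeeping that dedicated file-watcher daemons handle for you.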