Design a High-Throughput Logging System | System Design

42K views

Visit Our Website: interviewpen.com/?...
Join Our Discord (24/7 help): / discord
Join Our Newsletter - The Blueprint: theblueprint.dev/subscribe
Like & Subscribe: / @interviewpen
Logging systems are commonly found in large systems with multiple moving parts. For these high-throughput real-time systems, there are a number of challenges and considerations at scale. This video gives a high-level introduction to some of these challenges and how to overcome them.
Table of Contents:
0:00 - Introduction
0:27 - Requirements
1:33 - Naive Solution
2:18 - Sharding
3:07 - Bucketing
4:15 - Sharding and Bucketing Combined
5:05 - Migrating to Cold Storage
7:00 - Next Steps
7:59 - interviewpen.com
Socials:
Twitter: / interviewpen
Twitter (The Blueprint): / theblueprintdev
LinkedIn: / interviewpen
Website: interviewpen.com/?...

Published: June 30, 2024

Comments: 22

@lunaxiao9997 3 months ago

Great video, very clear

@interviewpen 3 months ago

Thanks!

@michatobera6049 5 months ago

Great video

@interviewpen 5 months ago

Thanks!

@GoofGoof-cs6ny 1 month ago

So in 2018 every service was writing logs to node 3. Didn't we go back to bad write complexity by bucketing?

@interviewpen 1 month ago

Yep, bucketing makes query performance better, so we introduce sharding as well to distribute writes within a bucket.
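The combination described in this reply can be sketched as a two-step routing function: pick the time bucket first, then pick a shard inside that bucket. This is only an illustration under assumptions. The two-year bucket width, the three shards per bucket, the 2018 starting year, and the idea of hashing a hypothetical `log_id` to choose a shard are not taken from the video.

```python
import hashlib
from datetime import datetime, timezone

# All three constants are assumptions made for this sketch.
BUCKET_YEARS = 2        # width of one time bucket
SHARDS_PER_BUCKET = 3   # nodes that share the writes of one bucket
EPOCH_YEAR = 2018       # start year of the first bucket


def bucket_for(ts: datetime) -> int:
    """Return the start year of the time bucket that contains ts."""
    offset = (ts.year - EPOCH_YEAR) // BUCKET_YEARS
    return EPOCH_YEAR + offset * BUCKET_YEARS


def shard_for(log_id: str) -> int:
    """Pick a shard inside a bucket from a stable hash of the log ID."""
    digest = hashlib.sha256(log_id.encode("utf-8")).digest()
    return int.from_bytes(digest[:8], "big") % SHARDS_PER_BUCKET


def route(ts: datetime, log_id: str) -> tuple[int, int]:
    """Return (bucket start year, shard index) for one log entry."""
    return bucket_for(ts), shard_for(log_id)


if __name__ == "__main__":
    when = datetime(2019, 5, 1, tzinfo=timezone.utc)
    print(route(when, "request-42"))
```

Writes that arrive at the same moment land in the same bucket but spread over its shards, so no single node absorbs the whole write load.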

@supragya8055 1 month ago

I don't understand. If the same bucket, say 2021-2022, spans multiple nodes, how are reads any faster? Logs for that bucket are distributed across servers and still have to be queried across servers, which is slow. My understanding is that bucketing didn't help improve read performance.

@interviewpen 1 month ago

Yes, sharding improves write performance at the expense of query latency (unless we shard by something more clever!). However, we can still handle a high throughput of reads. This latency vs throughput problem is a common tradeoff with large-scale systems! Hope that helps :)
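The tradeoff in this reply can be made concrete with a small scatter-gather sketch: a time-range query skips every bucket outside the range, but must fan out to every shard of each bucket it does touch and merge the partial results. The in-memory `cluster` layout, the use of whole years as timestamps, and the `query` function are hypothetical simplifications, not the design shown in the video.

```python
import heapq
from concurrent.futures import ThreadPoolExecutor

# Hypothetical layout: cluster[bucket_start_year] is a list of shards,
# and each shard is a list of (year, message) tuples sorted by year.


def query(cluster, start_year, end_year, bucket_years=2):
    """Return all entries with start_year <= year <= end_year, in order."""
    # Bucketing: keep only buckets whose time span overlaps the range.
    buckets = [
        b for b in sorted(cluster)
        if b + bucket_years > start_year and b <= end_year
    ]
    # Sharding: every shard of a selected bucket has to be scanned.
    shards = [shard for b in buckets for shard in cluster[b]]

    def scan(shard):
        return [e for e in shard if start_year <= e[0] <= end_year]

    # Scatter the scan to all shards in parallel, then gather.
    with ThreadPoolExecutor(max_workers=max(1, len(shards))) as pool:
        partials = list(pool.map(scan, shards))
    # Each partial result is already sorted, so a k-way merge suffices.
    return list(heapq.merge(*partials))


if __name__ == "__main__":
    cluster = {
        2018: [[(2018, "a")], [(2019, "b")]],
        2020: [[(2020, "c")], [(2021, "d")]],
    }
    print(query(cluster, 2019, 2020))
```

The latency of one query is bounded by the slowest shard it waits on, while total read throughput still grows with the number of shards, which is the latency versus throughput distinction made above.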

@developerjas 5 months ago

Great video, man! How would you go about designing the data ingestion part?

@interviewpen 5 months ago

Great point! There’s a lot that goes into ingesting logs while optimizing network performance and maintaining context. Check out our full video on monitoring systems on interviewpen.com :)

@sahanahunashikatti3935 5 months ago

😊😊 ok @interviewpen

@didimuschandra6680 5 months ago

Great video, thanks! Could you make a video on building an effective and efficient ticketing system?

@interviewpen 5 months ago

Sure, we'll add it to the backlog. Thanks for watching!

@sahanagn4485 2 months ago

Great video! Please slow the video down; for someone new to the topic, it's a bit too fast to grasp the concepts.

@interviewpen 2 months ago

Ok, noted!

@wizz0056 5 months ago

Kafka -> Loki -> S3, if you're looking for an existing solution :)

@interviewpen 5 months ago

Yep, S3 does a lot of the things discussed here behind the scenes. Thanks for watching!

@weidada 5 months ago

Suppose that every two years it ingests 2PB and migrates 1PB. How could three sets be enough to cycle after 12 years?

@interviewpen 5 months ago

Great question! At any given time, we have three "hot" nodes--two are migrating data to cold storage and one is ingesting new data. We only showed one cold storage node in the example, but we would need at least 2 to make this work long-term. Hope that helps!
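The arithmetic behind this answer can be checked with a short simulation. It assumes the figures from the question (2PB ingested per two-year period) and reads "migrate 1PB" as a per-node rate of 1PB per period for each draining hot node; that reading, and the `simulate` function itself, are assumptions made for illustration.

```python
def simulate(periods, ingest_pb=2, drain_pb=1, hot_nodes=3):
    """Rotate the ingesting role over the hot nodes, one period at a time.

    Returns (PB left on each hot node, PB moved to cold storage).
    Raises RuntimeError if a node is reused before it has fully drained.
    """
    hot = [0] * hot_nodes
    cold = 0
    for period in range(periods):
        writer = period % hot_nodes
        if hot[writer] != 0:
            raise RuntimeError(
                f"node {writer} still holds {hot[writer]}PB in period {period}"
            )
        # Every node that is not ingesting migrates data to cold storage.
        for node in range(hot_nodes):
            if node != writer:
                moved = min(drain_pb, hot[node])
                hot[node] -= moved
                cold += moved
        # The ingesting node takes all new data for this period.
        hot[writer] += ingest_pb
    return hot, cold


if __name__ == "__main__":
    # Six two-year periods cover the 12 years from the question.
    print(simulate(6))
```

Under these assumptions a node filled with 2PB needs two periods to drain, so it is empty again exactly when its turn to ingest comes back around. With only two hot nodes the same run fails in the third period. In steady state the two draining nodes together move 2PB per period, which is consistent with needing more than one cold storage node over the long term.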

@ankushraj3599 1 month ago

Why not use Kafka for high throughput?

@interviewpen 1 month ago

Kafka is an event streaming platform, so it wouldn't solve any of the log storage problems we're addressing here. But if you have any thoughts on how to incorporate it, feel free to share!