Thanks for the videos. I think one important functional requirement that most logging solutions offer (e.g. GCP Logging) is text search. So potentially having a text search engine (e.g. Elasticsearch) is something to consider.
Once we have the data in the time series DB, how do you suppose we go about hooking up a monitoring/alerting service to it? I'm not sure what the optimal route is between 1. a push-based model where for every new metric (or batch) in the time series DB we query an alarms/rules DB, or 2. a pull-based model where the alerting service periodically queries the time series DB for all alarms/rules in the DB. Option 1 seems excessive since the majority of real-time metrics aren't going to fire an alarm; option 2 seems excessive in that most alarms aren't firing at any given instant.
I feel like 1 is probably more practical, especially since you can just build an additional change data capture queue off of the time series DB for the alerting service to read from, but truthfully I'm not entirely sure.
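Roughly what I'm picturing for option 1, as a hedged sketch in Java (the Metric/Rule shapes, the threshold semantics, and the cached rules map are all made up for illustration). The key point is that each metric coming off the CDC queue only does work if a rule is actually registered for its name, which addresses the "most metrics never fire an alarm" concern:

```java
import java.util.List;
import java.util.Map;

// Hypothetical shapes for illustration (requires Java 16+ for records)
record Metric(String name, double value) {}
record Rule(String metricName, double threshold) {}

public class AlertEvaluator {
    // Rules cached in memory, refreshed from the rules DB on some interval,
    // so we never hit that DB once per metric.
    private final Map<String, List<Rule>> rulesByMetric;

    public AlertEvaluator(Map<String, List<Rule>> rulesByMetric) {
        this.rulesByMetric = rulesByMetric;
    }

    // Called for each metric read off the CDC queue; metrics with no
    // registered rules fall through immediately.
    public void onMetric(Metric m) {
        for (Rule r : rulesByMetric.getOrDefault(m.name(), List.of())) {
            if (m.value() > r.threshold()) {
                fireAlarm(r, m);
            }
        }
    }

    private void fireAlarm(Rule r, Metric m) {
        // In a real system this would page/notify; printing as a stand-in
        System.out.printf("ALARM: %s=%f exceeded %f%n",
                m.name(), m.value(), r.threshold());
    }
}
```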
Thanks for this! Is a Flink consumer just like a normal Java/Spring queue consumer that is monitoring an AWS Kinesis stream? (I've never used Flink/Kafka.) Do we have to use Flink in conjunction with Kafka queues, or would any service work?
I have a dedicated video about this, but I'm pretty sure Flink is flexible with multiple types of message queue. Ideally you should be using a replayable one though.
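For a feel of what the consumer side looks like, here's a minimal sketch of a Flink job reading from Kafka (the broker address, topic name, and the trim-as-a-stand-in-for-enrichment step are placeholder assumptions, and the connector API varies a bit by Flink version):

```java
import org.apache.flink.api.common.eventtime.WatermarkStrategy;
import org.apache.flink.api.common.serialization.SimpleStringSchema;
import org.apache.flink.connector.kafka.source.KafkaSource;
import org.apache.flink.connector.kafka.source.enumerator.initializer.OffsetsInitializer;
import org.apache.flink.streaming.api.environment.StreamExecutionEnvironment;

public class LogEnrichmentJob {
    public static void main(String[] args) throws Exception {
        StreamExecutionEnvironment env =
                StreamExecutionEnvironment.getExecutionEnvironment();

        // Hypothetical topic/broker names for illustration
        KafkaSource<String> source = KafkaSource.<String>builder()
                .setBootstrapServers("kafka:9092")
                .setTopics("raw-logs")
                .setGroupId("log-enrichment")
                .setStartingOffsets(OffsetsInitializer.earliest())
                .setValueOnlyDeserializer(new SimpleStringSchema())
                .build();

        env.fromSource(source, WatermarkStrategy.noWatermarks(), "kafka-raw-logs")
           .map(line -> line.trim()) // placeholder for real enrichment logic
           .print();                 // placeholder sink; real job writes to the TSDB/S3

        env.execute("log-enrichment");
    }
}
```

So it's less "a Spring consumer polling a queue" and more a job submitted to a Flink cluster, which handles the parallelism, checkpointing, and replay for you.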
Good one Jordan. You are very clear in your thoughts. Keep this going!! :) A metrics/logging system is challenging because of both high-scale writes and reads. It looks like the write scale here depends on how much Kafka can scale. If we are looking at a very active public service receiving 100 billion msgs/day (over a million msgs/sec), I am guessing Kafka can handle that? What about read load? Since a lot of people may use the logs for customer investigations, there could be a lot of read load on the time series DB, since the other path is for batch insights. As I am typing this, I am thinking about Splunk. Could you make a video on how to design a Splunk-like system? (Maybe these are the building blocks.)
Splunk is just a distributed search index as far as I know, which I already have a video on. It's called Twitter search/Elasticsearch, and you could just connect it to Kafka.
Hey Jordan, what prevents us from sending the unstructured data directly from the client to S3? If we do not care about data enrichment we might as well just send it straight from the client, unless I'm missing something? Also a couple of follow-up questions just to clarify for myself:
- Why do we need a logging service? Why can't we just push the data from the client straight to the queue?
- As far as I understand, we leverage the time series DB for queries on relevant "recent" data, so I assume we would need some sort of cleanup jobs that run periodically? And we use a data warehouse (like Snowflake) to enable analytical queries that would be too big to run on our main DB?
1) I agree, though in this case I was assuming that we are doing some sort of data enrichment via the Flink consumers. 2) Similar to your first question, but at the end of the day there are reasons we want some sort of gateway before we process every request to send data to Kafka. Examples might be rate limiting or some sort of validation on the messages to ensure that a bad actor can't spam our Kafka queues. 3) Yep! Time series DBs make dropping old data very simple; that's one of the main benefits of them.
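To make point 2 concrete, here's a hedged sketch of what that gateway check could look like (the topic name, broker address, and the 64 KB cap are all assumptions; a real gateway would also do auth and rate limiting):

```java
import org.apache.kafka.clients.producer.KafkaProducer;
import org.apache.kafka.clients.producer.ProducerRecord;
import java.util.Properties;

public class LogGateway {
    // Assumed per-message cap (chars as a rough proxy for bytes)
    private static final int MAX_SIZE = 64 * 1024;

    private final KafkaProducer<String, String> producer;

    public LogGateway() {
        Properties props = new Properties();
        props.put("bootstrap.servers", "kafka:9092"); // hypothetical broker
        props.put("key.serializer",
                "org.apache.kafka.common.serialization.StringSerializer");
        props.put("value.serializer",
                "org.apache.kafka.common.serialization.StringSerializer");
        producer = new KafkaProducer<>(props);
    }

    // Reject empty or oversized payloads before they ever reach Kafka,
    // so a bad actor can't spam the queue with junk.
    public boolean ingest(String clientId, String payload) {
        if (payload == null || payload.isEmpty() || payload.length() > MAX_SIZE) {
            return false;
        }
        // Keying by client ID keeps one client's logs on one partition
        producer.send(new ProducerRecord<>("raw-logs", clientId, payload));
        return true;
    }
}
```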
Hey, I'd have to look into these more, but if it allows you to query directly from Kafka itself in addition to the DB, then I suppose that could speed things up a bit. Depends on whether we need it!
From my understanding it's not necessarily cheaper if you're just using it for data storage. If you want to do processing of the data, that's different, but I'm not sure we need to do any in the Hadoop cluster here.
Thanks for the amazing video. In one of my interviews I was asked to design a flight recorder to record the data within a flight. Could you please make a video on that?
Interesting! FWIW, flight recordings are just taking various samples of the program at slightly randomized intervals, so I think the design would be very similar to this one, except you can partition events based on the thread they were published from in your time series database.
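Something like this minimal sketch is what I mean by sampling (the interval, the jitter, and the publish target are all assumptions for illustration):

```java
import java.util.Map;
import java.util.concurrent.ThreadLocalRandom;

public class FlightSampler implements Runnable {
    @Override
    public void run() {
        while (!Thread.currentThread().isInterrupted()) {
            // Snapshot every live thread's stack; the thread name becomes
            // the partition key downstream in the time series DB.
            for (Map.Entry<Thread, StackTraceElement[]> e :
                    Thread.getAllStackTraces().entrySet()) {
                publish(e.getKey().getName(), e.getValue());
            }
            try {
                // Slightly randomized interval (90-110ms here) to avoid
                // aliasing with periodic work in the program being sampled
                Thread.sleep(90 + ThreadLocalRandom.current().nextInt(20));
            } catch (InterruptedException ie) {
                Thread.currentThread().interrupt();
            }
        }
    }

    private void publish(String threadName, StackTraceElement[] stack) {
        // Stand-in: a real recorder would ship (threadName, timestamp, stack)
        // to the ingestion queue from the main design
    }
}
```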
@Piyush-ky9ee It would, but since we're just ingesting the data it's really not a big deal in my opinion. If you split the TSDB by the source of the metric and, say, send all of the metrics to that one shard of the TSDB, I don't think it should be overwhelmed, as the writes are being buffered by a queue and the entire index can be cached for fast ingestion. Perhaps I'm wrong!
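The shard routing itself can be as simple as this hedged sketch (the shard count and hashing scheme are assumptions; a real deployment would likely want consistent hashing so resharding doesn't move every key):

```java
public final class ShardRouter {
    private final int numShards;

    public ShardRouter(int numShards) {
        this.numShards = numShards;
    }

    // Every metric from the same source lands on the same TSDB shard,
    // so one noisy source stays on a single shard instead of fanning out.
    public int shardFor(String sourceId) {
        // floorMod keeps the result non-negative even for negative hash codes
        return Math.floorMod(sourceId.hashCode(), numShards);
    }
}
```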
I suppose we could, though unless we want structured data I'm not sure why we would (I'm assuming these are just logs for now). Presumably it would be more expensive to run a managed Cassandra DB than just dumping into S3.
Hi! I still have much to learn, but my approach was to read as much as I could, starting from Designing Data-Intensive Applications. From that point on, whenever I'd see a piece of technology I hadn't heard of, I'd make note of it and look it up later. I also take notes on all of this so I retain the information better.
Kafka and Spark are used for different things. Kafka at its core is a message queue, whereas Spark is a big data engine for batch processing. Although there may be Kafka stream-processing software now, when I hear Kafka I think of a message queue.