
Facebook System Design Interview: Design an Analytics Platform (Metrics & Logging) 

Exponent
356K subscribers
72K views

Published: Sep 4, 2024

Comments: 77
@tryexponent 2 years ago
Don't leave your system design interview to chance. Sign up for Exponent's system design interview course today: bit.ly/3K0lTtS
@rituraj889 1 year ago
I have a suggestion for all videos on this platform. First, they are super helpful when it comes to how to carry out the whole design, like how to estimate or where to begin. But all of them lack cross-questioning of the interviewee. In real life we can be bombarded with minute, detail-level questions. Like in this video: how does data enrichment work, or how are we making data collection configurable without knowing all the business use cases? Bottom line: make it tougher :)
@cricket4671 10 months ago
A candidate doing what was shown in the video would get downleveled in the best-case scenario 😅
@mrsiddhu2012 2 years ago
Great video. Kudos to the interviewer for making the environment so comfortable. A few things: 1. I could not understand the choice of database. Ideally, shouldn't this be a combination of a time-series DB and a data warehouse? 2. Certain key components/aspects, like a rule engine (for acting on events) and a notification system (for notifying interested subscribers), were missing? 3. 90 days of retention is a very short SLA. Down-sampling the data to lower the volume and storing it long term could have been discussed. 4. I thought the interviewer wanted to go beyond just visualization, to automated actions (alarms etc.) and analytics too.
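Point 3 above (down-sampling for long-term storage) can be illustrated with a minimal Python sketch. It assumes metrics arrive as (epoch_seconds, value) pairs and that hourly buckets are an acceptable resolution once raw points age out of the retention window; the Rollup class, the downsample function and the 3600-second bucket are hypothetical choices for illustration, not anything described in the video.

```python
from dataclasses import dataclass
from typing import Dict, Iterable, List, Tuple


@dataclass
class Rollup:
    """Summary of all points that fell into one time bucket (hypothetical type)."""
    start: int      # bucket start, epoch seconds
    count: int
    total: float
    minimum: float
    maximum: float

    @property
    def mean(self) -> float:
        return self.total / self.count


def downsample(points: Iterable[Tuple[int, float]], bucket_seconds: int = 3600) -> List[Rollup]:
    """Collapse raw (epoch_seconds, value) points into fixed-width buckets."""
    buckets: Dict[int, Rollup] = {}
    for ts, value in points:
        start = ts - ts % bucket_seconds  # align the point to its bucket boundary
        r = buckets.get(start)
        if r is None:
            buckets[start] = Rollup(start, 1, value, value, value)
        else:
            r.count += 1
            r.total += value
            r.minimum = min(r.minimum, value)
            r.maximum = max(r.maximum, value)
    return [buckets[k] for k in sorted(buckets)]
```

Keeping count, sum, min and max rather than only the mean means hourly rollups can later be merged into daily ones without going back to the raw data.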
@pavananantharama3762 2 years ago
Good observation. It felt like this interview was going in the wrong direction the moment a NoSQL DB was chosen. I would say a time-series DB or an Elasticsearch system would have been a good choice. The key takeaway for me was how well Hozefa communicated his thoughts and solutions. Very good communicator.
@schan263 2 years ago
This interview is simply too short. It needs at least another 10 minutes for the design discussion.
@yuanzhang3393 2 years ago
Whether the system needs to be near real time is a clarifying question to ask the interviewer instead of assuming it.
@sivaranjansahu7427 1 year ago
Great video. Very natural and realistic, not rehearsed and phony like many other videos on YT.
@heterodyned 2 years ago
Metrics and logging sound like two “separate” design questions 🤷🏻‍♂️
@dicksonchibuzor7625 2 years ago
I heard the interviewer mention, as part of the requirements, that the system could scale up to a billion users, meaning events could be at least double that (depending on the metrics being tracked). In that case maybe NoSQL wouldn't be the best persistent storage decision. Maybe an OLAP database (like ClickHouse) should be used. This would have a drastic positive impact on query time for both visualization (retrieval) and inserting events, and would also help in computing aggregates much faster. Another improvement to the design: the queue could come after the validation/scrubbing service instead of before it. That could save some space in the queues and keep them from being overwhelmed, because only validated data gets into the queue and invalid events are discarded. The only case where I see the queue coming first is if we are validating very large payload batches at a time; then we could stick with this design, because validation might take a while for extremely large batches.
@implemented2 2 years ago
It can be beneficial to put the queue first when clients are trusted and validation is not required or minimal. This applies to internal services, when you are building infrastructure for internal use within the company.
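The validate-then-enqueue ordering suggested in the thread above can be sketched as follows. This is a hypothetical illustration: queue.Queue stands in for a real broker such as Kafka, and the schema (event_type, ts) and the PII policy (hash email, drop ip_address) are invented for the example.

```python
import hashlib
import queue

# Hypothetical schema and PII policy, invented for this example.
REQUIRED_FIELDS = {"event_type": str, "ts": int}
PII_FIELDS_TO_HASH = ("email",)
PII_FIELDS_TO_DROP = ("ip_address",)


def validate(event: dict) -> bool:
    """True if every required field is present with the expected type."""
    return all(isinstance(event.get(name), kind) for name, kind in REQUIRED_FIELDS.items())


def scrub(event: dict) -> dict:
    """Return a copy with PII dropped or replaced by a SHA-256 hex digest."""
    clean = {k: v for k, v in event.items() if k not in PII_FIELDS_TO_DROP}
    for name in PII_FIELDS_TO_HASH:
        if name in clean:
            clean[name] = hashlib.sha256(str(clean[name]).encode("utf-8")).hexdigest()
    return clean


def ingest(event: dict, q: "queue.Queue[dict]") -> bool:
    """Validate, scrub, then enqueue; rejected events never take up queue space."""
    if not validate(event):
        return False
    q.put(scrub(event))
    return True


# Usage: the first event is accepted, the second is rejected (missing "ts").
events: "queue.Queue[dict]" = queue.Queue()
ingest({"event_type": "click", "ts": 1700000000, "email": "a@example.com"}, events)
ingest({"event_type": "click"}, events)
```

An unsalted SHA-256 of an email is pseudonymization rather than anonymization, since candidate emails can be hashed and compared. For trusted internal clients, as the reply above notes, the validation step may be cheap enough to move after the queue instead.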
@opencompare 2 years ago
I would probably write all non-real-time events straight to a data lake with high write throughput, and later ingest them using distributed data processing platforms like Spark or Hadoop. Only gold- or silver-tier data should be stored in the DB for analytical purposes.
@jasonwakeman 1 year ago
@opencompare This exactly. A distributed schemaless DB would require hundreds of instances just to handle the writes, and querying this much data would take a lot of CPU and time. So query the whole thing once per hour/day/etc. and aggregate it into many tables of a relational DB, where the aggregates can be queried quickly and cached.
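The periodic aggregation described above might look like the following minimal sketch, using the standard-library sqlite3 module as a stand-in for the relational store. The table names (raw_events, hourly_event_counts) and the half-open [start_ts, end_ts) window are assumptions for illustration; a production job would read from the data lake or event store instead.

```python
import sqlite3


def build_hourly_rollup(conn: sqlite3.Connection, start_ts: int, end_ts: int) -> None:
    """Aggregate raw events in [start_ts, end_ts) into per-hour counts.

    Assumes the window is aligned to hour boundaries, so each hour is
    recomputed in full and INSERT OR REPLACE can safely overwrite it.
    """
    conn.execute("""
        CREATE TABLE IF NOT EXISTS hourly_event_counts (
            hour       TEXT NOT NULL,
            event_type TEXT NOT NULL,
            n          INTEGER NOT NULL,
            PRIMARY KEY (hour, event_type)
        )
    """)
    with conn:  # commit on success, roll back on error
        conn.execute("""
            INSERT OR REPLACE INTO hourly_event_counts (hour, event_type, n)
            SELECT strftime('%Y-%m-%d %H:00', ts, 'unixepoch'), event_type, COUNT(*)
            FROM raw_events
            WHERE ts >= ? AND ts < ?
            GROUP BY strftime('%Y-%m-%d %H:00', ts, 'unixepoch'), event_type
        """, (start_ts, end_ts))
```

Dashboards then read the small hourly table instead of scanning raw events, and re-running the job for the same window overwrites rather than double-counts, because the table is keyed on (hour, event_type).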
@bentchow 2 years ago
Introducing a Data Catalog would really help with managing PII and auditing where and how sensitive data is being used through data lineage.
@seenu007 2 years ago
Not satisfied with the discussion. The scope of the question is not clear. Are we building a system for analytics computed from logging data, or a system that has logging and analytics as separate components? The interviewee could have discussed: 1. The grain of data sent by the logging system. Is it individual events or aggregated counts? 2. A database design optimized for analyzing time-series data. 3. Expanding on machine-generated vs. user-generated events, with different treatment of those datasets down the line.
@RM-bg5cd 1 year ago
Event Sourcing and projections for visualization would have been amazing here
@schan263 2 years ago
This interview is too short, so there is not enough time to talk about some of the details. The design interview should be at least 40 minutes; the candidate only had 21 minutes. There's not enough time to do a deep dive, and it seemed rushed. There is no time to talk about scaling the individual components. Sampling is not scaling. The interview should be longer so that the candidate can talk about how many servers are needed, how much disk space is required for X years/months, how many requests can be served per second, etc. The requirements list is too short; I feel we didn't spend enough time on the requirements. How would you determine the level of the candidate based on this interview performance?
@saip009 1 year ago
Question to people proficient in designing backend systems: is this a good example of an interview, or of the design? I personally found this to be ticking off checkboxes in an interview. There isn't enough trade-off discussion or building toward a solution. This seems like an inconsistent brain dump of a known solution.
@jasonwakeman 1 year ago
I agree. The interviewer did a great job (real interviews wouldn't ask this broad a question, tbh), but I doubt this candidate would make it to the next level. If the question were specifically about real-time user events, the answer might pass, but this is not a valid solution for big data. An actual solution requires many services and multiple databases to aggregate the data for various use cases, not one giant DB that handles all writes and all queries. Storage is cheap, so a solution with a single DB doesn't really make sense for big data/analytics.
@konradte 1 year ago
From my experience, this is not a real-world interview. This was only drawing circles and rectangles without talking about the data model, time-series database, database schema, failure detection, or monitoring. The candidate would be bombarded with questions right away. This is a "nice" design to draw, but it won't take you through an onsite with any serious company.
@riit1564 1 year ago
Feedback: 1. Should have talked in more detail about data storage and how it would support faster queries. Some sample queries should have been shown, along with how they are served. 2. No mention of how logs are stored and indexed for faster search. 3. Didn't justify the use of a queue.
@tryexponent 1 year ago
Hey Rahul, thanks for watching and leaving your feedback! Appreciate it!
@jantrollan3358 9 hours ago
How can you stay focused during the interview when the interviewer is so attractive?
@BuyCarsTVPakistan 1 year ago
Ok good interview with Imran Hashmi :p
@rohitparthasarathy6671 8 months ago
I think for time-series data we should be using an RDBMS with sharding or, even better, have the graph generated from an in-memory DB.
@sagarchoudhury56 2 years ago
I think this interview will not fly. Lots of flaws.
@user-sy8ny3vx6m 6 months ago
It's surprising that there was no discussion of OLAP storage solutions, since we will be analyzing these metrics as the end product.
@profkg6613 1 year ago
This could be a case for Kafka as the message-processing queue, with an event-driven API in mind.
@eyesgotshowyo7800 4 months ago
chup
@amazingabhay 7 months ago
What does the archiving process look like? How, and by whom, is data moved from the main DB to the archive DB, and what happens to the precomputed visualization data?
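One possible answer to the archiving question, sketched under assumptions: a scheduled job copies rows older than a cutoff (for example, the 90-day retention mentioned in an earlier comment) into an archive table and deletes them from the hot table in the same transaction. sqlite3 and the table names (raw_events, raw_events_archive) are stand-ins; a real archive would typically live in cheaper storage. The function leaves any precomputed rollup tables untouched, so dashboards built on them keep working after the raw rows move.

```python
import sqlite3


def archive_older_than(conn: sqlite3.Connection, cutoff_ts: int) -> int:
    """Move raw events with ts < cutoff_ts into raw_events_archive.

    The copy and the delete share one transaction, so a crash between them
    cannot lose rows or leave them in both tables. Returns the rows moved.
    """
    with conn:
        # On first use, create an empty archive table with the same columns.
        conn.execute(
            "CREATE TABLE IF NOT EXISTS raw_events_archive AS "
            "SELECT * FROM raw_events WHERE 0"
        )
        conn.execute(
            "INSERT INTO raw_events_archive SELECT * FROM raw_events WHERE ts < ?",
            (cutoff_ts,),
        )
        cur = conn.execute("DELETE FROM raw_events WHERE ts < ?", (cutoff_ts,))
    return cur.rowcount
```

Running the job repeatedly is safe: once old rows have moved, a second call with the same cutoff copies and deletes nothing.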
@tapanparida3176 2 years ago
Very good... both the interviewer and interviewee did an excellent job... lots to learn from this video... I have an interview with Amazon tomorrow, hope this helps....
@designpathy 2 years ago
How did it go? I have mine in the second week of Jan.
@eaf207 2 years ago
@designpathy Good luck, y'all. Can I chat with you? I have one coming up soon.
@sachinmalik9574 2 years ago
@designpathy How did yours go?
@gufengmsa 10 months ago
The design is a little superficial. In the context of monitoring systems, the crucial "dive deep" question pertains to data aggregation and the trade-offs between storage capacity and performance. Real-world monitoring systems like CloudWatch and Prometheus (push vs. pull) should have been mentioned during the interview as well.
@OneMillionDollars-tu9ur 3 months ago
I am a system design interviewer and a hiring manager and I will probably give him a NO.
@OneMillionDollars-tu9ur 3 months ago
Too many application-level assumptions, too few hardcore technical details.
@OneMillionDollars-tu9ur 3 months ago
And why does he talk about a cache at all? No cache is needed in this system.
@psychoprincess8920 4 months ago
Small correction at 6:00: For money/banking system, consistency should be more prioritized over availability.
@downshiftturbo8974 3 months ago
Low latency as an NFR didn't make sense to me. Nothing high-priority or transactional, like money, is involved. This is something passive that will be used later to make business decisions.
@arunsatyarth9097 2 years ago
Not in depth at all
@amitkumarsrivastava9261 1 year ago
A NoSQL DB for time-series data. What a joke!!! Can't believe an FB EM is giving this sort of design.
@davezhang8314 2 years ago
Load balancer is redundant if you're using a queue. Events should be published to the queue right away and available consumers (validation service) will handle events as they become available.
@gsb22 2 years ago
I believe you don't expose the queue directly; it has to sit behind a service that actually pushes the data onto the queue. And since this service needs to scale up and down, we need LBs in front of the front-end servers.
@KevindraSingh 2 years ago
Yup, exposing an implementation detail like a queue directly to the client will hurt the system in the long term, when a requirement to modify the design comes along.
@dicksonchibuzor7625 2 years ago
You shouldn't expose the queue directly to the event payload. Also, it will help with "load balancing" 😃, especially at the scale the interviewer mentioned (there should definitely be multiple queues).
@japanboy31415 2 years ago
wrong.
@jcaliz 2 years ago
Queues usually benefit from fast protocols like TCP and UDP (in case you don't care about data loss); exposing these protocols to end users is not safe.
@CommanderShepard05 2 years ago
Dear team, please share the name of the tool the candidate is using to draw the architecture.
@leandrovieira2981 2 years ago
I think it is Whimsical
@CommanderShepard05 2 years ago
@leandrovieira2981 Thanks, bro!!
@neerajkhanna3024 2 years ago
Why was visualization such a big piece of the discussion? The design was metrics and logging, which lacked depth. It's a whole blob of logging data coming in; it could be stored in a time-series DB, or even an object store like S3 and then moved to a DW like Redshift. Why is a NoSQL DB needed in this case?
@jasonwakeman 1 year ago
Not sure where you get the idea that it is a whole blob? Imagine a web app: you would want to log individual events so that if the browser is closed you don't lose any. S3 would work but is not the best choice: imagine having a Lambda for every single user event that writes to S3.
@ZealousSwede 2 years ago
Hozefa is a beast!!!!
@AyoubMAZA 1 month ago
This guy interviewed himself
@kartech4592 1 year ago
Can a load balancer directly insert to a queue?
@rituraj889 1 year ago
Yeah, good point. Isn't load balancing part of an MQ by default? I mean, the number of partitions or consumers can do the same thing.
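The point above, that partitions and consumers spread the load inside the queue, can be shown with a small simulation. It is a hypothetical sketch, not any broker's actual algorithm: keys are mapped to partitions with zlib.crc32 (real brokers may use a different hash), and partitions are assigned round-robin across a consumer group so each partition has exactly one owner. This covers the consumer side only; the HTTP ingestion tier in front of the queue still needs its own load balancer, as earlier replies note.

```python
import zlib
from typing import Dict, List


def partition_for(key: str, num_partitions: int) -> int:
    """Map a message key to a partition; the same key always lands in the same one."""
    return zlib.crc32(key.encode("utf-8")) % num_partitions


def assign_partitions(num_partitions: int, consumers: List[str]) -> Dict[str, List[int]]:
    """Spread partitions round-robin over a consumer group.

    Each partition gets exactly one consumer, so adding consumers (up to the
    number of partitions) spreads the work; any extra consumers sit idle.
    """
    assignment: Dict[str, List[int]] = {c: [] for c in consumers}
    for p in range(num_partitions):
        assignment[consumers[p % len(consumers)]].append(p)
    return assignment
```

Keying by something like a user id keeps each user's events ordered within one partition, while the number of partitions caps how far the consumer group can scale out.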
@AnushkaVijay-cv7tk 1 year ago
Does Meta ask system design questions for the SDE1 role?
@tryexponent 1 year ago
Hey AnushkaVijay-cv7tk! Typically SDE1 candidates will not be asked system design questions
@t3ntube357 2 years ago
May I know the name of the tool they used?
@karthikr5884 2 years ago
Nice:)
@iamworstgamer 2 years ago
18:32 This question wasn't given a clear answer.
@jjc5258 2 years ago
This interview is going to fail; bad example.
@ramgamery 6 months ago
Why? Can you please explain?
@ashwin81088 3 months ago
This interview went well, imo. The system he described is what we use in my org. It took them 3 years to develop, but our boss designed it in 20 mins.
@HEKTO3 1 year ago
Not a very successful interview.
@wuaaron662 2 years ago
uhm...uhm...uhm...uhm...uhm... Is this how it works in a real system???????
@tsjoshi 2 years ago
"What if..." that's what happens all the time in reality.
@gsb22 2 years ago
17:10 WOW. I mean, there were nitpicks before this point, but this is a big NO. The analytics platform HAS to save each and every event no matter what. It doesn't matter if it is being used by 1 user or trillions of users; you have to store each and every event. The response to the scale problem would be to scale out the queue and ingestion service as the number of events increases.
@gsriram7 2 years ago
@joed5714 Actually, it's not. We heavily use sampling to keep up with upstream, and it's standard practice for exorbitantly high volumes (like 10000+ B events per second). We have a data pipeline that ingests packet headers from all the routers. A router can process 5 Gbps and there are 1000+ routers; it is impossible to ingest all those events without sampling, unless of course you provision 10000+ 32-core instances.
@nagoorshaik8025 2 years ago
I think whether it is a BIG NO or an absolute YES has to be decided by the use case. Depending on the metrics and the purpose for which we collect this data, it might not be necessary to collect metrics from every user. Sampling is an important statistical method that gives the expected results without going through each and every input. I know we have tools, methods and frameworks to collect every input without a miss, but whether we need to do this has to be decided first; otherwise you are jumping in to solve a problem that doesn't exist.
@andrew3 2 years ago
Sampling is done by all the major tech players for large applications. It is completely valid to suggest.
@daryaarbuzova3315 1 year ago
Stopped the video at the same timestamp to process what was said :\ I agree that it's a big NO. For example, sampling user conversions for ads analytics is not acceptable.
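The sampling debate above can be made concrete with a sketch of deterministic, per-event-type sampling. Assumptions: the rates in SAMPLE_RATES are invented, event types not listed are kept in full, and "conversion" is exempt (rate 1.0) to reflect the ads-analytics objection just above. Each kept event carries a weight of 1/rate, so summing the weights estimates the true count.

```python
import hashlib
from typing import Iterable

# Hypothetical per-event-type sampling rates. 1.0 means "keep everything";
# conversions are exempt because dropping them would skew ads analytics.
SAMPLE_RATES = {"page_view": 0.01, "scroll": 0.001, "conversion": 1.0}


def sample_weight(event_type: str, user_id: str) -> float:
    """Return 1/rate if the event is kept, or 0.0 if it is sampled out.

    The decision hashes the user id instead of calling random(), so it is
    deterministic: for a given event type a user is either always in or
    always out of the sample, which keeps per-user funnels intact and makes
    retries of the same event agree.
    """
    rate = SAMPLE_RATES.get(event_type, 1.0)  # unknown types are kept in full
    if rate >= 1.0:
        return 1.0
    digest = hashlib.sha256(user_id.encode("utf-8")).digest()
    position = int.from_bytes(digest[:8], "big") / 2**64  # roughly uniform in [0, 1]
    return 1.0 / rate if position < rate else 0.0


def estimate_total(weights: Iterable[float]) -> float:
    """Summing the weights of kept events estimates the unsampled count."""
    return sum(weights)
```

This supports both sides of the thread: high-volume, low-value events can be thinned aggressively, while the events the business cannot afford to lose are never sampled.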
@craigslist1323 2 years ago
How is this guy a manager? He'd probably fail intern interviews.
@yashmishra3900 2 years ago
Who is the interviewer? Please include her LinkedIn ID.
@user-se9zv8hq9r 2 years ago
yikes