System Design Interview: Design an Ad Click Aggregator w/ an Ex-Meta Staff Engineer

Hello Interview - SWE Interview Preparation

21K views

00:00 - Intro
01:55 - The Approach
04:16 - Requirements
10:49 - System Interface & Data Flow
14:12 - High Level Design
29:43 - Deep Dives
52:10 - Conclusion
A step-by-step breakdown of the popular FAANG+ system design interview question, Design an Ad Click Aggregator, which is asked at top companies like Meta, Google, Amazon, Microsoft, and more.
Evan, a former Meta Staff Engineer and current co-founder of Hello Interview, walks through the problem from the perspective of an interviewer who has asked it well over 50 times.
Resources:
1. Detailed write up of the problem: www.hellointerview.com/learn/...
2. System Design In a Hurry: www.hellointerview.com/learn/...
3. Excalidraw used in the video: link.excalidraw.com/l/56zGeHi...
4. Vote for the question you want us to do next: www.hellointerview.com/learn/...
Check out the previous video breakdowns:
Ticketmaster: • System Design Intervie...
Uber: • System Design Intervie...
Dropbox: • System Design Intervie...
Connect with me on LinkedIn: / evan-king-40072280
Preparing for your upcoming interviews and want to practice with top FAANG interviewers like Evan? Book a mock interview at www.hellointerview.com.
Good luck with your upcoming interviews!

Published: Jul 24, 2024

Comments: 189

@krishnabansal7531 1 month ago

This kind of content can make someone fall in love with software engineering.

@ashishm8991 1 month ago

Finding your channel feels like finding gold! There are a ton of SD videos on YouTube with shallow content, basically exactly what you describe a junior or mid-level candidate would do. Going in depth for senior and staff is one of the highlights of your content. Please continue doing that. Also, please don't worry about the length of the video. Keep the gold coming :)

@PranayDatta 1 month ago

@hello_interview waiting eagerly for next video

@sherazdotnet 1 month ago

Watch this and then imagine if Evan puts together a System Design Learning Course. Just imagine that!!!! I mean we (the learners) will jump on it sooo quickly. This is just absolutely amazing. This is combining years of experience with a hands-on, practical approach that works, along with book contents presented in a very professional manner. Evan, think about it 🙂

@hello_interview 1 month ago

Maybe one day! For now just happy with all the people learning about hello interview and getting tons of free value

@sudarshanpanke7329 2 months ago

This is by far the best system design interview ever seen on the internet. Keep doing the great work sir...

@getmanfg78 4 days ago

You are the best at what you do, after listening to your videos I cannot now like and tolerate other system design videos, please make more content for system design

@hello_interview 4 days ago

Coming soon! 🫡

@user-qc8mx8hu5c 2 months ago

Honestly, it is the best SD showcase I’ve ever seen. You are the best. I watched all your videos and whiteboard them myself then. Thank you!

@hello_interview 2 months ago

So glad you like them and very smart to try them yourself and not just blindly consume!

@filiptepes-onea9824 2 months ago

One of the best channels on system design! Please keep going!

@HatimKhan-zj1hj 2 months ago

Really helpful videos - especially the breakdown for different expectations at mid/senior/staff levels, common things you see from candidates, and context into the actual systems like the shard limits for events streams. I used to work on Kinesis - happy you chose it!

@hello_interview 2 months ago

How cool! That must’ve been fun to work on :)

@vikaskumarsherawat 2 months ago

Thanks a lot for uploading these videos. They are very informative. Keep doing the good work.

@stevets583 1 month ago

Awesome walkthrough! As a junior engineer I learned a lot. Near the end you said you wanted to keep it short but I appreciate the nuances you point out for your decisions including design choices between mid and higher level candidates. Time didn't faze me at all, I watched every second!

@hello_interview 1 month ago

Hell yah!

@usamaforu1 1 month ago

This video helped me ace the system design interview. The detailed explanations provided in-depth knowledge of various components, which was extremely helpful for answering follow-up questions during my interview.

@vsejpal13 1 month ago

This was such a pleasure to watch. Thank You. I would love to see a video on a metrics monitoring system. There will be some common components with ad-click aggregators.

@LUKEMELIKIAN 2 months ago

Thanks for the detailed explanation! Definitely learned some new things in this video.

@optimisticradish9121 24 days ago

I rarely leave comments but this is BY FAR the best system design video I have seen on youtube. The thing I like most is that it differentiates from common textbook solutions that we see everywhere and you explain why certain choices were made. Thanks!

@hello_interview 24 days ago

Thanks for commenting!! Glad you enjoyed it

@gauravaws20 2 months ago

absolutely brilliant content mate. keep em coming. only channel for which I have a notification on.

@jk26643 1 month ago

Thanks so much for doing this! Greatly appreciated! By far the best system design videos I've seen.

@benhall4274 2 months ago

Love these! And can't recommend the Hello Interview mock interviews enough!

@hello_interview 2 months ago

Wahoo thanks Ben!

@AnyoneCanCode1 1 month ago

Hey, I love these videos. I only used your videos and Designing Data-Intensive Applications, and that was enough for an E4 offer at Meta. I love the advice you give and the common pitfalls you point out.

@hello_interview 1 month ago

Crushed it. Congrats on your offer!

@lalop2200 1 month ago

I love this channel. Very good job sir, your strategy is really good and comprehensive. Straight to the main points. Bravo

@cavekancho 13 days ago

You somehow managed to make preparing for system design interviews really fun. Massively underrated channel

@hello_interview 13 days ago

You’re the best!

@viniciusrolandcrisci272 2 months ago

I saw some videos and your content is so great. Thank you so much for clarifying the SQL vs NoSQL debate. I always thought that bringing that into an interview was irrelevant but was afraid to do it. 😅 Keep up the amazing work.

@hello_interview 2 months ago

Yah funny how that was evangelized in a couple books and then just stuck

@ahnaf_chowdhury 2 months ago

Ah, nice, you re-uploaded! Thanks a lot for taking the feedback and acting quickly on this. And, sorry if it caused inconvenience for you 😄Thanks a lot for all of your hard work. 🙏

@hello_interview 2 months ago

Thanks so much for calling that out! Glad to get it fixed within the first day :)

@durrthock 2 months ago

Is e-commerce (design amazon / ebay) not as common as it once was?

@hazemabdelalim5432 2 months ago

You are honestly the best content on system design. Can you do a playlist on the system design topics themselves? I mean a video where you discuss replication in depth, concurrency, etc.

@hello_interview 2 months ago

Will definitely consider this!

@mullachv 2 months ago

Thank you for a great video. For a senior candidate it will be helpful, in my opinion, to narrate the data structures that underpin these solutions in addition to the supporting vendor products/technologies. For fast aggregation of counters, one could demonstrate fast but slightly imprecise counters using a count-min-sketch-like data structure, and for a slower but more accurate counter, MapReduce jobs. Aggregates and statistics over varying windows are almost a necessity for contemporary Monitoring, Analytics and ML Systems. In such cases, retaining data in memory, backed by persistent storage, in the form of tree data structures keyed by aggregation windows is useful for range queries at varying granularities. E.g.: root window [0-100), immediate child node windows [0-50), [50-100), etc. It could also be helpful to talk about idempotency within queue consumers, and about out-of-sequence arrival of events in the queue (handled through watermarking).

@hello_interview 2 months ago

Can have some future videos which go deeper on probabilistic data structures or other more foundational topics.
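
For readers who have not met the count-min sketch mentioned in this thread, here is a minimal Python sketch of the idea. It is an illustration only, not something shown in the video: the class name, the `width` and `depth` defaults, and the use of SHA-256 for the row hashes are all assumptions made for this example.

```python
import hashlib


class CountMinSketch:
    """Approximate counter: never undercounts, may overcount on hash collisions."""

    def __init__(self, width: int = 1024, depth: int = 4) -> None:
        self.width = width
        self.depth = depth
        self.table = [[0] * width for _ in range(depth)]

    def _index(self, key: str, row: int) -> int:
        # Derive one hash per row by mixing the row number into the input.
        digest = hashlib.sha256(f"{row}:{key}".encode()).digest()
        return int.from_bytes(digest[:8], "big") % self.width

    def add(self, key: str, count: int = 1) -> None:
        for row in range(self.depth):
            self.table[row][self._index(key, row)] += count

    def estimate(self, key: str) -> int:
        # Each row can only overcount, so the minimum is the tightest estimate.
        return min(
            self.table[row][self._index(key, row)] for row in range(self.depth)
        )


sketch = CountMinSketch()
for _ in range(500):
    sketch.add("ad-42")  # hypothetical ad id
sketch.add("ad-7", 3)
print(sketch.estimate("ad-42"))  # an upper-biased estimate, at least 500
```

Memory is fixed at `width * depth` counters no matter how many distinct ad ids arrive, which is the trade the commenter is pointing at: speed and bounded memory in exchange for a small overcount.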

@srinivasanrengarajan925 9 days ago

Beautiful Design and Amazing explanation - just impressed with the elegance of the design and the beauty of software engineering.

@hello_interview 9 days ago

😍

@chengchen8028 1 month ago

Incredible video with excellent drawing and explanation.

@surojitsantra7627 1 month ago

Looking forward to your next videos. Please upload more design problems. It's been almost a month since you last uploaded. Love your content.

@hello_interview 1 month ago

Sorry, was traveling. Recording a video today! Up by EOW

@smokebomba 14 hours ago

I felt this was much better than the Alex Xu System Design Vol 2 on the same topic. Great job!

@hello_interview 13 hours ago

High praise!

@zayankhan3223 1 month ago

This is another great video!! Please keep it coming. Can we use mirroring in Kafka and have spark read from the mirror and provide data to reconciliation service?

@hello_interview 24 days ago

you know i have less familiarity there. potentially, but not totally sure.

@96jugal 28 days ago

Love the content! Thank you for making these!

@vanessapinto9737 1 month ago

These are Excellent! Please keep going.

@davidoh0905 1 month ago

For the hot shard problem in Kafka, you can salt things, but then in Flink we will no longer have all the events aggregated in one Flink task. In the case where we are sinking to Redshift, we could aggregate it there. But if we want to access it in the stream, maybe we need a secondary stream that aggregates everything? So that added salt needs to be handled one way or the other.

@hot13399 1 month ago

Looking forward to more great videos from you! :)

@Guimo74 17 days ago

I like your videos, I have learned a lot. A couple comments on this video: a. I think the system would benefit from a Redis in the click processor service, not the idempotency lock but a redirectUrl cache {adId: redirectUrl} to reduce reads in the Ad DB. It might be a MRU cache to avoid overloading the Redis. b. I'm not sure why you are pushing Kinesis so hard in this solution, I mean yes I learned something about Kinesis, but it would be more practical just to place a Kafka that can handle the load peaks and has event history as well so it is possible to write the reconciliation procedures from it. c. I learned about Flink, thanks. I used a redis aggregator in my own solution. Thank you so much for your work!

@nguyenhoangphuckan 8 days ago

Incredible video, keep it up!

@nelsonn5123 1 month ago

I think bloom filter would be a good choice to check on duplicate impression id. I think, it is also supported by redis.

@KENTOSI 1 month ago

Excellent walk-through!

@Neil-ph7rf 1 month ago

when will you post the next interview video? waiting for it about 1 month!!! really appreciate the effort.

@hello_interview 1 month ago

Tomorrow!!

@zuowang5185 2 months ago

These interview preps make you feel like if you know enough of system design knowledge, have good cross team examples for bq, and can solve leetcode medium-hard fast, you can get to higher level quicker than going through internal promotions.

@lukt 2 months ago

The system design videos on this channel are the best out there. Thanks for putting in so much time! I did have a question regarding the proposed reconciliation architecture: I get that data accuracy is important and that it acts as a sort of "Auditor" in our system. However, you mentioned that errors might stem from e.g. bad code pushes or out-of-order events in the stream. The proposed reconciliation architecture would really only fix issues that occur *within Flink* though, right? At the end of the day, the Spark job is still acting upon that same data from Kinesis, so in case of out-of-order events or bad code pushes, it would also be affected, no?

@hello_interview 2 months ago

Yah if you messed something up in the server before kinesis you’d be screwed still. But you’d want to keep that as simple as possible. You can trust kinesis will be reliable, out of order won’t matter for reconciliation.

@lukt 2 months ago

@@hello_interview got it. Thanks for the quick response. :)

@SauceRawr 7 days ago

Hi this was super helpful! My question is how would you handle the offline channels case? i.e. how would you aggregate data for one ad shown across multiple channels? I feel like the design wouldn't have to change that much because the adId could just remain the same and you can just add a "channel" metadata field for where it was shown.

@Global_nomad_diaries 2 months ago

The best system design content. Thanks a lot for helping me prepare for my upcoming interviews. Can you please clarify the difference between product design and system design at Meta?

@hello_interview 2 months ago

www.hellointerview.com/blog/meta-system-vs-product-design :)

@deathbombs 2 months ago

21:48 I like how DB can be used for simplest case consistently in these approaches

@kevinj1474 13 days ago

Great video. Only nit is 301 is the code for Redirect, not 302.

@Andrew-pj9ik 1 month ago

Thanks for the great videos - they are extremely helpful. I noticed at around 24 mins in you mention querying for the number of distinct userIDs. I don't think you're going to be able to serve queries like that using the pre-aggregation you suggest doing with Flink. I don't know a good solution to this problem other than dumping the list of userIDs for each minute window into your OLAP DB. You might be able to use HLL hashes for this, but depending on how granular the dimensions are in your DB, it may not be worth it.. I think it's at least worth mentioning this if we think users care about unique counts.
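
The point about distinct userIDs can be seen with a few lines of Python. This is a toy illustration with made-up events, not the pipeline from the video: click counts are additive across minute windows, but unique-user counts are not, so a pre-aggregated count per window cannot answer a "distinct users over the hour" query.

```python
from collections import defaultdict

# Hypothetical click events: (minute, ad_id, user_id).
clicks = [
    (0, "ad-1", "alice"),
    (0, "ad-1", "bob"),
    (1, "ad-1", "alice"),
    (1, "ad-1", "carol"),
]

click_counts = defaultdict(int)      # additive, safe to pre-aggregate
users_per_minute = defaultdict(set)  # needs the raw ids (or a mergeable sketch)

for minute, ad_id, user_id in clicks:
    click_counts[(ad_id, minute)] += 1
    users_per_minute[(ad_id, minute)].add(user_id)

total_clicks = sum(click_counts.values())
summed_uniques = sum(len(users) for users in users_per_minute.values())
true_uniques = len(set().union(*users_per_minute.values()))

print(total_clicks)    # 4: summing per-minute click counts is correct
print(summed_uniques)  # 4: summing per-minute unique counts double-counts alice
print(true_uniques)    # 3: the real number of distinct users
```

Storing the per-window user sets, or a mergeable approximation such as the HLL hashes the commenter mentions, is what makes the union possible at query time.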

@deathbombs 1 month ago

I rewatched and had some new thoughts. I wonder what the costs of using a streaming solution are? It seems like the database for clicks that was used in the batching solution is completely replaced by the streaming components, so the benefits of having the previous database queries are lost? 34:52 the streaming solution gets real time by dumping to OLAP?

@bishwajitpurkaystha7114 2 months ago

Great video!

@Aizen4661 17 days ago

How does it become fault tolerant if we are not checkpointing? Using the reconciliation worker?

@yarik2303 1 month ago

Hi, I may have missed this: is it possible to use Apache Kafka + ksqlDB to build aggregations with materialized windowed tables, where you can also set a flush interval? Is that acceptable for such an interview?

@ravisr4561 28 days ago

Hi, great explanation of a complex topic in such an easy way. Is exactly-once processing also critical when we are passing messages from Kafka to Flink? If we want to enable exactly-once processing in Flink, then checkpointing will be required. Just a thought.

@hello_interview 26 days ago

You can set it per message! From the docs, “Every Amazon SQS queue has the default visibility timeout setting of 30 seconds. You can change this setting for the entire queue. Typically, you should set the visibility timeout to the maximum time that it takes your application to process and delete a message from the queue. When receiving messages, you can also set a special visibility timeout for the returned messages without changing the overall queue timeout.”

@CS2dAVE 1 month ago

Amazing content! Very much appreciate you posting these 🙌 System design padawan here. I have a question about the hybrid approach .. what makes us trust the "Kinesis -> Connector -> S3 -> Spark -> Worker -> OLAP" results more than the "Kinesis -> Flink -> OLAP" results? Is it a guarantee where the connector always successfully writes the data to S3? or does Flink make some kind of tradeoff for speed? kind of confused about that piece and figured i'd ask. thanks again!

@mehdisaffar 1 month ago

I am also curious about this

@mehdisaffar 1 month ago

Great content! I wonder how long the ad impression IDs stay in the redis database? I imagine they would expire after some time. If we imagine they expire after one day, I can imagine a malicious user who would request an ad, keep browser open for one day, then spam the ad with 1000s of clicks. Maybe the signed ad impression "token" should have an expiresAt to ignore those ad clicks, so that we can free up the redis db?

@hello_interview 1 month ago

Exactly. The signature can have a timestamp to expire the impression after a few hours or day
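
The signed impression with an expiry discussed in this thread could look roughly like the following. This is a sketch under assumptions: the token layout (`impression_id.expires_at.signature`), the function names, and the HMAC-SHA256 choice are invented for illustration and are not taken from the video. It also assumes the impression id contains no secret data, since the payload is signed but not encrypted.

```python
import hashlib
import hmac
import time
from typing import Optional

SECRET_KEY = b"demo-secret"  # assumption: a server-side secret, never sent to clients


def _signature(payload: str) -> str:
    return hmac.new(SECRET_KEY, payload.encode(), hashlib.sha256).hexdigest()


def sign_impression(impression_id: str, expires_at: int) -> str:
    """Issue a token when the ad is served."""
    payload = f"{impression_id}.{expires_at}"
    return f"{payload}.{_signature(payload)}"


def verify_impression(token: str, now: Optional[int] = None) -> bool:
    """Accept a click only if the token is untampered and not expired."""
    now = int(time.time()) if now is None else now
    try:
        impression_id, expires_raw, signature = token.rsplit(".", 2)
        expires_at = int(expires_raw)
    except ValueError:
        return False  # malformed token
    expected = _signature(f"{impression_id}.{expires_raw}")
    # Constant-time comparison avoids leaking how many characters matched.
    if not hmac.compare_digest(signature.encode(), expected.encode()):
        return False
    return now <= expires_at


token = sign_impression("imp-123", expires_at=1000)
print(verify_impression(token, now=999))   # True: valid and not expired
print(verify_impression(token, now=1001))  # False: expired
```

Because expired tokens are rejected before any lookup, the dedup store only has to remember impression ids for as long as a token can still be valid.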

@lewisbobrow14 2 months ago

Hey Evan, I'm not preparing for an interview, but I find these videos incredibly helpful. I'm an L5 at Amazon trying to learn more about systems design. I see mock interviews as a great way to solidify my understanding of concepts that I'm reading about in books, because, after all, it's very difficult to get hands-on experience in actually building these big systems. I'd find it super valuable if I could self-study a design pattern, like event streaming, then do an mock interview on a related problem to test my understanding, like when to choose lambda vs. kappa vs. hybrid architecture. Does Hello Interview offer this?

@hello_interview 2 months ago

Hey! Kudos for the focus on continuous learning and glad to hear you're finding the videos useful. There is absolutely nothing stopping someone who does not have an upcoming interview from doing a mock! Of course, the sessions are tailored toward making sure you know all the tricks to pass the interview, but you could always give your coach a heads up that you're more interested in just evaluating your design skills.

@hotdiary 2 months ago

Thank you and great video. Can we assume the click processor service will scale? Can we make this serverless or shard it and place it behind another LLB?

@hello_interview 2 months ago

Yah I breeze over this at one point in the video. Easy for it to horizontally scale.

@ramannanda 1 month ago

Great Vid, redis cache needs expiry, how do we manage eviction?

@hello_interview 1 month ago

For the idempotency keys? Just have a reasonable TTL based on cost constraints
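
To make the TTL idea concrete, here is a small in-memory stand-in for the dedup store, written in Python. It only mimics "set if absent, with expiry" semantics (in Redis this is typically done with `SET key value NX EX seconds`); the class name `DedupStore` and its method are hypothetical, and a real deployment would use the shared store rather than a local dict.

```python
import time
from typing import Optional


class DedupStore:
    """In-memory stand-in for a key-value store with set-if-absent plus expiry."""

    def __init__(self, ttl_seconds: float) -> None:
        self.ttl_seconds = ttl_seconds
        self._expires_at = {}  # key -> time after which the key is forgotten

    def first_seen(self, key: str, now: Optional[float] = None) -> bool:
        """Return True if the key is new (count the click), False if duplicate."""
        now = time.monotonic() if now is None else now
        expiry = self._expires_at.get(key)
        if expiry is not None and expiry > now:
            return False  # still remembered: duplicate click
        self._expires_at[key] = now + self.ttl_seconds
        return True


store = DedupStore(ttl_seconds=60)
print(store.first_seen("imp-1", now=0.0))   # True: first click is counted
print(store.first_seen("imp-1", now=30.0))  # False: duplicate within the TTL
print(store.first_seen("imp-1", now=61.0))  # True: the key has expired
```

The last line shows the cost of a TTL that is shorter than the lifetime of an impression: a replayed click is counted again once the key is gone, so the TTL should be at least as long as the impression stays valid.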

@IlyaGazman 1 month ago

The last Redis piece you put there, it probably needs to be cleaned up some time. Also, what happens when it goes down, i would probably add another dedup point along the way, maybe in the reconciliation layer. Or add another layer just for that.

@mayankkaushik6837 1 month ago

I can't access the answer keys and vote for questions page on the website. Don't know if that's by design or a bug. Btw really love how you start from the "bad" but intuitive solution first and build on top of that

@hello_interview 1 month ago

Should be fixed!

@javidakthar177 2 months ago

Thanks pls update more

@hello_interview 2 months ago

On it! One every 3 weeks :)

@chirag.shah96 2 months ago

this is amazing!

@jianwuchen8305 14 days ago

great content

@tunepa4418 1 month ago

Please can you clarify this? You mentioned the count query on cassandra will be really slow. Would it really be slow? If the partition key is ad_id and the sort key is timestamp. I assume all the data for the same id will be on the same partition sorted by timestamp. Why would it be slow?

@stephenwon6007 2 months ago

Thank you for the video! I am learning a lot from this! Btw I have a question on the Lambda vs Kappa architecture. If the lambda architecture is the combinations of the realtime and the batch process, then isn't your approach using just the lambda architecture?

@hello_interview 2 months ago

Yah bit of nuance here, nuance that i don't think is all that important frankly, but, while we do have both real-time (flink) and batch processing (spark), the integration and reliance on real-time stream processing make it lean more towards a Kappa-like approach. The batch layer is secondary and primarily for reconciliation, not a core component. Hence, it’s a hybrid but not a pure Lambda architecture.

@tunepa4418 1 month ago

@@hello_interview that makes sense. In the real lambda architecture, we rely more on the batch? i.e it's going to run periodically faster to fix things up

@mcmadhan01 2 months ago

Fantastic work. Is it safe to assume this is a regional setup, and we need cross region synchronous replication mechanisms on the choice of OLAP to allow the query service layer to be consistent across regions? I mean, a write locally, read anywhere type of architecture for OLAP needs to be called out in the deep dive right?

@hello_interview 2 months ago

Yah this does come up sometimes in the interview depending on the company. Most common at Google. Write locally, read globally is sufficient.

@mcmadhan01 2 months ago

@@hello_interview Great. Thank you for your reply.

@hinata4661 18 days ago

Can we partition cassandra on AdId and use timestamp as sort key? This will make our query faster for smaller time intervals, but we will still need to aggregate data if the time range is too large.

@hello_interview 17 days ago

you'd want to do this for the map-reduce approach, yes

@romilgoyal5349 1 month ago

Question: For the sharding while processing the events through Kinesis, the adId was suggested as the sharding key. This doesn't look like the best approach. At scale, millions of ads are being run on the platform and a good share of them have high enough volume. Going by the presented logic, the number of shards would explode. What do you think about this?

@htm332 8 days ago

Could we use a bloom filter instead of redis to a) avoid storing a huge number of ad impression ids and b) eliminate the (albeit minimal) redis read latency?

@hello_interview 8 days ago

You’d probably still have the bloom filter in redis tbh. So latency not a concern. But if you had some memory constraint then this is reasonable
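
For context on the Bloom filter suggestion, a minimal Python version is sketched below. The sizes, the SHA-256 based hashing, and the class itself are assumptions for illustration; they are not from the video and this is not the Redis implementation.

```python
import hashlib


class BloomFilter:
    """Set membership with no false negatives and a tunable false-positive rate."""

    def __init__(self, size_bits: int = 8192, num_hashes: int = 4) -> None:
        self.size_bits = size_bits
        self.num_hashes = num_hashes
        self.bits = bytearray(size_bits // 8 + 1)

    def _positions(self, item: str):
        # One bit position per hash function, derived by seeding the input.
        for seed in range(self.num_hashes):
            digest = hashlib.sha256(f"{seed}:{item}".encode()).digest()
            yield int.from_bytes(digest[:8], "big") % self.size_bits

    def add(self, item: str) -> None:
        for pos in self._positions(item):
            self.bits[pos // 8] |= 1 << (pos % 8)

    def might_contain(self, item: str) -> bool:
        # False means "definitely never added"; True means "probably added".
        return all(
            self.bits[pos // 8] & (1 << (pos % 8)) for pos in self._positions(item)
        )


seen = BloomFilter()
for i in range(100):
    seen.add(f"imp-{i}")  # hypothetical impression ids
print(seen.might_contain("imp-42"))  # True: added items are always reported
```

The memory saving comes with a trade-off for this use case: a false positive means a legitimate first click is treated as a duplicate and dropped, so the filter has to be sized for an acceptable undercount.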

@katiewu9514 2 months ago

That's a very informative video! Two questions: 1. To solve the idempotency issue, should this ad impression id be user unique? Otherwise, we should check whether the combination of ad impression id and user id exists in Redis to know if a user has clicked on this specific ad before. 2. You talked about Kappa and Lambda architecture and said that the solution uses a hybrid of these two architectures. I am not quite familiar with those two architectures. But after doing some research, I feel this approach uses Lambda architecture, since Lambda architecture has both a batch layer and a streaming layer and merges batch results and streaming results to show a unified result to the user.

@hello_interview 2 months ago

Yes the dedup key (ad impression id) needs to be user unique. Good question on the architecture, couple related questions below in the comments where I share my answer. Sorry for making you scroll, just easier than re-typing :)

@deathbombs 2 months ago

I believe lambda uses probabilistic data structure

@ashishbtech 2 months ago

Awesome content. Would it be possible for you to make a video on a video service like Netflix with focus on uploading and streaming?

@hello_interview 2 months ago

I think it’s already on the list, but you can vote for it via the link in the description!

@ramannanda 1 month ago

why not use kafka for storing/doing streaming aggregations into the OLAP?

@ramannanda 1 month ago

Ah, spoke too soon lol :)

@deathbombs 2 months ago

Can you cover an authentication related system? Previously mentioned using tokens, but would like something more in depth pls

@hello_interview 2 months ago

Feel free to vote for new questions for me to do here! www.hellointerview.com/learn/system-design/answer-keys/vote

@0000b5 2 months ago

Do we need to add a hop by forwarding events into kinesis? Is it perhaps a better idea to fan out forward the click event to the processor for redirects, as well as kinesis for better throughput?

@hello_interview 2 months ago

Not sure I follow, mind rephrasing? :)

@0000b5 2 months ago

In order for the event to get to kinesis, it looks like it goes through a middle service. Is it possible to route it directly to kinesis from the load balancer as opposed to through the middle service

@zuowang5185 2 months ago

Instead of Spark, can you use AWS lambda serverless to do that job? Or directly send a task from click processor service to a kafka queue to the item to be batch added onto a aggregated read optimized db?

@zuowang5185 2 months ago

I should have watched the entire video before asking this question

@dr0bert 1 month ago

Calling Ad DB from Click Processor Svc might not be the best pattern (DB is shared between microservices), an area that could have been improved with calling Ad Placement Service or some other service responsible for the ad metadata and caching that url in Redis.

@tarunstv796 28 days ago

Can you please expand on the questions below, or link a short video/article if possible?
1. How will "Click Processor SVC" know which AdID is popular/hard? 40:24
2. How will "Flink" handle further aggregation of AdID:0, AdID:1,..., AdID:N to AdID? 40:43

@hello_interview 24 days ago

1. Could be based on past performance or budget. Realistically, companies have ML models to predict this. 2. A single job will read from different partitions and aggregate.

@KarthikChintalaOfficial 21 days ago

That's a great system design on Ad Click aggregator. I don't know much about Kinesis and Flink tbh. Question: why use Kinesis when you can use SNS as a fan-out to SQS? It feels similar to me.

@hello_interview 18 days ago

SQS could maybe replace kinesis (some throughput considerations there), but not sure I understand the question.

@aspiring1460 2 months ago

I honestly feel you should hire @Jordan Has No Life as a system design expert on your channel. The depth of system design in his videos is quite good and honestly it is enough for a senior engineer. As for Staff SWE expectations, well, that honestly depends on the individual. I think it can only come from experience or from reading books such as Database Internals and/or DDIA. No amount of videos can make up for the Staff SWE expectations in System Design.

@hello_interview 2 months ago

We love Jordan ♥️

@aspiring1460 2 months ago

@@hello_interview Me Too!! That guy's an OG in System Design.

@raunakkhandelwal4999 2 months ago

The out-of-scope non-functional requirements seem to be more like out-of-scope functional requirements. I feel that (spam detection, demographic profiling, conversion tracking) are essentially features rather than characteristics of the system. How should I be thinking about this?

@hello_interview 2 months ago

Honestly, fair point

@deathbombs 2 months ago

25:20 I thought for DB, time series databases can write fast and also handle ranged based queries quickly? Or some wide column databases

@hello_interview 2 months ago

yah can be a good consideration. don't know enough about the ins and outs of popular TS DBs to offer a strong justification either way

@deathbombs 2 months ago

Thanks! I see it getting name dropped in a lot of books , but outside the books I haven't see it a lot

@SB-handle 1 month ago

Can the same design be used to design a Top K Service which finds top K videos per minute(Aggregation Window 1 min), Per Hour(Aggregation Window 1 hr with checkpointing), Per Day (Aggregation Window 1 day with checkpointing) and store them in a Redis Cache for the "Top K service" to query. And for longer time periods like 1 year or forever, a daily cron job can query the OLAP DB to get those and update that in the Redis Cache.

@hello_interview 1 month ago

Actually, check out our website! We have a written breakdown for Top K there.

@asokaa8815 1 month ago

Would a question like this be asked for Product Architecture or System Design interview?

@hello_interview 1 month ago

System design in meta world.

@fayezabusharkh3987 2 months ago

For Kinesis hot shards, we don't know if an ad is hot beforehand. So are these ad_id 0-N always active? Is it OK to use 10x the number of streams we need under normal circumstances? For Flink, we have the same number of Flink servers as Kinesis shards, right? If a server dies, how will the new server keep track of the pointer from the old server? Are they stateful backups instead of stateless?

@hello_interview 2 months ago

This is a great question. In reality you can make predictions here. We know based on budget and historical performance which ads we’d need to be worried about beforehand.

@dibll 2 months ago

Evan, will we be seeing the rest of the write-ups in video format too in the coming days?

@hello_interview 2 months ago

One every 3 weeks or so

@dibll 1 month ago

@@hello_interview when can we see the next video?

@VyasaVaniGranth 2 days ago

Thank you for the video! Isn't this exactly what Lambda architecture is and not a "hybrid between lambda and kappa"?

@htm332 8 days ago

Why would it be a problem for Flink to be sharded across hot ad ids? Multiple rows per (ad id, minute) key would be emitted instead of just one, but an OLAP query could trivially SUM them

@hello_interview 7 days ago

True!
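
The salting and re-summing discussed in this thread can be sketched in a few lines of Python. Everything here is an assumption made for illustration: the `HOT_AD_IDS` set, the `ad_id:salt` key format (which assumes ad ids contain no colon), and the in-process `Counter` objects standing in for stream partitions and the OLAP query.

```python
import random
from collections import Counter

HOT_AD_IDS = {"ad-hot"}  # assumption: known ahead of time, e.g. from budget or history
NUM_SALTS = 4            # hypothetical number of sub-keys per hot ad


def partition_key(ad_id: str) -> str:
    """Spread a hot ad over NUM_SALTS partition keys; leave other ads alone."""
    if ad_id in HOT_AD_IDS:
        return f"{ad_id}:{random.randrange(NUM_SALTS)}"
    return ad_id


def aggregate(events):
    """Partial counts per (partition key, minute), as each stream task would emit."""
    partial = Counter()
    for ad_id, minute in events:
        partial[(partition_key(ad_id), minute)] += 1
    return partial


def merge(partial):
    """Query-time step: strip the salt and SUM the partial rows per (ad, minute)."""
    totals = Counter()
    for (key, minute), count in partial.items():
        ad_id = key.split(":", 1)[0]
        totals[(ad_id, minute)] += count
    return totals


events = [("ad-hot", 0)] * 1000 + [("ad-cold", 0)] * 5
totals = merge(aggregate(events))
print(totals[("ad-hot", 0)])   # 1000, regardless of how the salts were assigned
print(totals[("ad-cold", 0)])  # 5
```

Because counts are additive, it does not matter which salt each event received; the final SUM over the salted rows recovers the same total a single task would have produced.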

@aenesunal 25 days ago

Do you think knowing OLAP is important for a senior/staff role? Having no experience with analytics, I'd just go for an RDS - guess it'd probably be fine?

@hello_interview 25 days ago

Yah you’d be fine likely :)

@aenesunal 23 days ago

@@hello_interview Thanks, also thank you for all the resources you've created. Amazing.

@techlifewithmohsin6142 1 month ago

If we don't use checkpointing and Flink goes down, then after restarting, how would Flink know which offset to resume from? Kafka itself does not manage offsets for Flink consumers directly. While Kafka maintains offsets in the __consumer_offsets topic for consumer groups, Flink does not rely on these offsets for its fault tolerance. Instead, it uses the offsets stored in its checkpoints. Can you please clarify what you mean by saying that not using checkpointing is OK when the Flink processor fails?

@hello_interview 24 days ago

Flink would commit the offset to kafka upon flush. So if it goes down it knows where to start again based on the last completed write to the aggregated db

@techlifewithmohsin6142 2 days ago

@@hello_interview Then why would we ever need to use checkpointing and not use Kafka offsets always?

@dhruvbhai1468 2 months ago

Can you tell the tool that you're using for drawing?? TIA

@hello_interview 2 months ago

Excalidraw. File linked in description.

@vigneshraghuraman 1 month ago

how do we ensure idempotency with unique impression ids? especially if users are not required to be logged in , and there is no concept of user iD?

@hello_interview 1 month ago

Check out the section in the video on idempotency. It goes into this :)

@vigneshraghuraman 1 month ago

@@hello_interview you mention that we look up (ad_impression_id, userID) but I'm confused since there is no concept of userID. Trying to understand how just a signed ad_impression_id will achieve idempotency? Or do we also need userID?

@vigneshraghuraman 1 month ago

the question I had is, if each ad has a unique idempotency key, whether one user clicks on that ad, or different users click on that ad, either way the idempotency key is seen already right?

@fayezabusharkh3987 2 months ago

A bit unrelated question.. Do you feel it's worth to try to do some self-learning about ML/AI and attempt to switch to that area? And how do you feel about the market in that regard Thanks

@hello_interview 2 months ago

Hmm, maybe possible for a start up. Would be really difficult to pull off for FAANG or FAANG adjacent. The easier path is to work on a ML infra team and spend time closer to the modeling to learn that way. This is actually what I did. I don't claim to be an ML engineer, but I got a lot of exposure working on a team alongside ML PHDs and doing applied ML off and on. Then the switch internally becomes more natural.

@fayezabusharkh3987 2 months ago

Thank you!

@hotdiary 2 months ago

The extra compute in the click processor to check the legitimacy of ad impressions based on signed impressions is still likely vulnerable to DoS attacks. Perhaps that should have been stated as out of scope.

@hello_interview 2 months ago

Fair!

@ronabrahamcherian 2 months ago

I think if you use gateways like the Amazon-managed ones, they do a great job of preventing DoS attacks as well, which is something they provide in addition to routing.

@kartikeyshrivastava5178 1 month ago

Can you please upload system design interviews for all the basic topics?

@hello_interview 1 month ago

What did you have in mind?

@kartikeyshrivastava5178 1 month ago

@@hello_interview I really like the way you differentiate the expectations and learning required for different levels. Designs for topics like Rate Limiter, Amazon (scale during big billion days), Payments, etc. could be very helpful.

@saber3112 2 months ago

Why do we need OLAP here? The query service can query Flink directly.

@hello_interview 2 months ago

I talk about this a bit in the video. The two main reasons I’d advise against it are contention and isolation. If the click stream breaks, advertisers can still query data. If the aggregated DB goes down, we still track clicks.

@shanaulhaque 2 months ago

Can you please create a video on an ad blocker?

@hello_interview 2 months ago

Add it to the backlog via the link in the description!

@_ryankrol 2 months ago

I didn't quite follow the point about not needing checkpoints in Flink. If a node goes down and then comes back up, are we just accepting that the data is lost and relying on the reconciliation worker to fix it? It doesn't seem obvious why checkpointing wouldn't make sense here.

@hello_interview 2 months ago

We have retention on our stream, so we’ll just pick back up reading the data from the start of the minute again (or as far back as we missed)

@_ryankrol 2 months ago

@@hello_interview But how would we know where (which minute) to pick up from if we never checkpointed the state of the Flink node? As far as I understand, checkpointing will usually store something like the queue offset (or in this case maybe the last full minute we processed?) to know more-or-less where we've got up to with the previous node that failed. If we're not using checkpointing, I'm a little lost about how we'd recover

@deathbombs 2 months ago

My interpretation is that the stream has a cursor that tells the system where it's out of date, which gives the starting point for recovery.

@depengluan7222 7 days ago

Same here. Checkpointing should be needed in this example. Say the Flink job goes down in the middle of a one-minute window and the Kafka brokers have saved the offset where the Flink consumer group left off. Once the job recovers from the checkpoint, it continues reading from where it left off and, more importantly, it can complete that one-minute window ACCURATELY. Otherwise, the first one-minute window after recovery could be inaccurate, or some clicks that should be aggregated and reported could be lost.

@UnderratedMomentsfromStarWars 1 month ago

Would this show up in product design interviews?

@hello_interview 1 month ago

Unlikely.

@UnderratedMomentsfromStarWars 1 month ago

@@hello_interview I'm loving your product design series. I know it takes a while to make them all, but I'd be thrilled to buy just a pack of topics that have shown up, not even breakdowns. I haven't been able to find any.

@hello_interview 1 month ago

@@UnderratedMomentsfromStarWars Which topics do you want to see?

@UnderratedMomentsfromStarWars 1 month ago

@@hello_interview To be honest, I'm just extremely uninformed about what topics might pop up. I'm studying for product design now and looking for a list of topics, but most resources focus on system design.

@bigphatkdawg 1 month ago

The final solution _is_ literally lambda architecture.

@user-ql1rg9mj9d 2 months ago

good good

@alokuttamshukla 2 months ago

Re-upload? I thought I saw this video 10 hours ago, or am I dreaming? 🤣🤣

@canertuzunar6171 2 months ago

Yes, the old version of the video had some editing issues

@hello_interview 2 months ago

Yup, missed including a deep dive while editing :)

@PoRBvG 2 months ago

Why do you put OLAP as a DB instead of giving the DB name? OLAP is not a DB, and it gets confusing when you say that as a Staff engineer.

@hello_interview 2 months ago

It’s a quality of a database. Choosing a specific technology is oftentimes less interesting than articulating the qualities that you’d select for. I mentioned some specific example databases verbally, if I remember correctly.

@rationallearner 1 month ago

I don't have any knowledge of Flink and Kinesis; in fact, I had never heard of them before going through this video. Does that mean I am going to tank the Staff-level interview? What's the best way to handle such a scenario?