
Adding a cache is not as simple as it may seem... 

Dreams of Code
123K subscribers
105K views

Knowing what to expect, and how to mitigate the issues that come with caching, is the first step towards a successful caching implementation.
Adding a cache is usually a good way to reduce load on your database, but it does come at the cost of increased complexity. This video looks at the most common caching pattern, and the problems that can occur.
This video was sponsored by Aiven. Their platform is used in the video for a free tier of Redis and PostgreSQL.
To get your own free Redis and PostgreSQL instance, sign up at go.aiven.io/dreamsofcode
#postgres #redis #caching
GitHub Project:
github.com/dreamsofcode-io/sp...
The information in this video is found in this whitepaper by AWS.
docs.aws.amazon.com/whitepape...
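The cache-aside flow the video covers (check the cache, fall back to the database on a miss, then populate the cache) can be sketched in a few lines of Rust. This is a minimal illustration, not the project's actual code: a `HashMap` stands in for Redis, and the `load_from_db` closure stands in for a PostgreSQL query.

```rust
use std::collections::HashMap;

// Cache-aside sketch: try the cache, fall back to the "database" on a
// miss, then populate the cache so the next read is a hit.
fn get_with_cache_aside<F>(
    cache: &mut HashMap<String, String>,
    key: &str,
    load_from_db: F,
) -> String
where
    F: FnOnce() -> String,
{
    // 1. Try the cache first.
    if let Some(hit) = cache.get(key) {
        return hit.clone();
    }
    // 2. On a miss, fall back to the database...
    let value = load_from_db();
    // 3. ...and populate the cache for the next reader.
    cache.insert(key.to_string(), value.clone());
    value
}

fn main() {
    let mut cache = HashMap::new();
    let v1 = get_with_cache_aside(&mut cache, "spell:1", || "fireball".to_string());
    // The second call is served from the cache; the loader is never invoked.
    let v2 = get_with_cache_aside(&mut cache, "spell:1", || unreachable!());
    assert_eq!(v1, v2);
}
```

The video's implementation adds the pieces this sketch leaves out: serialization, TTLs, and error handling around the Redis calls.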
Become a better developer in 4 minutes: bit.ly/45C7a29 👈
Join this channel to get access to perks:
/ @dreamsofcode
Join Discord: / discord
Join Twitter: / dreamsofcode_io
00:00:00 Intro
00:00:35 Cache Aside
00:01:01 Implementation
00:06:52 Cache Invalidation
00:07:28 Eviction Policy
00:09:15 Key Expiration
00:10:37 Write-Through Caching
00:12:49 Outro

Science

Published: 26 Jun 2024

Comments: 212
@dreamsofcode · 3 months ago
Big shout out to everyone in the comments on this video for asking GREAT questions
@user-jb4pt2or2j · 1 month ago
Can you share your Neovim distro?
@dreamsofcode · 1 month ago
@user-jb4pt2or2j You can find it on GitHub at elliottminns/dotfiles
@nathaaaaaa · 3 months ago
Usually instead of write-through, I just DEL the relevant keys and force a new cache miss. Looks very reliable to me.
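The delete-on-write strategy this comment describes can be sketched as follows. This is an illustrative stand-in, not Redis client code: the two `HashMap`s play the roles of PostgreSQL and Redis, and `update_spell` is a hypothetical handler name.

```rust
use std::collections::HashMap;

// "Just DEL the key" invalidation: on every write, update the database
// and delete the cached entry, forcing the next read to take a cache
// miss and repopulate from the source of truth.
fn update_spell(
    db: &mut HashMap<String, String>,
    cache: &mut HashMap<String, String>,
    key: &str,
    value: &str,
) {
    db.insert(key.to_string(), value.to_string());
    // Instead of writing the new value through to the cache,
    // drop the stale entry entirely (the equivalent of Redis DEL).
    cache.remove(key);
}

fn main() {
    let mut db = HashMap::new();
    let mut cache = HashMap::new();
    cache.insert("spell:1".to_string(), "fireball v1".to_string());
    update_spell(&mut db, &mut cache, "spell:1", "fireball v2");
    // The stale entry is gone; the next reader misses and reloads from db.
    assert!(cache.get("spell:1").is_none());
    assert_eq!(db.get("spell:1").map(String::as_str), Some("fireball v2"));
}
```

Compared with write-through, this trades an extra cache miss per update for never having to serialize the new value twice or keep the two writes consistent.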
@dreamsofcode · 3 months ago
That's a cool idea. I imagine it makes things a little simpler and can work for more advanced aggregations! I like it!
@xorlop · 3 months ago
lol, just left a long-winded comment about this
@dreamsofcode · 3 months ago
@xorlop I'm glad you did!
@daleryanaldover6545 · 3 months ago
I would do the same 😊
@123mrfarid · 3 months ago
Good idea. Thank you.
@o11k · 3 months ago
"There are only two hard things in Computer Science: cache invalidation and naming things" ~Phil Karlton
@hansenchrisw · 3 months ago
And off-by-one errors 😉
@Rundik · 3 months ago
And cache invalidation
@tacticalassaultanteater9678 · 3 months ago
@hansenchrisw And scope bloat
@af43bacc · 3 months ago
Concurrency and floating-point bugs: "Am I a joke to you?"
@markhaus · 3 months ago
@Rundik and cache invalidation
@rodemka · 3 months ago
Video checklist:
✓ Editor - Neovim
✓ DB - PostgreSQL
✓ Cache - Redis
✓ gRPC
Suggestions:
- Full-text search: meilisearch/sonic/typesense and Postgres tsvector/tsquery
- Authentication and authorization: OAuth2, SAML, OpenID, JWT, etc. (endless list)
- Full axum course: "todo app" -> "URL shortener app" -> "PocketBase-like app"
- Template engines + the hype around htmx
- Reports from a DB: Rust/Go + DB -> PDF creation
Thank you for the inspiring high quality videos!
@dreamsofcode · 3 months ago
Thank you for the great suggestions!
@xorlop · 3 months ago
What a cool video! So many great ideas. Another idea for the write-through cache: delete the key instead! I think this could be good because whenever you update the entry, you are resetting its LRU value, which might not be accurate/helpful. There are a few cases where a DB write is not aligned with access of the key from the cache. What if a user writes a spell but doesn't use it right away, for example? By deleting it, you are saying save is not the same as use, which might be better aligned for a spell store. Newly updated spells might not be that popular. It also helps minimize overall cache size, which probably helps the Redis LRU algorithm, which is only approximate LRU.
@dreamsofcode · 3 months ago
This is a great idea! Deleting does make a lot of sense when it comes to fast-accessed data. I think the time you'd want to use write-through would be when the data itself takes a long time to populate. But even then, you'd likely use some sort of expiration/deletion based resync. Another approach would be to not extend the expiration, which I believe you can do with another Redis SET option.
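For reference, the SET option alluded to here is presumably KEEPTTL (available since Redis 6.0), which overwrites a value without resetting its remaining expiration:

```
SET spell:1 "fireball" EX 60    # write with a 60 second TTL
SET spell:1 "firebolt" KEEPTTL  # overwrite the value, keep the remaining TTL
TTL spell:1                     # still counting down from the original window
```

Without KEEPTTL, a plain SET clears any existing TTL on the key.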
@daleryanaldover6545 · 3 months ago
Yes, the principle is: delete the cache on every operation except GET requests; those are where we store the cache!
@nedimkulovac6394 · 2 months ago
Man, this video is awesome. By far the best and clearest explanation I've come across. Thanks a ton! I would like to see more videos explaining cache strategies and when to use caching and when not to use it.
@Cranked1 · 3 months ago
Making the write to the cache independent of the completion of the request can be dangerous, because if writing to the cache fails, you have a big problem. It can also happen that the user moves on to make another request before the cache was written in the previous request, which results in wrong data. This can be completely inconsistent and you have no guarantees (unlike, say, database ACID). Even a database transaction won't save you, because the system can still fail between the cache write and the transaction commit.
@dreamsofcode · 3 months ago
Yep, 100%. In a distributed system, this is even more challenging as you'd likely need to lock the key in the cache as well, which is even more complexity.
@DryBones111 · 3 months ago
@dreamsofcode Eventual consistency is both a blessing and a curse. Just like how async colours your functions, eventual consistency colours your whole system.
@penguindrummaster · 3 months ago
I like the final takeaway saying caching is not your first step, and that database optimization should always be a consideration. I've seen too many people complicate their tech stacks just to avoid tackling an otherwise simple problem. Much like C, just because SQL is old doesn't mean it isn't really good at certain tasks.
@posteisnoob5763 · 3 months ago
Thanks for the great video!! I would really like to see your take on when / when not to cache.
@thienlacho860 · 3 months ago
With write-through, you may face the dual-write problem. There may be a successful write to the database, but a timeout on the Redis call; in that case, stale data remains in Redis. In my application, I use Debezium to capture changes in the database and produce them to a Kafka topic, then a background process consumes those changes and applies cache invalidation. In my opinion, a cache delete is safer than changing the cache: one cache entry may be affected by many different actions, and those actions may arrive concurrently, so if you change the cache in the wrong order due to async, it may end up with the wrong result. Just delete the key for safety and memory efficiency.
@dreamsofcode · 3 months ago
Debezium is pretty great. I wanted to showcase it in this video but it blew the scope out way too much! CDC caching is dope.
@Biowulf21 · 3 months ago
Love your videos man. Keep it up!
@legobuildingsrewiew7538 · 2 months ago
Instantly subscribed! Great video.
@petar567 · 3 months ago
Great video. Thanks for the information; also, I would appreciate it if you made a video on when to use a cache and when not to.
@marcing5380 · 3 months ago
One major thing to remember about caching and caches (or in general wherever you have two separate sources of truth/data) is that you'll always run into eventual consistency, so you shouldn't use it in every possible scenario. I.e. there is a non-zero time for the cache and DB to sync up, during which the data is inconsistent but still readable. The only way to avoid it, I think, is explicit locking, but that slows the whole thing down quite considerably: when you make an update to a table, you lock everything related to it and unlock after the write and cache update have both completed.
@wcrb15 · 3 months ago
Too many people reach for caching as a mechanism to improve performance when actual performance tuning of the application is the more appropriate action. A cache isn't going to save you if your application is over-fetching or inefficiently grabbing data from the DB. But when it's used correctly, caching is awesome!
@allroni · 3 months ago
Great video, as usual! 🙂
@n0kodoko143 · 3 months ago
Awesome video. I would love to see a 'when to and not to cache' video.
@dreamsofcode · 3 months ago
Thank you! I shall do one then! 😁
@alirahimi4477 · 3 months ago
Most of the time that one SELECT by id isn't the bottleneck; rather, it's a complex query that returns a possibly big result, and caching that can be a real pain or plainly impossible. When you query by X but update by Y, there is no clear way to use the write-through method to update your cache, because you don't even know which cache keys you should be updating!
@dreamsofcode · 3 months ago
Yep, you're correct! Apologies if I didn't make that clear in the video; I didn't want to complicate the caching implementation itself, so I went for a simple query.
@giuliopimenoff · 3 months ago
I just removed the Redis cache from my project because I figured out it created more issues than benefits. Databases are already fast as heck, so use caches with intention. I use Redis for session tokens for example, but nothing else right now.
@dreamsofcode · 3 months ago
I think this is the right choice. Session tokens are a good use of caching.
@rosiepone · 3 months ago
Honestly, I almost never use an external cache for anything I've written, because it helps me considerably more to consider how and WHY I want to cache information for each particular type of data. Some of it never needs a cache at all, and some of it only needs to cache a few bits of data. It also gives you a hint on how to tell when your cache is stale, since you know exactly what you're caching and when, rather than caching all data all the time.
@giuliopimenoff · 3 months ago
Also, when data is relational, caching just triples the effort of keeping it synced properly.
@anthonycavagne4880 · 3 months ago
I don't exactly understand: you store the userId as the key and the token as the value? Why is this better than using a cookie?
@giuliopimenoff · 3 months ago
@anthonycavagne4880 I use cookies, and in the cookie I store the session id. Then in Redis I have a hashmap with the user id and session id, so I can get the session data quickly and can also invalidate all sessions if needed.
@skr-kute1677 · 3 months ago
Thanks for the vid. Informative and simple.
@EvanEdwards · 3 months ago
Best walk-in on a codebase I ever did was to realize they had left passthrough on in their cache layer. The cache was invalidated with every request and passed through, presumably as a one-line debug/development short-circuit. I pointed it out, they deleted the one line and vastly improved responsiveness. They were nearly three months post-launch. (I was working on a loosely coupled service; a consulting company operating as a separate department, essentially. I found it when looking into connecting to their database for some features.)
@underflowexception · 3 months ago
If you're using PHP and Laravel, you can use the dispatchAfterRequest function to save to the cache.
@amolgupta8077 · 3 months ago
yes
@neliosantos4014 · 1 month ago
Amazing!! 😄
@WanderingCrow · 3 months ago
Great video, very clear and articulate! Caching is among my biggest weaknesses, I think, after authentication and token management, so I'd be interested to learn more about it, and how/when to use it effectively 🤔
@dreamsofcode · 3 months ago
Thank you! Caches definitely have their use cases, but they're not that simple to implement and there's a lot to consider, even more than I cover in the video!
@GreatTaiwan · 3 months ago
An external IDP with SAML2 & OIDC (PKCE) is my biggest weakness.
@SeanLazer · 3 months ago
My advice is to squeeze as much perf as you can out of your primary data store before you add a caching layer! Your RDBMS can take you a lot further than some people realize.
@kennedydre8074 · 3 months ago
I would really love to see a video on when to cache and when not to cache, thank you.
@Sarwaan001 · 3 months ago
I work on a team that handles very large amounts of data, and we usually take a "best tool for the job" approach. E.g. with a graph database as the ground truth, we use it for simple queries that are O(1) and use a search DB for search at O(log n), with a trigger to send data from the graph database: obtain the full object by performing a walk, then send the object to the search DB. This is technically like caching, but it's still very fast, and we think of cache databases more as a crutch to buy time rather than a solution.
@fahimferdous1641 · 3 months ago
I legit thought today's sponsor was Docker XD. Are you using the embedded terminal though?
@hunorportik5618 · 3 months ago
Useful info, well described. One important thing was left out IMO: using concurrency might actually re-introduce the stale-data issue, since a write might fail due to a non-transient (or improperly handled transient) issue.
@dreamsofcode · 3 months ago
That's correct! This issue becomes even more problematic in a distributed system as well, if we horizontally scale our app.
@mementomori8856 · 3 months ago
Crazy that you release this the same day I start implementing Redis from scratch.
@zshanahmad2669 · 3 months ago
Great video. My biggest problem with caches in bigger projects is dealing with related data. For example, I cached the blogs API: GET blogs/blog_ID. This API returns JSON containing the blog and information about the author. When the data about the author changes, e.g. their name, I have to invalidate blogs/blog_ID too, otherwise users will get the old author data. I know I could return only the blog data in the blogs/blog_ID request, but I can't change the frontend, which expects the user data inside the response.
@yuu-kun3461 · 3 months ago
After watching the recent PostgreSQL video by "The Art Of The Terminal", it would seem to me that adding Redis to PostgreSQL is not needed for most projects. Additionally, as presented in the blog post by martinheinz, a cache can be achieved by creating UNLOGGED tables. And if the key-value functionality of Redis is that important, the video mentioned covers that too.
@dreamsofcode · 3 months ago
100% agree. For most use cases it's not needed, and the complexity can often outweigh the benefit. I don't know if I would consider UNLOGGED tables a viable alternative, though. I did some benchmarking on them for another video I had planned and they're nowhere near as fast. There are also a lot of caveats to using them, and if you're unaware a table is unlogged, mistakes can happen.
@peppybocan · 3 months ago
I think Redis is more performant than Postgres' unlogged tables, simply because Redis is a specific tool optimized for an in-memory store. Postgres, OTOH, has layers of abstraction on top of simple store-and-retrieve functionality. Use the correct tools for the correct uses.
@DanniDuck · 3 months ago
@dreamsofcode The complexity often does not outweigh the benefit. It's extremely simple and can make everything 100x faster. For example, say you have a bunch of base64-encoded images stored in pg: you can store each image (likely ~10 KB or so) in memory in its result format, making anything involving images significantly faster. It can make things super fast if you use it right, e.g. for big queries for a product's info or whatever.
@OneShore · 3 months ago
@peppybocan Yeah, the difference is that Redis is very lightweight. If you're looking to throw more RAM & CPU at a DB problem, then Redis starts to make sense, because 8GB Redis + 8GB Postgres is going to outperform 32GB Postgres in many cases.
@peppybocan · 3 months ago
@OneShore Not necessarily; it depends on the workload. If you have transaction-heavy processing, there is no way around it. E.g. if you are PayPal and you need strong ACID guarantees, you may find yourself in a pickle. Storing payment information in-memory is fine as long as you have resiliency built into the application.
@theblckbird · 3 months ago
In Rust, you can do the following to convert a Result to an Option:

    let my_result = action_that_returns_a_result(); // Result
    let as_option = my_result.ok();                 // Option

It works the other way around as well:

    let option = Some("foo");                        // Option
    let as_result = option.ok_or(0);                 // Result

    let option = Some("foo");                        // Option
    let as_result = option.ok_or_else(|| 3 * 3 / 9); // Result
@dreamsofcode · 3 months ago
This is much simpler! Thank you.
@nathanoy_ · 3 months ago
Awesome write-up. I was about to write a similar comment. Now I write this reply to push this one. 👌
@Maxelya · 3 months ago
I still consider myself lacking experience with Rust, but somehow I knew about these "ok" methods and was about to point them out after watching the vid ^^'
@kartik180rajesh1 · 3 months ago
If you use Redis as a cloud-hosted service, isn't that defeating the purpose of the cache? The cache should ideally be as close to your backend service as possible: either in your network or in the same instance's memory.
@frazuppi4897 · 3 months ago
Amazing video, will check out Aiven for sure.
@michaelhenze877 · 2 months ago
Would really like to see a comparison between NvChad and your current Neovim configs.
@RomanKornev · 3 months ago
In the write-through caching case, what happens when the concurrent cache write takes slightly longer than expected? Now the client assumes the data was updated, but reading it back would be a race condition, and the value might be stale.
@dreamsofcode · 3 months ago
You're correct, that's one drawback of concurrency, as it can introduce a race condition. The only way to solve it would be to lock the cache and pass that lock through to the concurrent task. The other approach is to do write-through caching with the cache being the first target, although this can lead to some weird state if the database operation fails. Either way adds complexity!
@tiagocerqueira9459 · 3 months ago
I was thinking something similar: if the DB or the cache fails, you need a way to sync the state again. I think the easiest solution is to spawn two tasks, one for the DB and one for the cache, await the results of both, and handle those cases. However, the response time is then the slower of the two.
@dreamsofcode · 3 months ago
@tiagocerqueira9459 Even spawning two tasks can be complicated, however, especially if one fails and the other doesn't. You then need to reconcile afterwards.
@tiagocerqueira9459 · 3 months ago
@dreamsofcode Yes, but I guess you always need to wait for the result of both in the handler when you mutate data; you can't just "set and forget" as an optimization.
@EduarteBDO · 3 months ago
I think one workable workflow would be: lock the cache key > update the database > (on failure, unlock the cache; on success, delete the cache entry) and let a cache miss happen in the future.
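The workflow proposed above can be sketched like this. It is only an illustration of the idea: a `Mutex` over a `HashMap` stands in for a per-key lock in Redis, and `write_db` is a hypothetical stand-in for the real database write.

```rust
use std::collections::HashMap;
use std::sync::Mutex;

// Lock the cache, attempt the database write, and only delete the
// cached entry if the write succeeded; on failure the old (consistent)
// entry stays in place. The lock guard is released when it drops.
fn locked_update(
    cache: &Mutex<HashMap<String, String>>,
    key: &str,
    write_db: impl FnOnce() -> Result<(), String>,
) -> Result<(), String> {
    let mut guard = cache.lock().unwrap(); // "lock cache key"
    write_db()?; // on failure, guard drops here and the cache is untouched
    guard.remove(key); // on success, force a future cache miss
    Ok(())
}

fn main() {
    let cache = Mutex::new(HashMap::from([("k".to_string(), "old".to_string())]));
    // Failed DB write: the cached value survives.
    assert!(locked_update(&cache, "k", || Err("db down".to_string())).is_err());
    assert_eq!(cache.lock().unwrap().get("k").map(String::as_str), Some("old"));
    // Successful DB write: the cached value is evicted.
    assert!(locked_update(&cache, "k", || Ok(())).is_ok());
    assert!(cache.lock().unwrap().is_empty());
}
```

In a real distributed setup the lock itself would have to live in Redis (or another shared store), which is exactly the added complexity the thread discusses.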
@watzyh · 2 months ago
I never use Redis for caching; it's a database. Other than as a simple key-value store, I use it for handling the time dimension in a program (rate-limiting tasks and job queues), which is very useful for a webserver where each request runs separately. Caching is a job for a webserver like nginx, which is far, far more efficient and performant. I've never had any issue with cache invalidation or custom cache keys; you can control the nginx cache programmatically just like a Redis cache.
@hosamhamdy258 · 3 months ago
Great video. Can you make a "when to cache or not" video too? Thanks in advance.
@rando521 · 3 months ago
So I have a question, since I'm new to Rust and axum: the AppState is some amalgamation of Arc/Mutex, and you lock it every time you want to access the DB or Redis cache. Wouldn't this just mean you are making the asynchronous runtime semi-synchronous?
@dreamsofcode · 3 months ago
It's a great question. My implementation in the video is naive, mainly due to simplifying the code as much as possible. The requests are still asynchronous, but you're correct that the lock would prevent concurrent requests from both accessing the shared state. An improved implementation would be to use either an RW lock, or a better abstraction of the state that only locks when needed (rather than at the start of the request).
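The RW lock improvement mentioned in this reply looks roughly like the following. This is a sketch with `std::sync::RwLock` (an axum handler would more likely use `tokio::sync::RwLock` and `.read().await`); the function names are illustrative.

```rust
use std::collections::HashMap;
use std::sync::RwLock;

// With an RwLock, many readers can hold the lock simultaneously;
// only a writer needs exclusive access. For a read-heavy cache this
// removes most of the contention a plain Mutex would cause.
fn read_spell(state: &RwLock<HashMap<String, String>>, key: &str) -> Option<String> {
    state.read().unwrap().get(key).cloned() // shared (read) lock
}

fn write_spell(state: &RwLock<HashMap<String, String>>, key: &str, value: &str) {
    state.write().unwrap().insert(key.to_string(), value.to_string()); // exclusive lock
}

fn main() {
    let state = RwLock::new(HashMap::new());
    write_spell(&state, "spell:1", "fireball");
    assert_eq!(read_spell(&state, "spell:1").as_deref(), Some("fireball"));
    assert!(read_spell(&state, "missing").is_none());
}
```

The other fix the reply mentions, holding the lock only around the actual cache access instead of for the whole request, matters just as much as the lock type.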
@rando521 · 3 months ago
Thanks! Kinda new to Rust and async.
@TheTwober · 3 months ago
Now imagine you program in Java and all those problems are already solved. :) Just use a SoftReference that will be cleared by the GC if it needs memory, and the attached ReferenceQueue can be (blockingly) polled by a background thread, so your cache gets informed whenever something got removed by the GC. A near-perfect cache is nowadays literally 3 lines of code in Java.
@LauriePoulter · 3 months ago
Any tips for avoiding stale data when dealing with a 3rd party service that can be updated by other actors?
@neelg7057 · 3 months ago
Which font is that in your nvim? :)
@PiesekLeszek90 · 3 months ago
Write-through cache sounds like you just have 2 databases running at the same time, but I assume that's because of the simplicity of the example? I'd imagine you only cache the prepared API response, with all its relations and after applying logic, and not the "raw data" as it is in the main database? It doesn't sound optimal when you update one record that applies to many users, but each user needs its own cached version.
@Fanaro · 3 months ago
Please make a video on how you edit your videos!
@vinii2815 · 3 months ago
Hey, sorry, this is off the topic of the video, but will you make a new video about NvChad configuration? Their new file structure is very confusing and I haven't seen anyone with an updated tutorial for it yet.
@dreamsofcode · 3 months ago
I'll be redoing the Neovim content soon! I recommend staying with NvChad 2.0 in the meantime!
@arcadierosca9818 · 3 months ago
Can you create a video on how to make videos like this? It's amazing!!!
@Avanta1 · 3 months ago
I'm not very familiar with async Rust, but is there any chance of a race condition when updating the cache? If a thread that was spawned later acquires the lock before an earlier spawned thread?
@dreamsofcode · 3 months ago
You're correct, there absolutely is a chance. A race condition is introduced by making the update concurrent with the response. If you want to ensure 100% consistency, then performing the update synchronously would be preferable!
@Avanta1 · 3 months ago
@dreamsofcode Cool, thanks for replying!
@dreamsofcode · 3 months ago
@Avanta1 Thanks for asking the question!
@CrypticConsole · 3 months ago
Why do you need to cache this in Redis? Could you not just use master-slave database scaling for read-heavy workloads?
@dreamsofcode · 3 months ago
Read replication is a decent solution in many use cases, especially read-heavy ones as you mentioned. Just like caching, it's a tradeoff, so it does depend on what your data model / system looks like.
@IS2511_watcher · 3 months ago
4:26 `.unwrap_or(None)` can be shortened to `.ok()` for `Result`; more idiomatic too.
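A quick demonstration of the suggestion, assuming the expression in the video produced a plain `Result`. One caveat worth noting: for a `Result<Option<T>, E>` (e.g. a cache lookup that can either fail or miss), `.ok()` alone yields a nested `Option<Option<T>>`, so the equivalent of `.unwrap_or(None)` is `.ok().flatten()`.

```rust
// .ok() is the idiomatic spelling of "turn the error case into None".
fn parse_port(input: &str) -> Option<u16> {
    input.parse::<u16>().ok()
}

fn main() {
    assert_eq!(parse_port("8080"), Some(8080));
    assert_eq!(parse_port("redis"), None);

    // For Result<Option<T>, E>, .unwrap_or(None) and .ok().flatten()
    // are equivalent:
    let hit: Result<Option<i32>, ()> = Ok(Some(1));
    assert_eq!(hit.unwrap_or(None), Some(1));
    let err: Result<Option<i32>, ()> = Err(());
    assert_eq!(err.ok().flatten(), None);
}
```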
@betoharres · 3 months ago
Why did you make the write-through cache concurrent? There's a chance that two concurrent requests get mismatched values back relative to what's in the database. Maybe I'm missing something here.
@dreamsofcode · 3 months ago
You're correct. However, even with it being serial, there's no guarantee in a distributed / horizontally scaled system of a race condition not occurring. With a cache, it's almost impossible to guarantee consistency without locking the actual cache itself. In a distributed system, that's going to be even more complex.
@archip8021 · 3 months ago
I have a table of about ~20 items that I need very, very often, and it rarely changes. Is this a good use case for caching? Can a whole table be cached like this?
@dreamsofcode · 3 months ago
I think for that size, you're likely not going to need caching. Caching is more for when you have slower queries, such as aggregations, or you're hitting an API that has poor performance. Adding in a cache adds complexity, and it's probably not worth the performance gain you might receive.
@arturpendrag0n270 · 3 months ago
Can't you load them at the start of the request and use some singleton, or put them in some "global" variable so you won't have to re-request them unless needed? Even if that's not the case, the DB usually has caching mechanisms for repeated queries, so for such a small set of records it's probably unnecessary.
@user-oo9el8sx5b · 2 months ago
What video editing software do you use? It looks like you're on Linux.
@Affax · 3 months ago
Welp, time to move to KeyDB or DragonflyDB; at least they're both Redis API compatible, haha.
@saywaify · 3 months ago
Can you please share your nvim setup (or at least the colorscheme)?? It looks so fine.
@perz1val · 3 months ago
The colorscheme looks like Catppuccin.
@animanaut · 3 months ago
If you want to enable client-side caching, there are also ETag request/response headers that can be used. A whole other topic, but I believe they use hashes to let the backend decide whether to respond with a potentially big payload over the network, or not, if the client's hash matches what is present in the server DB/cache already (returning HTTP code 304 instead).
@hansenchrisw · 3 months ago
+1, though If-Modified-Since is a bit simpler and usually sufficient.
@hansenchrisw · 3 months ago
@CesarLP96 Search for HTTP conditional requests.
@animanaut · 3 months ago
The developer pages from Mozilla, also known as MDN, would be one recommendation from me.
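The ETag flow described in this thread, shown as a hypothetical exchange (path and hash value made up for illustration):

```
GET /api/spells/1 HTTP/1.1          <- request 1: client has no validator yet

HTTP/1.1 200 OK                     <- full JSON body plus a validator
ETag: "abc123"

GET /api/spells/1 HTTP/1.1          <- request 2: client revalidates
If-None-Match: "abc123"

HTTP/1.1 304 Not Modified           <- empty body; client reuses its copy
ETag: "abc123"
```

The server only skips the payload, not the work of computing the ETag, so this saves bandwidth rather than backend load.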
@egemengol306 · 3 months ago
For the life of me I don't understand the need for Redis. When I need caching, I always reach for in-memory caching libraries right in my codebase, reducing latency along with development and deployment complexity, while staying featureful. If the language is memory hungry, in-memory SQLite works really well for most cases. If I want centralized state, I reach for the database itself; Postgres is excellent. Under which circumstances would Redis be the first choice?
Edit 1: caching mutable data across multiple instances would be one, I suppose.
@TR1XT3RZ360 · 3 months ago
Can you share your terminal setup?
@SlavomirDanas · 3 months ago
Whoa, whoa, whoa! Just 8 seconds into the video and I see an infographic with the cache layer in completely the wrong spot.
@krateskim4169 · 3 months ago
I would like to know when to cache and when not to, please.
@Amejonah · 3 months ago
There is one big question I have had for a long time: how do distributed microservices work? Especially, how can scaling of certain services be achieved? What role do buses/message brokers play in it? You might be the one who can address these questions in simpler terms.
@fadhilinjagi1090 · 3 months ago
What if you deleted the cache entry right before you updated/deleted the record in the DB? Would this prevent the race condition?
@fadhilinjagi1090 · 3 months ago
I think that's optimistic mutation, if I'm not wrong.
@JuanPabloCisneros2207 · 3 months ago
Caching is always tricky. In the lazy loading presented, you can end up hitting the dual-write problem, as Postgres is way slower than Redis. If the system needs concurrency, it could be a tricky bug to solve, I think.
@Zutraxi · 1 month ago
Don't forget the retry policy for when your concurrent write to the cache fails. What if the API crashes as the write is happening? Better to use a fault-handling outbox pattern. Suddenly caching is slower than accessing the database.
@peppybocan · 3 months ago
Unless you handle 1000s of concurrent users and they pay you nothing (free users), you don't need to worry about caches. The right DB design with the right-sized DB node can handle 1000s of concurrent users. Once you start handling 10,000s of users, then you think about caching, but at that point it should be fairly easy to scale the particular parts of your DB, because you *know* what is slow.
@dreamsofcode · 3 months ago
100%. Profiling your queries and using well-placed indexes is always a better option. Sometimes it's not possible (such as when hitting a remote API), but if you have control of the database then it's always the better option.
@parkourbee2 · 3 months ago
Even then, do I really need a cache? Why not just index what needs to be indexed?
@peppybocan · 3 months ago
@dreamsofcode Yeah, absolutely! Those limits are external, and that's when it matters. I think Redis is a viable option in that case. Even things like session authentication can be done with a silly in-memory LRU cache and it will get you 90-95% of the way to the goal. But people tend to be very quick to stuff a project with a billion dependencies.
@dreamsofcode · 3 months ago
@parkourbee2 If you don't have access to the database? I'm thinking more of remote APIs etc. where you don't control the data at all. For example, we had an API that hit NIST for CVEs and was incredibly slow; in that case, caching was a good solution.
@luca4479 · 3 months ago
Postgres has built-in caching which is already crazy performant.
@fahimferdous1641 · 3 months ago
What would be an example use case for the random eviction policy?
@I25mI25 · 3 months ago
LRU comes with a small overhead, since you have to somehow store/maintain a "list" of which items were last accessed. In many "normal" cases it is likely that an item that was recently accessed will be accessed again, so keeping the most recently/frequently used ones in cache is worth the overhead. If your access patterns are mostly random, on the other hand, keeping track of usage patterns isn't really worth it, so you can just delete any random entry. You might still want a cache even in random-access cases, when the occasional random cache hit gives a big enough boost, or saves you enough in bandwidth/storage access cost, to make the added complexity of a cache worth it.
@dreamsofcode · 3 months ago
This is a great explanation. As for specific use cases, it's hard to describe any that would fall into this, but any data / queries that have no discernible pattern, or a system where the likelihood of needing a key is the same across your data set.
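For concreteness, the eviction policy is chosen in Redis via the `maxmemory-policy` directive (in redis.conf or with `CONFIG SET`); the directive names below are standard Redis, while the memory limit is just an example value:

```
maxmemory 256mb
maxmemory-policy allkeys-lru      # evict the (approximately) least recently used key
# maxmemory-policy allkeys-random # evict a random key
# maxmemory-policy volatile-ttl   # evict the key with the shortest remaining TTL
```

The `volatile-*` variants only consider keys that have an expiration set, which is why pairing them with key expiration matters.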
@Cal97g 3 months ago
It's not stale, it's just eventually consistent
@pieter5466 3 months ago
8:14 Makes you wonder whether there is *ever* a good use case for "random order"
@M3t4lstorm 3 months ago
Note: In the write-through example, if your application crashes/errors/gets killed before the cache update is written to Redis (after the DB write), you will have stale data forever.
@dreamsofcode 3 months ago
You mean until the TTL?
@liu-river 3 months ago
Yeah, but if you do it synchronously (update Redis after a successful DB write), you sacrifice speed. I guess you can implement some kind of rollback if either fails?
@ebukaume 3 months ago
What happens when the spawned task completes with an error? It seems we didn't completely solve the stale data problem.
@dreamsofcode 3 months ago
Correct, and in a distributed system, this is even more difficult!
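One hedge against the failure mode discussed in this thread can be sketched in Go with stand-in types (the types and names below are illustrative, not a real Redis client or database driver): write the database first, then try the cache, and if the cache write fails, fall back to deleting the key so the next read misses and repopulates from the database instead of serving a stale entry until the TTL expires.

```go
package main

import (
	"errors"
	"fmt"
)

// Stand-ins for a real database and cache; only the call ordering
// matters here, not the APIs.
type store struct{ rows map[string]string }

type cache struct {
	entries map[string]string
	failSet bool // simulate a cache outage mid-write
}

func (s *store) Update(key, val string) error { s.rows[key] = val; return nil }

func (c *cache) Set(key, val string) error {
	if c.failSet {
		return errors.New("cache unavailable")
	}
	c.entries[key] = val
	return nil
}

func (c *cache) Del(key string) error { delete(c.entries, key); return nil }

// updateUser writes to the database first, then refreshes the cache.
// If the cache write fails, fall back to deleting the key: the next
// read misses and repopulates from the (correct) database row instead
// of serving a stale entry until the TTL expires.
func updateUser(db *store, ch *cache, key, val string) error {
	if err := db.Update(key, val); err != nil {
		return err // DB is the source of truth; nothing to invalidate yet
	}
	if err := ch.Set(key, val); err != nil {
		if delErr := ch.Del(key); delErr != nil {
			return errors.Join(err, delErr) // both failed: stale until TTL
		}
	}
	return nil
}

func main() {
	db := &store{rows: map[string]string{}}
	ch := &cache{entries: map[string]string{"user:2": "old"}, failSet: true}

	if err := updateUser(db, ch, "user:2", "new"); err != nil {
		fmt.Println("update failed:", err)
		return
	}
	_, cached := ch.entries["user:2"]
	fmt.Println("db:", db.rows["user:2"], "cached:", cached) // db: new cached: false
}
```

This narrows the stale-data window but doesn't eliminate it: if the process dies between the DB write and the Del, you are back to waiting on the TTL, which is why distributed setups reach for change-data-capture or message queues instead.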
@Myrkytyn 22 days ago
When to cache?
@foreverexpanding 3 months ago
Why not update the cache when we update the DB? In that case there would be no need to worry about it being stale.
@ShimoriUta77 3 months ago
Rust code is so beautiful.
@dreamsofcode 3 months ago
😅 It's not known for its beauty
@PhilfreezeCH 3 months ago
Whoever thought caching was simple? It's one of those rare things that's hard on all levels. It's very difficult in hardware development, difficult in software, and ridiculously difficult in networking; it's just brutal. Plus it always requires a ridiculous amount of benchmarking and verification to make sure you don't accidentally degrade performance on certain workloads or, even worse, mess up data.
@ivan_adamovich 3 months ago
One thing I didn't understand: 50 ms is certainly a good response time, but for the simplest API written in Rust, isn't that somehow a lot? (I use Go in my projects, so I'm a noob in Rust.)
@illyias 2 months ago
You won't need caching in a simple project; your database will be able to handle the load fine.
@nexovec 3 months ago
I just realize you can literally ship a product that's just static files and a Postgres server. Curb your stack, please.
@saxtant 3 months ago
You do know your hardware is pretty much taking care of this already?
@youtube_user9921 3 months ago
Hi. Can you also post tutorial lectures on nix?
@dreamsofcode 3 months ago
Absolutely! I'll likely do it on my other channel which is more focused on Linux and FOSS. I've been playing with NixOS more on there
@youtube_user9921 3 months ago
Can you tell me which channel it is?
@backupmemories897 3 months ago
Sometimes adding a cache slows things down xD but it scales better xD because whenever you do something you call that cache system... another step.
@dreamsofcode 3 months ago
Absolutely! That's the problem in the case of inserts at first. It improved the performance of the reads on a cache hit, but caused the timings to increase by 66% on a cache miss.
@FinlayDaG33k 3 months ago
There is a major issue tho... If your key expires, and suddenly 1K requests come in, you're now hitting the database with all 1K requests and may overload the database anyway. Not exactly ideal.
@dreamsofcode 3 months ago
You're correct, this is known as the thundering herd problem. You can solve it by using something like a single-flight mechanism or connection pooling; again, it's more complexity though.
@mind.journey 3 months ago
I don't know if it's optimal, but what I usually do is never let the key expire, and instead just create a cronjob (or something similar) that periodically refreshes the key with updated data.
@FinlayDaG33k 3 months ago
@mind.journey This works depending on the goals, yes. If it's data that you know will be highly sought after by your code, it can definitely work. However, you are now burdened with the task of guessing which data would benefit from it. It can also lead to you having a lot of data in the cache that you may only need once under the full moon, thus wasting resources in fetching and keeping it cached.
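The refresh-ahead approach discussed in this thread can be sketched in Go with a ticker standing in for the cronjob (loadFromDB, the timings, and the value format are all illustrative): the key never expires, a background loop repopulates it on a schedule, and readers always hit the cache.

```go
package main

import (
	"fmt"
	"sync"
	"sync/atomic"
	"time"
)

// refreshingCache holds a single value that a background loop keeps
// fresh, so readers never see an expired key and never stampede the DB.
type refreshingCache struct {
	mu  sync.RWMutex
	val string
}

func (c *refreshingCache) Get() string {
	c.mu.RLock()
	defer c.mu.RUnlock()
	return c.val
}

func (c *refreshingCache) set(v string) {
	c.mu.Lock()
	c.val = v
	c.mu.Unlock()
}

var dbCalls atomic.Int64

func loadFromDB(n int) string {
	dbCalls.Add(1) // pretend this is the slow query
	return fmt.Sprintf("rows@refresh%d", n)
}

func main() {
	c := &refreshingCache{}
	c.set(loadFromDB(0)) // warm the cache once at startup

	done := make(chan struct{})
	go func() { // the "cronjob": every 20ms here, every few minutes in real life
		ticker := time.NewTicker(20 * time.Millisecond)
		defer ticker.Stop()
		for n := 1; ; n++ {
			select {
			case <-ticker.C:
				c.set(loadFromDB(n)) // refresh before anyone needs it
			case <-done:
				return
			}
		}
	}()

	time.Sleep(70 * time.Millisecond) // readers only ever see cache hits
	fmt.Println(c.Get())
	close(done)
}
```

As the thread notes, the cost is guessing up front which keys deserve this treatment: every refreshed key burns a database query on schedule whether or not anyone reads it.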
@ordinarygg 3 months ago
90% of issues are missing indexes, or crappy backend code that spends 99% of its time in the application and 1% in the DB. So before you say the DB is slow, please benchmark your API and DB independently. A simple 8-core Ryzen machine can handle 300k selects/sec and 60k inserts/sec using PostgreSQL. 256 cores and 1 TB of RAM will solve a lot of issues in a single instance. People don't even reach the level of vertical scaling first, instead starting to scale horizontally; a huge mistake for small and mid-size businesses and startups.
@dreamsofcode 3 months ago
Yep, I agree with you (I believe I stated something similar at the end). There are certain use cases where caching applies, but in general, optimizing your database queries is the correct approach. There is a case for horizontal scaling over vertical still, especially wrt availability. But even then you can use read replication to improve that.
@kriffos 3 months ago
If you want a really fast cache, it is a good idea to scale the cache together with your application and avoid a network request to the cache. Most of the time spent getting the data is probably network overhead. I think a cache as a microservice is, most of the time, a bad idea.
@Ca1vema 3 months ago
Dunno what you're talking about, to add cache all I need is to put 2 lines in framework settings 🙃
@perz1val 3 months ago
Looking at the comments, I think you should've used a request that queries multiple tables of normalized data into a single object. Like /user/2/permissions: user + user_role + role_permission + permission (a list of permission names). Then the benefits are clear. Using a cache to store SELECT * FROM table; is a bad example.
@dreamsofcode 3 months ago
Yeah, that's fair. I wanted to keep the interface as simple as possible so as not to distract from the caching itself. My original setup was doing a string search across 10m rows, but that added more complexity to the examples (and at that point an index is still a better solution).
@CottidaeSEA 3 months ago
Cache is all fun and games until the cache is automatically invalidated due to a timer and everyone hits the same slow query at the same time.
@dreamsofcode 3 months ago
This is a good one! The thundering herd problem. There are ways to solve it, using something such as single flight, although again it adds more complexity. Much harder to solve in a distributed system. I'll probably do a video on it as a few people have mentioned it!
@CottidaeSEA 3 months ago
@dreamsofcode Last time I had that issue, I solved it by forcibly fetching and caching with cron. A bit of a hacky, antipattern way of solving it, but it works really well.
@rosiepone 3 months ago
The number one question you should be asking yourself when setting up a data layer is "does it matter HOW my data is stored?" If you discover that it doesn't matter at all, a flat JSON file is a decent option. If you discover that you need to connect several devices together, then a fast but scalable network database like Postgres or RavenDB will work JUST fine. If you discover that you need to request the data far more frequently than your systems are able to handle, THEN you need a cache.
@hansenchrisw 3 months ago
+1, engineers often overcomplicate things. I think it was Dijkstra who said premature optimization is the root of all evil.
@shady4tv 3 months ago
Ironic that a video about Redis comes out just before everyone drops it for going closed source.
@dreamsofcode 3 months ago
Bad timing! Although tbf, this can apply to any caching solution. + I get to review all of the forks that are coming
@shady4tv 3 months ago
@dreamsofcode Honestly the timing is perfect! Redis is hot in the news cycle right now and you're right, this video isn't really about 'Redis' per se. But it's actually a great introduction for people who are uninformed about the software and want to get up to speed on all that is happening with it right now. I hope you get hella views from this bud! :)
@HUEHUEUHEPony 3 months ago
I mean it is only closed source if you are a big company
@mrmelon54 2 months ago
@HUEHUEUHEPony No? The new licensing doesn't fall under the definition of open source, and isn't accepted by the Open Source Initiative.
@its_maalik 2 months ago
Adding a cache should be the last resort to achieving good performance. Majority of applications will do just fine without a cache if they nail the data modeling and query optimizations.
@Anshucodes 3 months ago
Make a Rust roadmap video, or make some tutorial 😂
@temie933 3 months ago
Can you create a "how to Arch" video, showing how you configured Arch Linux?
@lemonking4076 3 months ago
Nice video! But I don't understand why a dev would torture themselves with Rust
@dreamsofcode 3 months ago
🤣🤣🤣😭😭😭
@lemonking4076 3 months ago
@dreamsofcode it's just way too verbose and not easily readable 😂🙀 I hope this comment doesn't turn into a flamewar!!!
@evccyr 3 months ago
I will do no push-ups for every like this comment gets. I'm sore from the last time.
@foziezzz1250 3 months ago
Would like to join you in this
@martin4ata933 2 months ago
LETS GOO
@sieunpark2160 3 months ago
first place!
@ariseyhun2085 3 months ago
Ok
@itsme3217 3 months ago
Is this your life achievement?
@pythagoran 3 months ago
Congratulations and/or I'm sorry to hear that
@sieunpark2160 3 months ago
@itsme3217 yeah my mom is proud of me 😁
@buddy.abc123 3 months ago
Rust syntax 😭😭😭🤢🤢🤢🤢
@dreamsofcode 3 months ago
I'm with ya. I think I'm gonna use Go more for demonstrating anything non language specific in the future!
@bavidlynx3409 3 months ago
The comments and the video made me realise that caching is rather unnecessary and creates a lot of overhead and issues, so imma stay away from it
@TheHTMLCode 3 months ago
I don't think that's necessarily the best decision; if you don't cache, you will incur performance issues under certain circumstances. The example illustrated in this video is a very simple one which could have been solved by efficiently indexing your database, but as you scale or encounter more complex problems you may want to consider caching for latency-sensitive functionality.

At work we have a workflow that requires our operators to pick orders in a warehouse. Fetching the pick list (all the instructions to carry out the picking of an order) takes around 500ms to generate. The pick list reflects the entire state of the current pick journey and uses a cache write-through strategy to update the cached pick list after every scan in the warehouse. Without a cache, the front end would need to rebuild the list from the database every time it retrieved the next instruction; 500ms after completing a stop and fetching the next stop would suck, while fetching from cache and having a result in 30ms is far better.

The tradeoff here is maintaining the complexity of the cache in order to achieve the performance SLO (service level objective) we promised to our consumer (warehouse staff). For simple applications you may be able to keep away from caching, but I'd definitely learn it and keep it as a tool in your toolbox; I'm sure sometime in your career it'll be useful :) Hope that helps!
@Amejonah 3 months ago
I currently use caching (through Postgres; I should really switch to Redis) to make values survive a restart of the application, as requesting the data takes a lot of time and consumes rate-limit tokens.
@realbootybabe 3 months ago
I like your videos so much! Thanks a lot 😄 How do you create your videos? What tools do you use? Maybe you want to create a video about that! 😎 thanks!
@dreamsofcode 3 months ago
Thank you! Yeah, I will be doing a video on my process this year I hope. I need to find the time to do so! Will drop a community post when I do :)
@site.x9448 3 months ago
@dreamsofcode awesome, thanks! Would be interesting to know as well!