From 2.5 million row reads to 1 (optimizing my database performance)

Подписаться 222 тыс.

Просмотров 35 тыс.

50% 1

📘 T3 Stack Tutorial: 1017897100294.gumroad.com/l/j...
🤖 SaaS I'm Building: www.icongeneratorai.com/
✂️ Background Cutter: www.backgroundcutter.com/
💬 Discord: / discord
🔔 Newsletter: newsletter.webdevcody.com/
📁 GitHub: github.com/webdevcody
📺 Twitch: / webdevcody
🤖 Website: webdevcody.com
🐦 Twitter: / webdevcody

Опубликовано:

12 авг 2023

Ссылка:

Скачать:

Готовим ссылку...

Добавить в:

Мой плейлист

Посмотреть позже

Комментарии : 129

@willmakk 11 месяцев назад

You've become my favorite web dev content creator in no time, no fancy intros and straight to the point with key subjects.

@k4f 11 месяцев назад

Same, I love this style of content. Thanks, Cody

@idk23535 11 месяцев назад

For Real!

@Botshxlo 11 месяцев назад

Please do more performance content, this was really helpful!

@parlor3115 11 месяцев назад

Second

@cas818028 11 месяцев назад

Third

@aymaneu3723 11 месяцев назад

forth

@idk23535 11 месяцев назад

fifth

@behindthescenex 10 месяцев назад

sixth

@amorto3420 11 месяцев назад

I knew about all of these but this was a good watch anyway. Your videos are very informative and not very long so people can actually learn something quickly.

@antoninpire8451 11 месяцев назад

lots of great advices throughout the video, awesome vid !

@notfadeaway6617 5 месяцев назад

performance tuning content is really helpful. thank you!

@avtandilkheladze521 11 месяцев назад

When you index a column in a table (doesn't matter if its thru prisma) it creates a b-tree on a disk (balanced binary search tree), rule of thumb is to not index a lot of your columns in the table and try to know your access patterns on data beforehand (or just access data using primary keys which are already indexed by default). If you start stacking up indexes on everything, your write performance will be terrible overtime and you will lock out tables, because every index has to overwrite several pages in the multiple trees. If you still plan to run analytical queries on your OLTP database (you shouldn't) at least create read replica and index columns that you want to query there, by doing that you will get eventually consistent storage (considering replication lag).

@salman0ansari 11 месяцев назад

this should be pinned

@bencipherx 18 дней назад

Key word is read replica. Which is why you should have an OLAP.

@DomskiPlays 11 месяцев назад

At my job I had to create a data warehouse that stores so much data and gets frequent queries so I had to learn all about clustered and non-clustered indices. Good stuff!

@byt3blitz 11 месяцев назад

Great video! I was just thinking about this topic yesterday. I’m using planet scale as well so the video was very helpful. Also, I found icon gen quite awesome. I purchased credits and was able to gen some kick ass icons for my app. It definitely boosted the user experience and gave it the missing flavor I needed.

@WebDevCody 11 месяцев назад

Nice glad to hear that! What’s your app?

@mahdiboughanmi6781 11 месяцев назад

You are awesome tbh you deserver millions of subscribers !!

@last.journey 11 месяцев назад

I'm happy to see how much people vists your website

@shadyworld1 11 месяцев назад

I’d love to see a series of performance optimizations for SaaS ❤ Amazing one bro 🤜🏽🤛🏻

@hyperprotagonist 11 месяцев назад

I think your audience will grow when you touch on nuance subjects such as this. At least, it drew my attention.

@SyntaxLexx 11 месяцев назад

Direct to the point! 👍

@chrishabgood8900 11 месяцев назад

For indexes, Not just the where but also the selects. Good video

@illiachalyk195 11 месяцев назад

Great video, thank you for doing content that is educative and entertaining at the same time! And I just wanted to add that the downside of creating an index is not that it takes up more space, the space overhead is minor, but the actual downside is that writes will take more time since it has to update multiple indexes instead of just one.

@haziqmasud532 11 месяцев назад

You are right, but that usually depends on the situation or use case. If your database is facing heavy reads then you should definitely add index and your services are facing latency , then adding an index is the right option.

@illiachalyk195 11 месяцев назад

Sure, I mean this is not the reason why you shouldn't use indexes. This is the thing that you have to consider before using them.

@WebDevCody 11 месяцев назад

Great point!

@klapaucius515 11 месяцев назад

Good info!

@navonyt 11 месяцев назад

We need more performance optimization videos!!

@IvanRandomDude 11 месяцев назад

The best way to optimize is to create bad and very inefficient solution in the beginning. Then any improvement you do will be "optimization"

@navonyt 11 месяцев назад

@@IvanRandomDude need to keep that in my mind before I work on new functionalities at my work🤓

@parlor3115 11 месяцев назад

@@navonyt Don't over-optimize, though. Otherwise, they'll kick you out because everything works perfectly.

@navonyt 11 месяцев назад

@@parlor3115 Time to introduce new bugs in the system ;)

@IvanRandomDude 11 месяцев назад

@@navonyt Write such a terrible code that only you can maintain and fix it. Job security

@SteakFisher 11 месяцев назад

planetscale also has built in caching (currently in beta I think) think having the database provider handle caching is a great step considering they can simplify the process down considerably

@martiananomaly 11 месяцев назад

Great video.

@StephenRayner 11 месяцев назад

Excellent

@sheldonfourie5959 11 месяцев назад

Really helpful

@haziqmasud532 11 месяцев назад

The second thing that Cody mentioned regarding counting the number of rows in a table, I am not sure what type of database he is using MySQL or Postgres.But there is schema in the databases by the name of information_schema and that stores the information regarding the whole database, like when you execute SHOW DATABASES or SHOW TABLES; It fetch that information from information_schema. And one of the things that schema stores is information about tables and you can find out the number of records from the information_schema.TABLE table instead of using the COUNT(*) function. That number might not be the latest but can save you lot of execution time. Hope it helps. ☺

@WebDevCody 11 месяцев назад

Good advice! It’s mysql

@tomyamado 11 месяцев назад

If you download the Prisma plugin, I think it suggests in your schema that you are missing an index on the model. Specially if you have relations. Nice video by the way!

@basedad 11 месяцев назад

You have found the secret sauce to high view videos. Showing people stuff like this A) to the point B) well explained C) not the usual everyday content

@huzaifamalik2346 11 месяцев назад

sir need more performance content

@cameronysidron2662 11 месяцев назад

Hey Cody, cool video. Could you talk about in what scenarios someone should reach for prisma over something like a supabase postgresql database? Love your videos!

@WebDevCody 11 месяцев назад

Prisma is an orm, so do you mean planetscale vs supabase? Planetscale provides scaling for very large data out of the box from what I understand, and their dashboard provides good insights (as seen in this video) that could help you performance tune like i showed

@cameronysidron2662 11 месяцев назад

@@WebDevCody oh my bad, I guess my real question then is why should I use an orm 😂 - but yeah what you showed in this video looks pretty powerful! I love the insights on the query level that planetscale provides

@ThomazMartinez 11 месяцев назад

I wish supabase had same reporting UI as supabase damn that looks really nice

@HexapoDD 11 месяцев назад

hi, I agree, prisma is great and table scans are to be avoided. much love

@handikaharianto 11 месяцев назад

Useful video as usual! Could you also cover how to fetch data from database through REST api for admin dashboard that basically contains some graphs and statistics? For context, i have admin dashboard for my FYP project which is built using the MERN stack. The problem is, I dont think i did it properly, especially on the api endpoints and the way i fetch the data from the database using MongoDB aggregation

@WebDevCody 11 месяцев назад

It depends on what data you’re trying to display. Your api would just need to return useful data you could plot in a time series or bar chart, etc. either you fetch all records from your db an aggregate in your code, or you find ways to write aggregate with groupBy statements by time windows probably. It might be easier to just do it at the api layer

@dandogamer 11 месяцев назад

Whilst next does offer a way to do caching I think it's important to keep in mind that its building upon basic http or perhaps external caching principles which are applicable to any server application

@WebDevCody 11 месяцев назад

Right, but next caches at the server level which means all users will get the same data when hitting the api, the browser cache is mainly useful for users who return often to you app

@CanRau 11 месяцев назад

PlanetScale insights is fantastic just recently did the same after realizing I missed 1 important index leading to hundreds of thousands of routed read in every request 😅 note down to 1 row read as well 🎉

@SeanCassiere 11 месяцев назад

Proper schema design (with indexes) and dropping Prisma for Drizzle made a big difference for me. For me since I had a bunch relationship based queries, which Prisma does a horrible job at, is probably what made the difference.

@WebDevCody 11 месяцев назад

I need to try it out, but drizzle looks like I'm just writing sql using a builder pattern which isn't that interesting to me

@SeanCassiere 11 месяцев назад

@@WebDevCody That's a pretty good summary of it. The only thing I'd add in is the type safety brings to your project. Plus it always runs a single query to your db instance whilst Prisma does a bunch of Rust-Graphql-Db emulation for relationships. Your Planetscale instance should be feeling this hit since Prisma's engine makes multiple calls in a transaction to fulfill any requests with relations.

@SeanCassiere 11 месяцев назад

Then again you need to consider the cost of your time, and until you are properly hitting scale you'll probably not need to worry about it.

@zivtamary 11 месяцев назад

@@WebDevCody that was my initial thought too, but I did eventually make the switch to first get rid of the ridiculous heavy prisma client, and second maybe understand SQL deeper instead of prisma black magic and whatever it does behind the scenes and the spaghetti sql code it spits. I think you should really consider it. I'm currently using Kysely as a query builder and prisma just for the db schema with prisma-kysely, and the performance is incomparable!

@monstajoedev 11 месяцев назад

What about adding an in memory cache like redis

@septa6459 11 месяцев назад

Is @@index can be applied to any data type?

@hamm8934 11 месяцев назад

Hate to be that guy, but which theme are you using? Been looking for a new one and I like this one

@WebDevCody 11 месяцев назад

Bearded theme stained blue

@RegalWK 11 месяцев назад

ID by default is index column so you could also look by index if logic in your code allow for this

@WebDevCody 11 месяцев назад

I don't think I know the ID which is why I need to lookup by email. I refactored the column to only hold a list of emails

@RegalWK 11 месяцев назад

@@WebDevCody good for you but remember indexes have pros and cons like everything, they are not free

@gadgetboyplaysmc 3 месяца назад

This might be a stupid question since I lack fundamentals on SQL. But what specifically does an index even do the under the hood that it's able to optimize this? It's pretty cool that literally the solution is just @@index(column) and it magically fixes it.

@sincethatmoment 11 месяцев назад

doesn't getStaticProps run at build time? How does it know to revalidate data if it's already deployed? Or does it get built every hour?

@WebDevCody 11 месяцев назад

When you revalidate it sets up ISG for incremental static generation

@erade4051 11 месяцев назад

Hey Cody I've been thinking of getting back to doing frontend and was wondering what you would recommend in terms of libraries etc for building like a facebook or twitter clone in 2023.

@WebDevCody 11 месяцев назад

since I use react for everything, my biased opinion would be to use next with pages router. Remix seems nice as well. If you don't like react, checkout sveltekit.

@RemotHuman 11 месяцев назад

indexes will also slightly reduce write times so that's another downside, but still worth it usually to increase read times by not reading as many rows

@WebDevCody 11 месяцев назад

Good points!

@yousafwazir3167 11 месяцев назад

The main reason is because prisma reads more than it needs, pure SQL would be to at better and put indexes on your data based on how you query

@parlor3115 11 месяцев назад

What is that Prisma can do that would over-fetch data?

@yousafwazir3167 11 месяцев назад

@@parlor3115 ?

@JS_Jordan 11 месяцев назад

@@parlor3115returning data, upserts and some relational queries will stitch together multiple queries. They try turn mysql to postgresql

@jesse9999999 11 месяцев назад

@@parlor3115 it does relations manually, so rather than doing proper joins it will do discrete fetches for each table and join the data on the way back to you

@fadilnatakusumah195 11 месяцев назад

Do you prefer PlanetScale or Supabase for production app?

@WebDevCody 11 месяцев назад

Probably planetscale at this point

@bhavyajain638 11 месяцев назад

I'm wondering how would one perform the user and icon count caching in Reactjs? Maybe storing the counts in database?

@WebDevCody 11 месяцев назад

I think sql is pretty fast at counting rows, but yeah you could store a computer count and store it in a metadata field somewhere

@bhavyajain638 11 месяцев назад

@@WebDevCody Yes. Thankyou so much for the reply. Didn't expect response this quick... I'm delighted.

@real23lions 11 месяцев назад

Cody, I’ve wondered if there is much difference moving from using ORM to just SQL. what are the pros vs cons in terms of speed of requests and DX?

@WebDevCody 11 месяцев назад

if you write raw sql, you're basically coupling all your code to whatever sql engine you use. an ORM helps abstract away the implementation (postgres, mysql, etc) which means less coupling. IMO, Raw sql gives you the power to write optimized queries at the cost of development speed. I haven't written a raw query in years, and I really don't care to do so again unless I really need it.

@real23lions 11 месяцев назад

@@WebDevCody thanks for the feedback. useful to note if i decide to look further into it. thanks!

@bigblambino 11 месяцев назад

When a column is being referenced using a where clauses in SQL queries there are very few reasons not to have an index on that column, so I think your recommendation that you should wait until your database is in production and doing full table scans to add indexes is generally bad advice.

@jacksonbourne 11 месяцев назад

Since (I'm assuming) the email field is unique, I think it would improve performance even more by using a @@unique(email) index, or at that point just using an @unique attribute directly on the email field. If the identifier is strictly internal, using an `Int` with autoincrement would be even better (obviously don't use it if it's revealed to the user as it would give them an estimate of your user count at the very least, and open you up to scraping if you can view user profiles) Also, instead of using String and @default(cuid()), you'd save a lot of space with a `String @db.Uuid` and using a @default(uuid()) instead if you don't care between uuid/cuid. A uuid is just 128 bits to store, versus storing a cuid in a stringified form which 32 bits per character (at least 768 bits according to the spec's minimum length).

@WebDevCody 11 месяцев назад

Yeah a unique might be a better option

@jacksonbourne 11 месяцев назад

@@WebDevCody Regardless, lots of good info and a great video overall, well done :)

@darylphuah 11 месяцев назад

You have another query that has potential for problems. SELECT count(*) from (subquery that returns rows) The DB Engine is likely optimizing this for you now, but if its get more complicated and it actually has to return rows before counting you will face problems with high row counts. I just don't see why such a simple count would need be done with a subquery to begin with.

@WebDevCody 11 месяцев назад

I’ll look into that one

@bk1507 11 месяцев назад

It is easier to delete indexes than add new ones especially if the table is very large. To make a new index the index tree has to be constructed based on the table, the bigger the table the slower this step takes. Removing an index is just getting rid of the tree much less work. So I would say it is better to be generous when selecting indexes on a table to be safe.

@WebDevCody 11 месяцев назад

Sure, maybe if your dataset is huge, mine isn’t

@kyngcytro 11 месяцев назад

I have a feeling you'll get the same performance boost. If you marked the field as unique in your schema.

@WebDevCody 11 месяцев назад

Yeah probably

@jishnudesarkar 11 месяцев назад

Here's an idea regarding showing user and icon count in your homepage. You can use redis to maintain the counts and every time a users gets added you just do an INCR to that key in redis. This will be way faster the counting all the rows in the DB and also would reduce a DB call.

@vanvanni_nl 11 месяцев назад

You should definitely not index everything that often is queried unless it is a small app. Bigger applications indexing all where clauses will dramatically increase diskspace, and if you do not partition that you can end up in nasty situations

@WebDevCody 11 месяцев назад

Yeah don’t index everything, which is why I said don’t premature optimize

@EvertJunior 11 месяцев назад

I didn’t understand why you dislike app router in this case. You could set it up to revalidate only the counter component instead of the entire page.

@WebDevCody 11 месяцев назад

I have no issues with using the app router to revalidate after some period, my issues with the app router are there are no ways to set it to 0 when navigating between routes

@JS_Jordan 11 месяцев назад

Switch from prisma to drizzle and it comes down an insane amount

@DunckingTest 11 месяцев назад

planet scale requires a cc. any alternatives?

@WebDevCody 11 месяцев назад

Supabase has a free tier db. Although, I didn’t add a cc when signing up for planetscale… maybe they changed their policies

@YeetYeetYe 11 месяцев назад

Yo what theme is that

@WebDevCody 11 месяцев назад

Bearded theme

@jackybanh8105 11 месяцев назад

you forgot that the heap needs to be rebuilt when new emails are inserted

@WebDevCody 11 месяцев назад

correct, adding more indexes causes slower writes, but I'd be curious to see at what point this actually becomes a problem. With my small side project, it'll probably never be an issue.

@buddy.abc123 11 месяцев назад

Good clip for beginners. This is why using ORMs without prior sql knowledge is harmful.

@xpmon 11 месяцев назад

I wouldn't even revalidate every hour maybe once a day is enough

@VincenzoCassaro 11 месяцев назад

isn't another viable solution to just store the count in another table and update it at every new user/icon and just read that one number each time instead of count * ?

@WebDevCody 11 месяцев назад

Yeah, that works as well, but if a user deletes their account or icon, then you need to make sure you update those counts in all the various business logic locations. It’s much easier just to dynamically count it once in a while before exploding code complexity

@chawza8402 11 месяцев назад

well nextjs has the feature and it only takes few lines so I guess this way is much way better than managing a table

@IvanRandomDude 11 месяцев назад

That will slow down your writes. Plus, you will have more writes which costs more. Cache is the best solution if count is not of critical business importance like number of users. Of course, you would not use cached value to show users their money balance for example. But things like numbers of users, numbers of likes etc. does not have to show the latest value.

@VincenzoCassaro 11 месяцев назад

I almost agree with everything said but this would be solved with vercel key/value db so nothing convoluted just a simple value to update and read but I get the point

@chawza8402 11 месяцев назад

@@VincenzoCassaro redis also kv db pair. Or a regular objeck (hash) key value pair also sufficient.

@user-tb4ig7qh9b 11 месяцев назад

The first thing you need to do just learn sql before try doing any backend stuff

@brokula1312 11 месяцев назад

Why having client side caching is better then having the same thing dictated by the server using simple http caching mechanizm? No need to update the client if caching policy changes and 0 code impact.

@WebDevCody 11 месяцев назад

Well for example on the home page, all new users would require a fresh load to the db, vs ISG would give everyone the same static content from a cache (much faster).

@reikooters 11 месяцев назад

I don't know your app so don't mind my ignorance, but it's not clear to me why you even need that Trial table at all. The purpose of a primary key column is to uniquely identify a row - there's no need have a column named "id" in every table just for the sake of it. If it is the case that you always query this table by the email, then you don't need an id column because the email column is what already uniquely identifies the row and that should be the primary key. Or, if you're looking up the trial id to then search a TrialId column in a different table, why not just have the email column on the other table and drop the Trial table altogether? The id column is a string anyway, so it's not like its using a smaller data type for performance benefit or something. That way you can search the other table directly and still add your index on the email column, then you can retrieve the data in 1 query instead of having to do 2 separate queries. I understand that multiple trials could share the same email, but then I again still don't get the point of the table since you could just have the email column in the other place the trial is used. Or if you just need a table to store a list of all the email addresses in your system, that's ok -- for example a list of registered users, but seems like no since you showed the 2nd table with a userid, so I guess you also have a users table -- but if yes then just make the email the only column and the primary key - then you also don't need the index, and as referred to by someone else in the comments, inserting to this table then requires only 1 write instead of 2. Anyway, for the other table which had more columns where you added the index on userid, then yeah that one makes perfect sense and you should do that.

@WebDevCody 11 месяцев назад

I have a trial table to track who has already signed up and received my initial trial credits. Since I allow users to delete their accounts and all data associated with the account, I can’t just check the user table for an email, so I have a separate table to track so that if it’s a new user I’ve never seen before I give them 1 free credit.

@reikooters 11 месяцев назад

@@WebDevCody That makes sense - in that case, it's as I said in my comment i.e. the part I mentioned about the email uniquely identifying the row - the only column you need is the email address which should be your primary key. You don't need the id column as it does nothing and therefore you don't need the index either. Your 2nd table example was a good one though. Insert to your Trial table to not insert duplicate emails will look something along the lines of (MSSQL - may differ in the db you're using) insert into Trial (email) select @email where not exists ( select * from Trial with (updlock, rowlock) where email = @email )

@WebDevCody 11 месяцев назад

@@reikooters yeah that’s correct I can just use a unique keyword on that email column of the table and that’s good enough

@user-tb4ig7qh9b 11 месяцев назад

Learn sql Use explain keyword Play with sql And then say i am working as senior full stack

@user-tb4ig7qh9b 11 месяцев назад

You need to optimize your skills with sql not your query man it is a b c sql databases use relations and joins