I use both techniques, but most of the time I use limit/offset because of the need to paginate to a specific page; I use the cursor technique for paginating notifications, tasks, and similar things. Thanks Aaron! That was an amazing video, as usual.
I don't even use SQL in my day to day (not in our tech stack atm), but you explain so many tangential/incidental concepts so well that they're still extremely useful to me. And very entertaining! @@PlanetScale
I'm about to add pagination to an old report, and at first I was going to go with limit/offset, but after watching this video I know it makes a lot more sense to just use a cursor. Thanks for such an organized and well-thought-out video.🎉🎉
Paginate vs Paginate is the next great battle in tech. Great deep dive on the differences between the two paginate (paginate?) methods that give a really good framework to help decide when you might reach for each one. A++++ would buy again.
Wait, 0:35... are you actually using YouTube in light mode?? Like, seriously!! Do you like it?? Ooh, I see, that's why you have glasses! 😂😂 OK, that was a terrible one.
Thanks for this video, really helpful. I was wondering:
1. At what point does offset pagination become unfeasible in terms of performance? 1M records? 2M? Usually you would first reduce the initial result set to the current user's scope anyway (though not for, e.g., an internal admin dashboard).
2. With cursors, this operates under the assumption that the ID is an incrementing integer, right? If an application uses UUIDs as primary keys, would that still work? Or do you need another unique auto-increment column to get that precision?
3. Paginating backwards with cursors: I assume the previous-page token would either need to be tracked, or is there a way to reference only the first record in the set as the 'end' point of the previous page and limit to the ten rows before it?
If anyone is wondering why we encode the last id as base64, here is the reason: JavaScript can only handle numbers up to a certain limit, after which they lose precision. The solution is to encode your id, send it as a string, and decode it back to a number on the client. If you have a cursor that will never grow past the maximum safe integer, you can use numbers directly instead of encoding them and adding overhead. This convention was popularized by Facebook (now Meta) for the very same reason.
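A minimal sketch of that round trip in Python (the JSON payload shape is just an illustration, not how any particular API structures its cursors):

```python
import base64
import json

def encode_cursor(last_id: int) -> str:
    """Wrap the last seen id in an opaque base64 token (a JSON payload here)."""
    payload = json.dumps({"id": last_id}).encode()
    return base64.urlsafe_b64encode(payload).decode()

def decode_cursor(token: str) -> int:
    """Decode the token back to a numeric id; validate before using it in SQL."""
    payload = json.loads(base64.urlsafe_b64decode(token.encode()))
    if not isinstance(payload.get("id"), int):
        raise ValueError("invalid cursor")
    return payload["id"]

token = encode_cursor(25995)
print(decode_cursor(token))  # 25995
```

The `isinstance` check is the kind of validation the replies below are talking about: never splice a decoded cursor into SQL without checking its type first.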
Also, there is a security catch here! Implementing this carelessly, without enough validation of the base64-encoded value, leaves the door open to SQL injection.
@@DomatoZause Isn't there also a security issue with information leakage? Anyone can decode the base64 string and see at least some of your (probably not very interesting) column names. Might be worth encrypting things for bonus security points.
@@chrishwheeler My thought on this is that if you have taken the necessary security measures for your application, you don't need to worry here. Base64 is for transporting data, not for encrypting it or adding security. It's just a convenient wrapper.
Great article, I've only used offset pagination. As a web developer I can't see many use cases for bare prev/next navigation unless you have two or three pages; otherwise you want the user to have control to navigate a little closer or farther, or even to the end, and that includes navigating to specific pages (left and right, start and end). For cursor pagination, infinite scrolling seems a much more realistic real-world use, and there it would make sense to use it.
I find that limit/offset is only ever a requirement when the PO or PM or whoever is designing the UI blindly assumes that clicking a list of page numbers is the simplest solution rather than specifying the requirements and leaving the implementation up to the developer. If you tell them that a virtualised list/infinite scrolling is just as easy and works much better they are usually more than happy to use it, they just assumed it would be complicated or didn't even think of it. Most of the time when the list is very long what you really need is just searching and filtering, and then you can use the continuation token to page the result as the user scrolls. If there are still a lot of results then the user can tighten their search. No user ever wants to go straight to page 27.
Could you please explain what you mean by "scrolling through and throwing away that offset data"? I believe you are talking about things at the memory level. It would be really helpful if you could explain and show how pagination works at the disk level, and how pagination queries select data from disk partitions stored in binary-tree format.
8:25 My question, which I figured out the answer to: why does a cursor need to include the extra/seemingly useless name data? The id is unique and sortable, so I would think all you need for the cursor is that it comes after that id (if the id is in the ordering), since the rest of the ordering is handled by the rest of the SQL statement. My answer: it's possible there's someone named Bobby with id 20000, which is lower than Aaron's id of 25995. So you only need to care about the id when the name is the same; otherwise, anything after the name "Aaron" is fair game.
When using a cursor over multiple columns, I don't understand why you'd need to query on both first_name and id. Since the id guarantees deterministic ordering, looking for id > 25995 is enough to get the same order you had before (assuming the same ORDER BY, of course); first_name = 'Aaron' or first_name > 'Aaron' seems redundant.
Basically, any column or combination of columns that is guaranteed to be unique is deterministic. The id was just used as an example because it's guaranteed to be unique... well, it had better be 😅
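To make the Bobby/Aaron scenario in this thread concrete, here's a sketch using Python and an in-memory SQLite table (the table and ids are made up to match the thread's example). Filtering on `id > 25995` alone would wrongly skip Bobby, because his id is lower even though his name sorts later:

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE users (id INTEGER PRIMARY KEY, first_name TEXT)")
conn.executemany("INSERT INTO users VALUES (?, ?)",
                 [(20000, "Bobby"), (25995, "Aaron"), (30000, "Aaron")])

# Cursor = the last row of the previous page: ('Aaron', 25995).
rows = conn.execute("""
    SELECT first_name, id FROM users
    WHERE first_name > ? OR (first_name = ? AND id > ?)
    ORDER BY first_name, id
    LIMIT 10
""", ("Aaron", "Aaron", 25995)).fetchall()

print(rows)  # [('Aaron', 30000), ('Bobby', 20000)]
```

With a plain `id > 25995` filter, Bobby (id 20000) would never appear on any later page, even though the name ordering puts him after Aaron.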
I use both but didn't know what they were called, and I don't know the performance difference. I'm still trying to figure out the best way to get the total count(): should I run another query, or is there something better?
Getting total count on a huge table is tough no matter what. But yes, you'd have to run an entirely separate query to get the total records and therefore total number of pages.
But doesn't the cursor only work if the ID is an integer? What about a UUID as the primary key? And if you delete data, both offset and cursors become unusable this way. What's the best practice for this use case?
How would you implement cursor pagination if your ids are UUIDs? My first thought is using a createdAt column instead of the id. Is that good practice, or is there a better solution?
You can just use the uuid anywhere I write id. Works the same way! Doesn't matter if the id is an int or string, as long as it's unique. Adding the additional id (or uuid) is only there to make the sort stable.
Great video!!! I recently found a way to greatly simplify the cursor pagination query when sorting by two or more columns: the trick is to use tuples (row values) for the comparison. I'm curious if there are any trade-offs I should be aware of.
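For anyone curious, the tuple trick looks like this: the compound condition (first_name > X OR (first_name = X AND id > Y)) collapses into a single row-value comparison. A sketch with SQLite, which supports row values since 3.15 (the table and data are illustrative):

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE users (id INTEGER PRIMARY KEY, first_name TEXT)")
conn.executemany("INSERT INTO users VALUES (?, ?)",
                 [(20000, "Bobby"), (25995, "Aaron"), (30000, "Aaron")])

# (first_name, id) > ('Aaron', 25995) is equivalent to:
#   first_name > 'Aaron' OR (first_name = 'Aaron' AND id > 25995)
rows = conn.execute("""
    SELECT first_name, id FROM users
    WHERE (first_name, id) > (?, ?)
    ORDER BY first_name, id
    LIMIT 10
""", ("Aaron", 25995)).fetchall()

print(rows)  # [('Aaron', 30000), ('Bobby', 20000)]
```

One trade-off worth checking: not every database optimizes row-value comparisons against a composite index equally well (older MySQL versions in particular), so it's worth looking at the query plan before relying on this form.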
If you want to use cursors but also support jumping to pages, couldn't you conditionally use cursors or offsets? You could use a cursor by default, but if a user wants to jump to page x, then you can drop down to offsets. Then use an offset if the user navigates to the next page (or previous page if you provide both a next and previous cursor).
Awesome explanation, thanks. But with the limit/offset method, you mention that all records before the offset are discarded, which makes it less performant. I'm not sure why the database can't just skip over those records in the first place, similar to how it skips records in cursor pagination. Could you please explain that part? Thanks
I think it's able to search by a condition much faster because it can perform a binary search on the index, which has a reduced complexity of log(n). When skipping through records this is not possible, because the database needs to count how many rows it's skipping, which means it reads n records.
I have one question: what if my WHERE clause is super complex and I need to reduce my data universe a lot? The cursor query will need to reduce the universe and then apply the cursor condition on top, just like offset does. Isn't that just as complex, or more so?
I wish we had a materialized CTE inside a function with an expiry time, so we could run more complex queries. Instead, we pass a refresh key to the function to choose between fetching new records and retrieving from cache, sort of like session tokens.
Interesting explanation, thanks for this! But what if there are no incremental ids in the db? How would cursor-based pagination work if the primary key is a UUID and there's no serial number available?
@@codingbyte4529 True, but the explanation relied heavily on the sequential nature of the id column ('id > 2500'). That behavior is not replicated with UUIDs.
I learned this some years ago when I had to process all the items in a table with many millions of rows. I used LIMIT 1000 and OFFSET (Laravel chunks). In the beginning everything was fine and fast, but as time went on the SELECT query became slower and slower, and the DB load got higher and higher. So I cancelled the processing job to investigate the problem and changed the query to something like WHERE id >= i*1000 AND id < (i+1)*1000, and every query was fast again.
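That id-range chunking pattern can be sketched like this (table, sizes, and data are made up; it assumes ids are roughly dense, since sparse ids just produce some small or empty chunks):

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE items (id INTEGER PRIMARY KEY, payload TEXT)")
conn.executemany("INSERT INTO items VALUES (?, ?)",
                 [(i, f"row {i}") for i in range(1, 5001)])

CHUNK = 1000
processed = 0
max_id = conn.execute("SELECT MAX(id) FROM items").fetchone()[0]

# Walk the table in id ranges instead of OFFSET: each query is an index
# range scan, so it stays fast no matter how deep into the table we are.
for lo in range(0, max_id + 1, CHUNK):
    rows = conn.execute(
        "SELECT id FROM items WHERE id >= ? AND id < ?",
        (lo, lo + CHUNK),
    ).fetchall()
    processed += len(rows)

print(processed)  # 5000
```

With OFFSET, chunk number n costs O(n × chunk size) rows scanned; with the id range, every chunk costs the same.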
Okay, I will say it: ORDER BY plus LIMIT is a solution for noobs. The cleanest solution has been window functions: available for 4 years now in MySQL, 6 years in MariaDB, and more than a decade in PostgreSQL, Oracle, and MS SQL Server.
You would need to send a token to the frontend that represents the first item on the page. If you look at Stripe's API, for example, they usually have a next_page and prev_page token. Exact same idea as the video, just with the first record instead of the last!
If you're paging forward and a record is inserted behind the cursor, it won't throw off further pages, but you will have missed that record. Paging backwards you will see it. No method is totally resistant to shifting records, but cursor is more resilient
There's a third way: maintain an 'index' table that references records in the original table. Unlike the original table, it must have no 'holes': its ids must be strictly sequential. This gives you the flexibility to jump straight to the page of interest using BETWEEN index1 AND index2. The solution isn't perfect, and in some ways it acts much like limit/offset. The advantage is that it's much faster than limit and offset while still letting you access random pages. The disadvantages are that it's still prone to 'drifting', and it's harder and slower to update and maintain.
That's an interesting technique for sure. That reminds me of the deferred join technique where you paginate a subquery of IDs only, and then join those IDs back to the table to get the full rows. Kinda similar, but much more flexible!
I'm not sure what kind of benefit this has over just using an index and an offset for your cursor: id > page * itemsPerPage. As long as you know how your index is sorted, it shouldn't be a problem, unless I'm missing something.
@@disinfect777 Regular ids usually have 'holes' because items can be deleted, disabled, etc. For that reason you cannot rely on your surrogate id for direct pagination (id = pagenum × pagesize); you need an additional 'rank' column that is guaranteed to always be sequential and without holes. Once you have that guarantee, you can retrieve the required records using the BETWEEN operator, which is very fast (faster than offset and, I think, faster than a cursor).
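A sketch of that rank-table idea in Python/SQLite (table and column names are made up; the point is that the rank column has no holes even though the ids do):

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE items (id INTEGER PRIMARY KEY, name TEXT)")
# Simulate 'holes': ids 3 and 7 were deleted at some point.
conn.executemany("INSERT INTO items VALUES (?, ?)",
                 [(i, f"item {i}") for i in [1, 2, 4, 5, 6, 8, 9, 10]])

# Gapless rank table: pos 1..N maps to the surviving ids, with no holes.
conn.execute("CREATE TABLE item_rank (pos INTEGER PRIMARY KEY, item_id INTEGER)")
conn.execute("""
    INSERT INTO item_rank (pos, item_id)
    SELECT ROW_NUMBER() OVER (ORDER BY id), id FROM items
""")

# Jump straight to page 2 (page size 3) with a BETWEEN on the rank.
page, size = 2, 3
lo = (page - 1) * size + 1
rows = conn.execute("""
    SELECT i.id FROM item_rank r
    JOIN items i ON i.id = r.item_id
    WHERE r.pos BETWEEN ? AND ?
    ORDER BY r.pos
""", (lo, lo + size - 1)).fetchall()

print([r[0] for r in rows])  # [5, 6, 8]
```

The maintenance cost the thread mentions shows up here: every insert or delete in `items` means rebuilding (or carefully shifting) the rank table to keep it gapless.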
@@sergeibatiuk3468 OK, yeah, I see your point if you want to jump to specific pages. I still think it's better to just have next/prev or even infinite scroll; I can't see many people using that feature.
Well, I think by the time a table gets so big that offset performance becomes a major problem, it will also be a major pain not to be able to jump to page one gazillion in an instant.
@@Frexuz Adding the id (or UUID) is not semantic; it only provides determinism, so it doesn't matter whether it's sequential. We're merely adding it to get a stable sort.
Hmm I'm not sure how directly addressable pages via cursor could work, except by manually paging through one by one via cursor. Which kind of defeats the purpose!
@@PlanetScale You can take the difference between the current page and the target page, then run a separate subquery over only the id and feed the result into your main query. Sure, it's kind of like going through all the pages in between at once, but with a key-only selection it should be performant enough, and it's way better than forcing the user to scroll through n pages manually :) For going backward, you can just reverse the order of the id query and flip the id condition. That way you still get all the benefits of the cursor approach while also allowing fast travel between pages. Downsides: pages will not be the same for everyone, and the implementation is more complicated.
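Here's a sketch of that relative-jump idea (Python/SQLite, made-up table): a key-only subquery skips whole pages over the primary key to find the new cursor, then the page itself is fetched with a normal cursor query. In this example we're on the page ending at id 30 and skip the next two full pages:

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE items (id INTEGER PRIMARY KEY)")
conn.executemany("INSERT INTO items VALUES (?)", [(i,) for i in range(1, 101)])

PAGE_SIZE = 10
current_cursor = 30   # last id on the current page
pages_to_skip = 2     # skip the next two full pages

# Key-only jump: offset over just the primary key to find the new cursor.
# This walks index entries only, not full rows, so it stays cheap.
new_cursor = conn.execute(
    "SELECT id FROM items WHERE id > ? ORDER BY id LIMIT 1 OFFSET ?",
    (current_cursor, pages_to_skip * PAGE_SIZE - 1),
).fetchone()[0]

# Normal cursor query from the new position.
rows = conn.execute(
    "SELECT id FROM items WHERE id > ? ORDER BY id LIMIT ?",
    (new_cursor, PAGE_SIZE),
).fetchall()

print(new_cursor, [r[0] for r in rows])
# 50 [51, 52, 53, 54, 55, 56, 57, 58, 59, 60]
```

Going backward works the same way with `id < ?` and `ORDER BY id DESC` in both queries, then reversing the fetched rows for display.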
I've always implemented cursor pagination manually. Going backwards is a pain! But it's doable, and you can always "fake" the page number on the client side; it's purely cosmetic. Edit: skipping pages too, as mentioned later in the video. You can totally do it, I would just rather not, haha. It's always a relative jump with respect to the current cursor. And oftentimes you still need to fall back on offset/limit if the user provides no cursor, or only a page index.