Almost all applications using SQL fall into this common performance trap. Avoid it now!
Every application faces this issue in its early stages: you have to limit the quantity of data sent to a client for each request, to prevent your quality of service from degrading. We find this pattern everywhere.
The delicate part is: how do we implement this solution?
We are going to look here at a simple use case: pagination over a table of 1,000,000 rows. We consider two different pagination cases: one following the unique primary key id, and the other following the tenant name (a varchar field with an index).
The easiest solution for pagination, found everywhere on the internet, is the use of LIMIT and OFFSET.
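A minimal sketch of this pattern; the table name (users) is illustrative, not from the original benchmark, while id and tenant are the columns described above:

```sql
-- Hypothetical table: 1,000,000 rows, primary key `id`,
-- secondary index on `tenant`.
-- Page 501 with a page size of 20: skip the first 10,000 rows.
SELECT *
FROM users
ORDER BY id
LIMIT 20 OFFSET 10000;
```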
Let’s try this query for the two indexed columns (id and tenant), for different values of OFFSET.
When we inspect the database optimizer’s query plan for our pagination query, we notice that the database uses a filesort operation when ordering on the non-indexed column, and an index for the tenant and id columns. Based on that, we can conclude that:
1. Using OFFSET on a non-indexed sorted column is a heavy task for the database and must be avoided at all costs.
MySQL must do an extra pass to find out how to retrieve the rows in sorted order. The sort is done by going through all rows according to the join type and storing the sort key and pointer to the row for all rows that match the WHERE clause.
2. OFFSET, even on the primary key, becomes slower over time for a high database volume, as the number of rows that must be loaded just to be skipped grows higher and higher.
3. Using OFFSET on an indexed field with a big key_length is slower than on one with a small key_length.
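The query-plan inspection described above can be reproduced with EXPLAIN (MySQL syntax); the users table and the non-indexed score column are illustrative assumptions:

```sql
-- Sorting on a non-indexed column: the Extra column of the plan
-- reports "Using filesort".
EXPLAIN SELECT * FROM users ORDER BY score LIMIT 20 OFFSET 10000;

-- Sorting on the primary key: the plan uses the index, but the
-- 10,000 skipped rows are still read and discarded.
EXPLAIN SELECT * FROM users ORDER BY id LIMIT 20 OFFSET 10000;
```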
As we can see, OFFSET pagination has some drawbacks:
- For a high database volume, the last pages are harder to retrieve than the first ones, as the number of rows to load and skip is high.
- For a growing database, reaching the earliest rows becomes less and less efficient over time (4.54 s, or even 160 s, per query is unusable for a website with thousands of users).
- When a user is scrolling from page to page, if a row is inserted in the meantime, then depending on the sort column the user can miss a new item or see a duplicate one (the last item of the previous page).
Because OFFSET pagination is convenient to implement at the beginning of a project (you don’t have to anticipate query plans, UI sorting use cases, or index creation), it is common to see an application’s performance degrade slowly until it becomes unusable. Due to the way OFFSET is specified, the database has to iterate over all the skipped rows, making this solution really inefficient.
Seek method pagination
As seen previously, even if we try to use an index, OFFSET makes the query inefficient because all the skipped rows must still be loaded. Instead, we have to construct a query that uses the index to jump directly to our target page.
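A sketch of the seek (keyset) query on the primary key, assuming the same illustrative users table; :last_seen_id stands for the id of the last row of the previous page:

```sql
-- The index on `id` lets the database jump directly to the first
-- row of the next page instead of counting and skipping rows.
SELECT *
FROM users
WHERE id > :last_seen_id
ORDER BY id
LIMIT 20;
```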
Let’s find out how long this query takes when iterating on the last seen id.
The result shows that we gained several orders of magnitude in our use case, making the query response time far less dependent on the database size. However, when moving to the seek pagination method, some changes are notable:
- We can no longer rely on page-number UI behavior, as we now rely on a token the client gives back to the application to get the next page (the last id of the previous page).
- We have to think about which column will be used for pagination, as we will need the corresponding index.
- If we want to order by a non-unique column, we have to create a multi-column index (using the primary key as one of the columns) and adapt our query to navigate along that index:
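For a non-unique sort column such as tenant, a sketch using a composite index on (tenant, id) as the pagination key (the users table name is illustrative); the row-value comparison works in MySQL and PostgreSQL, but some optimizers handle the expanded OR form more reliably:

```sql
-- Requires a composite index, e.g.:
-- CREATE INDEX idx_tenant_id ON users (tenant, id);
SELECT *
FROM users
WHERE (tenant, id) > (:last_seen_tenant, :last_seen_id)
ORDER BY tenant, id
LIMIT 20;

-- Equivalent expanded form:
SELECT *
FROM users
WHERE tenant > :last_seen_tenant
   OR (tenant = :last_seen_tenant AND id > :last_seen_id)
ORDER BY tenant, id
LIMIT 20;
```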
This solution makes pagination really efficient but increases development complexity: we need to think about pagination and the allowed sorting columns when designing our database schema.
As usual in software engineering, performance is a tradeoff.
We have seen here how to banish OFFSET pagination from our SQL database when performance is critical.
In exchange for a bit of complexity and the loss of some features, the seek method provides one of the fastest SQL-vendor-agnostic ways of paginating your data.