'MySQL group by & order by & limit optimization
For example, I have book_buyer table with the following inside:
CREATE TABLE `book_buyer` (
`buyer_id` bigint NOT NULL,
`book_id` bigint NOT NULL,
`description varchar(255),
`updated_at` datetime,
PRIMARY KEY (buyer_id, book_id)
);
Most of the time I perform all operations on the table using WHERE buyer_id = ? AND book_id = ?, they are unique and the prime index on them suits me, but in some cases I also just need to get a list of all book_ids. Then I resort to this query:
SELECT `book_id`
FROM `book_buyer`
GROUP BY `book_id`
ORDER BY `updated_at` DESC
LIMIT 15;
I also decided to add an index on updated_at to limit the sampling to 15 elements of this table:
CREATE INDEX updated_at_ids ON book_buyer(book_id);
But along with using GROUP BY, the explain command of the latest select query shows me that I'm still sampling ALL the fields in the book_buyer table - which I want to avoid. I tried adding a complex index on book_id & updated_at - didn't help, it still looks at all rows of the table. At the same time, if I remove GROUP BY from the select query, explain shows me that I have reached a constant of 15 units - exactly what my limit sets.
How can I add optimization to GROUP BY paired with ORDER BY & LIMIT to make it const/almost fixed?
Solution 1:[1]
Your query must do 2 sorts before delivering only 15 rows.
Technically, the GROUP BY book_id is invalid. Which updated_at do you want for each book_id? There can clearly be multiple rows for each book_id.
What does updated_at mean? Perhaps it is when the row in this many:many mapping table was updated?
Perhaps you really wanted
SELECT `book_id`
FROM `book_buyer`
ORDER BY `updated_at` DESC, `book_id` DESC
LIMIT 15;
That would benefit from
INDEX(updated_at, book_id)
Note that the two columns in the INDEX are both in the same direction (DESC) and in the same order (updated_at first)to match theORDER BY. Both of these are necessary for the INDEX` to be used. (There is a minor improvement in MySQL 8.0.)
In fact, instead of reading the entire table and doing two sorts (or maybe just one), my Select will read 15 rows from the index and no rows from the table itself. Much faster.
Sources
This article follows the attribution requirements of Stack Overflow and is licensed under CC BY-SA 3.0.
Source: Stack Overflow
| Solution | Source |
|---|---|
| Solution 1 | Rick James |
