Correct way to query MySQL table with 2m+ rows based on criteria from a separate table
Terrible title, sorry for not being able to concisely articulate this question.
- I have a MySQL table (Table name: users) with 2m+ users (rows) in it. Each one has a score for that user. (Columns: userid, name, score)
- I need to apply 'interests' to each user, so I have created an 'interests' table with columns (Columns: userid, interest). A new row is created each time an interest is assigned to a user.
- I then need to select 50 users where their interest = 'surfing' and their score is between 10,000 and 50,000
There might be 500,000 users with a score in that range.
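For concreteness, the two tables described above could look roughly like the sketch below. The column types, sizes, and the primary key on users are assumptions made for illustration; the question only gives the column names.

```sql
-- Hypothetical schema: types, sizes and keys are assumptions, not taken from the question.
CREATE TABLE users (
    userid INT UNSIGNED NOT NULL,
    name   VARCHAR(100) NOT NULL,
    score  INT NOT NULL,
    PRIMARY KEY (userid)
);

-- One row per interest assigned to a user.
CREATE TABLE interests (
    userid   INT UNSIGNED NOT NULL,
    interest VARCHAR(50) NOT NULL
);
```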
My query:
SELECT
a.userid,
a.interest,
b.name,
b.score
FROM interests AS a LEFT JOIN (SELECT
userid,
name,
score
FROM users
WHERE score > 10000 AND score < 50000) AS b ON a.userid = b.userid
WHERE a.interest = 'surfing'
ORDER BY b.score DESC
LIMIT 50
So I think my above query will work, but I'm not sure I'm going about it in an efficient way. My understanding is that it's essentially selecting all interests rows where the interest = 'surfing' (this might be 50,000 rows) then performing a JOIN on the user table which itself might return 500,000 rows.
Solution 1:[1]
select
i.userid,
i.interest,
u.name,
u.score
from
interests i
inner join users u on
i.userid = u.userid
where
u.score between 10000 and 50000 and
i.interest = 'surfing'
order by
u.score desc
limit 50
Remember to add the following indexes:
- INTERESTS: userid, interest
- USERS: userid, score
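The indexes listed above could be created with statements along these lines. This is only a sketch: the index names are made up for illustration, and the column order simply follows the lists above. Whether this order suits your data is worth checking against the actual query plan.

```sql
-- Index names (idx_...) are hypothetical; column order follows the list above.
CREATE INDEX idx_interests_userid_interest ON interests (userid, interest);
CREATE INDEX idx_users_userid_score ON users (userid, score);
```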
Solution 2:[2]
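You probably do not need a derived table for the join. You can join the tables directly and filter in the WHERE clause (note that filtering on b.score in WHERE makes the LEFT JOIN behave like an inner join):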
select
a.userid,
a.interest,
b.name,
b.score
from interests a
LEFT JOIN users b on b.userid = a.userid
where
b.score > 10000 AND b.score < 50000
and a.interest = 'surfing'
ORDER BY b.score DESC
LIMIT 50
Adding some indexes would make it faster. If userid is already the primary key on the users table, you do not need to index it again on that table.
alter table interests add index user_inter_idx(interest,userid);
alter table users add index user_score_idx(score);
NOTE: Make sure to take a backup of the tables before adding indexes to them.
You can also inspect the query plan using
explain select ...
This will give you an idea of how the optimizer will execute the query.
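As a sketch, prefixing the query from this answer with EXPLAIN would look like the following. The output depends on your data, indexes, and MySQL version, so none is shown here.

```sql
-- Same query as above, prefixed with EXPLAIN to show the plan instead of running it.
EXPLAIN
SELECT
    a.userid,
    a.interest,
    b.name,
    b.score
FROM interests a
LEFT JOIN users b ON b.userid = a.userid
WHERE b.score > 10000 AND b.score < 50000
  AND a.interest = 'surfing'
ORDER BY b.score DESC
LIMIT 50;
```

In the result, the key column shows which index (if any) was chosen for each table, and the rows column gives an estimate of how many rows will be examined. An Extra value containing "Using filesort" indicates that the ORDER BY is not being satisfied from an index.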
Sources
This article follows the attribution requirements of Stack Overflow and is licensed under CC BY-SA 3.0.
Source: Stack Overflow
| Solution | Source |
|---|---|
| Solution 1 | |
| Solution 2 | |
