'Correct way to query MySQL table with 2m+ rows based on criteria from a separate table

Terrible title, sorry for not being able to concisely articulate this question.

  • I have a MySQL table (Table name: users) with 2m+ users (rows) in it. Each one has a score for that user. (Columns: userid, name, score)
  • I need to apply 'interests' to each user, so I have created an 'interests' table with columns (Columns: userid, interest). A new row is created each time an interest is assigned to a user.
  • I then need to select 50 users where their interest = 'surfing' and their score is between 10,000 and 50,000

There might be 500,000 users with a score in that range.

My query:

SELECT
  a.userid,
  a.interest,
  b.name,
  b.score
FROM interests AS a LEFT JOIN (SELECT
                                 userid,
                                 name,
                                 score
                               FROM users
                               WHERE score > 10000 AND score < 50000) AS b ON a.userid = b.userid
WHERE a.interest = 'surfing'
ORDER BY b.score DESC
LIMIT 50

So I think my above query will work, but I'm not sure I'm going about it in an efficient way. My understanding is that it's essentially selecting all interests rows where the interest = 'surfing' (this might be 50,000 rows) then performing a JOIN on the user table which itself might return 500,000 rows.



Solution 1:[1]

select 
  i.userid, 
  i.interest, 
  u.name, 
  u.score 
from 
  interests i    

  inner join users u on 
    i.userid = u.userid 
where 
  u.score between 1000 and 50000 and 
  i.interest = 'surfing' 
order by 
  u.score desc 
limit 50

Remember to add the following indexes:

  1. INTERESTS: userid, interest
  2. USERS: userid, score

Solution 2:[2]

You perhaps do not need a derived query for join and do as

select
a.userid,
a.interest,
b.name,
b.score
from interests a 
LEFT JOIN users b on b.userid = a.userid
where 
b.score > 10000 AND b.score < 50000
and a.interest = 'surfing'
ORDER BY b.score DESC
LIMIT 50

And adding some indexes would make it faster, if userid is a primary key on user table then you do not need to re-index it on the same table.

alter table interests add index user_inter_idx(interest,userid);
alter table users add index user_score_idx(score);

NOTE : Make sure to take a backup of the tables before applying indexes on them.

You can also check the query health using

explain select ...

This will provide you an idea how the optimizer will work on the query.

Sources

This article follows the attribution requirements of Stack Overflow and is licensed under CC BY-SA 3.0.

Source: Stack Overflow

Solution Source
Solution 1
Solution 2