'anyone can explain why mysql does not use index as expected

MYSQL 5.7

EXPLAIN
SELECT biz.id as id, AVG(case WHEN req.deploy_date_id > 0 THEN 
req.dev_cost_time_ex_holiday ELSE null END) as test
FROM t_fact_fit_req as req, t_dim_business as biz, t_dim_deploy_date as d
WHERE req.biz_id = biz.id
AND req.deploy_date_id = d.id
AND biz.id in (10)
AND d.id BETWEEN 20210605 AND 20210611
GROUP BY biz.id

enter image description here there is index idx_biz_date_req_type (biz_id, deploy_date_id, req_type) on req, why it is not been used ?



Solution 1:[1]

Aside from being an auto-generated query, the biz and d tables are not even required as the biz_id and deploy_date_id are part of your req table and dont require the additional correlation to the other tables as you are not getting anything from them. This simplifies down to below query.

SELECT 
        req.biz_id id, 
        AVG(case WHEN req.deploy_date_id > 0 
                THEN req.dev_cost_time_ex_holiday 
                ELSE null END) as test
    FROM 
        t_fact_fit_req req
    where
            req.biz_id = 10
        AND req.deploy_date_id BETWEEN 20210605 AND 20210611

As for indexes, you mentioned a date field in your req table, but your WHERE clause is looking at your d table

Due to transitive association, if A = B and B = C, then A = C. Since you have respective A/B and B/C values within the B common, and that is where data is coming from, you dont need A or C. your Req table has all you need for the query.

So, in this scenario, if you change your date test from the d.id date criteria to the req.deploy_date_id criteria, the index SHOULD get used. But as you mentioned about the query being generated, I would have to look into the logic of how THAT is being built.

Solution 2:[2]

Because it's no longer 1992, we'd normally write that query this way:

SELECT b.id 
     , AVG(CASE WHEN r.deploy_date_id > 0 THEN 
r.dev_cost_time_ex_holiday ELSE NULL END)  test
  FROM t_fact_fit_req r
  JOIN t_dim_business b
    ON b.id =  r.biz_id 
  JOIN t_dim_deploy_date d
    ON d.id = r.deploy_date_id 
 WHERE b.id IN (10)
   AND d.id BETWEEN 20210605 AND 20210611
 GROUP 
    BY b.id

While this does nothing for performance, I find it easier to read.

As to the problem at hand, wouldn't this be more efficient?

AVG(r.dev_cost_time_ex_holiday)  test
  ...
 WHERE r.deploy_date_id > 0

Sources

This article follows the attribution requirements of Stack Overflow and is licensed under CC BY-SA 3.0.

Source: Stack Overflow

Solution Source
Solution 1
Solution 2 Strawberry