'Mysql 5 Percentile rank is too slow
The following Mysql works and compiles correctly to calculate percentile rank in mysql 5, however its really slow. Once i do a few more joins it ends up being about 1.5 seconds. I think I may be doing this in an inefficient way. Any ideas of how i can make this execute faster. (i know mysql 8 can do this better but current db is what it is.
Select ZZ.id,ZZ.symbol,
round(100*(ZZ.count-ZZ.rank1+1)/ZZ.count,0) AS newexample1_prcnt,
round(100*(ZZ2.count-ZZ2.rank2+1)/ZZ2.count,0) AS newexample2_prcnt
FROM
(
Select t.id, t.symbol AS symbol ,max_date as example1date,t.v AS example1, @curRank := @curRank +1 AS rank1,
(Select count(t.symbol) as total
from (SELECT @CountcurRank := 0) rank1, tbl_market_example1 t
inner join
(SELECT t.symbol,MAX(`t`) as max_date, t.v
FROM tbl_market_example1 t
WHERE t.v>0
GROUP BY t.symbol)a
on a.symbol = t.symbol and a.max_date = t.t ORDER BY example1 DESC) AS Count
from (SELECT @curRank := 0) rank1, tbl_market_example1 t
inner join
(SELECT t.id, t.symbol,MAX(`t`) as max_date, t.v
FROM tbl_market_example1 t
WHERE t.v>0
GROUP BY t.symbol) a on a.symbol = t.symbol and a.max_date = t.t ORDER BY example1 DESC) ZZ
LEFT JOIN
(
Select t.symbol AS symbol ,max_date as example2date,t.v AS example2, @curRank2 := @curRank2 + 1 AS rank2,
(Select count(t.symbol) as total
from (SELECT @CountcurRank2 := 0) rank2, tbl_example2 t
inner join
(SELECT t.symbol,MAX(`t`) as max_date, t.v
FROM tbl_example2 t
WHERE t.v>0
GROUP BY t.symbol)a
on a.symbol = t.symbol and a.max_date = t.t ORDER BY example2 DESC) AS Count
from (SELECT @curRank2 := 0) rank2, tbl_example2 t
inner join
(SELECT t.symbol,MAX(`t`) as max_date, t.v
FROM tbl_example2 t
WHERE t.v>0
GROUP BY t.symbol)a
on a.symbol = t.symbol and a.max_date = t.t ORDER BY example2 DESC
) ZZ2
ON ZZ.symbol = ZZ2.symbol
Solution 1:[1]
I don't have the full answer, but here goes.
I think this has a serious problem:
SELECT t.symbol, MAX(`t`) as max_date, t.v
FROM tbl_market_example1 t
WHERE t.v>0
GROUP BY t.symbol
See discussions about "only_full_group_by". My point is that t.v will not necessarily be the value for which max(t) occurred.
Instead of COUNT(x), use COUNT(*) -- unless you actually need to check that x IS NOT NULL.
ORDER BY example1 DESC seems to be irrelevant since it is on a SELECT that returns a single row.
CountcurRank seems to be unused.
Try doing the task using a few TEMPORARY TABLEs. Add to each the appropriate index. (This may not help the speed, but it may help you to better organize the flow.).
If the pair (symbol, t) is unique, consider getting rid of id and change to PRIMARY KEY(symbol, t). (But, since t seems to be a "time", there may be dups?) If not, add this composite index:
INDEX(symbol, t)
MySQL 8.0 and MariaDB 10.3 have PERCENTILE functions and WITH built-in, but they may not be any faster. At some point ih the future of MySQL, your use of @x := ... will be disallowed, thereby forcing you to perform the task in a different way. (I would guess that you are using 5.7 now?)
It may help to provide SHOW CREATE TABLE so we can see the PK and datatypes.
Sources
This article follows the attribution requirements of Stack Overflow and is licensed under CC BY-SA 3.0.
Source: Stack Overflow
| Solution | Source |
|---|---|
| Solution 1 | Rick James |
