PostgreSQL max query on big indexed table has slow performance
I have a table inside my PostgreSQL database, called consumer_actions. It contains all the actions done by consumers registered in my app. At the moment, this table has ~500 million records. What I'm trying to do is get the maximum id, based on the system the action came from.
The definition of the table is:
CREATE TABLE public.consumer_actions (
    id int4 NOT NULL,
    system_id int4 NOT NULL,
    consumer_id int4 NOT NULL,
    action_id int4 NOT NULL,
    payload_json jsonb NULL,
    external_system_date timestamptz NULL,
    local_system_date timestamptz NULL,
    CONSTRAINT consumer_actions_pkey PRIMARY KEY (id, system_id)
);
CREATE INDEX consumer_actions_ext_date ON public.consumer_actions USING btree (external_system_date);
CREATE INDEX consumer_actions_system_consumer_id ON public.consumer_actions USING btree (system_id, consumer_id);
When I run
select max(id) from consumer_actions where system_id = 1
it takes less than one second, but when I try to use the same index (consumer_actions_system_consumer_id) to get max(id) for system_id = 2, it takes more than an hour:
select max(id) from consumer_actions where system_id = 2
I have also checked the query plans; they look similar for both queries. I also reran VACUUM ANALYZE on the table and a REINDEX. Neither of them helped. Any idea what I can do to speed up the second query?
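For reference, those maintenance commands were along these lines:
vacuum analyze consumer_actions;
reindex table consumer_actions;
Here are the query plans for both queries, and the current size of the table: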
explain analyze
select max(id) from consumer_actions where system_id = 1;
Result  (cost=1.49..1.50 rows=1 width=4) (actual time=0.062..0.063 rows=1 loops=1)
  InitPlan 1 (returns $0)
    ->  Limit  (cost=0.57..1.49 rows=1 width=4) (actual time=0.057..0.057 rows=1 loops=1)
          ->  Index Only Scan Backward using consumer_actions_pkey on consumer_actions ca  (cost=0.57..524024735.49 rows=572451344 width=4) (actual time=0.055..0.055 rows=1 loops=1)
                Index Cond: ((id IS NOT NULL) AND (system_id = 1))
                Heap Fetches: 1
Planning Time: 0.173 ms
Execution Time: 0.092 ms
explain analyze
select max(id) from consumer_actions where system_id = 2;
Result  (cost=6.46..6.47 rows=1 width=4) (actual time=7099484.855..7099484.858 rows=1 loops=1)
  InitPlan 1 (returns $0)
    ->  Limit  (cost=0.57..6.46 rows=1 width=4) (actual time=7099484.839..7099484.841 rows=1 loops=1)
          ->  Index Only Scan Backward using consumer_actions_pkey on consumer_actions ca  (cost=0.57..20205843.58 rows=3436129 width=4) (actual time=7099484.833..7099484.834 rows=1 loops=1)
                Index Cond: ((id IS NOT NULL) AND (system_id = 2))
                Heap Fetches: 1
Planning Time: 3.078 ms
Execution Time: 7099484.992 ms
(8 rows)
select count(*) from consumer_actions; --result is 577408504
Solution 1:[1]
I think that you should create an index like this one, so that Postgres can descend straight to the largest id for the requested system_id instead of scanning the (id, system_id) primary key backwards past every entry that belongs to other systems:
CREATE INDEX consumer_actions_system_id_id ON public.consumer_actions USING btree (system_id, id);
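On a table this size the build will take a while, and a plain CREATE INDEX blocks writes for its duration; here is a minimal alternative using PostgreSQL's CREATE INDEX CONCURRENTLY, assuming writes need to keep flowing during the build:
-- same index as above, built without blocking writes on the table;
-- slower to build and cannot run inside a transaction block
CREATE INDEX CONCURRENTLY consumer_actions_system_id_id
    ON public.consumer_actions USING btree (system_id, id);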
Solution 2:[2]
Instead of using an aggregate function like max(), which potentially has to scan and aggregate large numbers of rows on a table like yours, you could get the same result with a query designed to return the fewest rows possible:
SELECT id FROM consumer_actions WHERE system_id = ? ORDER BY id DESC LIMIT 1;
This should still benefit significantly from the existing indexes.
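Whether it actually does can be checked the same way as the plans above, for example with the BUFFERS option enabled to also see how many pages are touched (filling the placeholder with the slow system from the question):
explain (analyze, buffers)
select id from consumer_actions where system_id = 2 order by id desc limit 1;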
Sources
This article follows the attribution requirements of Stack Overflow and is licensed under CC BY-SA 3.0.
Source: Stack Overflow
| Solution | Source |
|---|---|
| Solution 1 | Philippe |
| Solution 2 | DigitalHermit |
