'Solution for speeding up a slow SELECT DISTINCT query in Postgres
The query is basically:
SELECT DISTINCT "my_table"."foo" from "my_table" WHERE...
Pretending that I'm 100% certain the DISTINCT portion of the query is the reason it runs slowly, I've omitted the rest of the query to avoid confusion, since it is the distinct portion's slowness that I'm primarily concerned with (distinct is always a source of slowness).
The table in question has 2.5 million rows of data. The DISTINCT is needed for purposes not listed here (because I don't want back a modified query, but rather just general information about making distinct queries run faster at the DBMS level, if possible).
How can I make DISTINCT run quicker (using Postgres 9, specifically) without altering the SQL (ie, I can't alter this SQL coming in, but have access to optimize something at the DB level)?
Solution 1:[1]
Oftentimes, you can make such queries run faster by working around the distinct by using a group by instead:
select my_table.foo
from my_table
where [whatever where conditions you want]
group by foo;
Solution 2:[2]
You can try increasing the work_mem setting, depending on the size of Your dataset It can cause switching the query plan to hash aggregates, which are usually faster.
But before setting it too high globally, first read up on it. You can easily blow up Your server, because the max_connections setting acts as a multiplier to this number.
This means that if you were to set work_mem = 128MB and you set max_connections = 100 (the default), you should have more than 12.8GB of RAM. You're essentially telling the server that it can use that much for performing queries (not even considering any other memory use by Postgres or otherwise).
Sources
This article follows the attribution requirements of Stack Overflow and is licensed under CC BY-SA 3.0.
Source: Stack Overflow
| Solution | Source |
|---|---|
| Solution 1 | rogerdpack |
| Solution 2 |
