'Why does Postgres still do a Bitmap Heap Scan when a covering index is used?
The table looks something like this:
CREATE TABLE "audit_log" (
"id" int4 NOT NULL DEFAULT nextval('audit_log_id_seq'::regclass),
"entity" varchar(50) COLLATE "public"."ci",
"updated" timestamp(6) NOT NULL,
"transaction_id" uuid,
CONSTRAINT "PK_audit_log" PRIMARY KEY ("id")
);
It contains millions of row.
I tried adding an index on one column like this:
CREATE INDEX "testing" ON "audit_log" USING btree (
"entity" COLLATE "public"."ci" "pg_catalog"."text_ops" ASC NULLS LAST
);
Then ran the following query over both the indexed column, and the primary key:
EXPLAIN ANALYZE SELECT entity, id FROM audit_log WHERE entity = 'abcd'
As I expected, the query plan uses both a Bitmap Index Scan (to find the 'entity' column, presumably) and a Bitmap Heap Scan (to retrieve the 'id' column, I assume):
Gather (cost=2640.10..260915.23 rows=87166 width=122) (actual time=2.828..3.764 rows=0 loops=1)
Workers Planned: 2
Workers Launched: 2
-> Parallel Bitmap Heap Scan on audit_log (cost=1640.10..251198.63 rows=36319 width=122) (actual time=0.061..0.062 rows=0 loops=3)
Recheck Cond: ((entity)::text = '1234'::text)
-> Bitmap Index Scan on testing (cost=0.00..1618.31 rows=87166 width=0) (actual time=0.036..0.036 rows=0 loops=1)
Index Cond: ((entity)::text = '1234'::text)
Next I added an INCLUDE column to the index in order to make it cover the above query:
DROP INDEX testing
CREATE INDEX testing ON audit_log USING btree (
"entity" COLLATE "public"."ci" "pg_catalog"."text_ops" ASC NULLS LAST
)
INCLUDE
(
"id"
)
Then I reran my query, but it still does the Bitmap Heap Scan:
Gather (cost=2964.10..261239.23 rows=87166 width=122) (actual time=2.711..3.570 rows=0 loops=1)
Workers Planned: 2
Workers Launched: 2
-> Parallel Bitmap Heap Scan on audit_log (cost=1964.10..251522.63 rows=36319 width=122) (actual time=0.062..0.062 rows=0 loops=3)
Recheck Cond: ((entity)::text = '1234'::text)
-> Bitmap Index Scan on testing (cost=0.00..1942.31 rows=87166 width=0) (actual time=0.029..0.029 rows=0 loops=1)
Index Cond: ((entity)::text = '1234'::text)
Why is that?
Solution 1:[1]
PostgreSQL implements row versioning using a concept called visibility. Each query knows which version of a row it can see.
Now that visibility information is stored in the table row, but not in the index entry, so that table has to be visited just to test if the row is visible or not.
Because of that, every bitmap index scan needs a bitmap heap scan.
To overcome the unfortunate property, PostgreSQL has introduced the visibility map, a data structure that stores for each 8kB-block of the table if all rows in that block are visible to everybody. If that is the case, looking up the table row can be skipped. This is only possible for a regular index scan, not a bitmap index scan.
That visibility map is maintained by VACUUM
. So run VACUUM
on the table, then you may get an index only scan on the table.
If that alone is not enough, you could try CLUSTER
to rewrite the table in index order.
Some detail information on how PostgreSQL estimates the cost of an index scan. The following code is from cost_index
in src/backend/optimizer/path/costsize.c
:
/*----------
[...]
* If it's an index-only scan, then we will not need to fetch any heap
* pages for which the visibility map shows all tuples are visible.
* Hence, reduce the estimated number of heap fetches accordingly.
* We use the measured fraction of the entire heap that is all-visible,
* which might not be particularly relevant to the subset of the heap
* that this query will fetch; but it's not clear how to do better.
*----------
*/
[...]
if (indexonly)
pages_fetched = ceil(pages_fetched * (1.0 - baserel->allvisfrac));
allvisfrac
is calculated using pg_class.relallvisible
, which holds an estimate for the number of all-visible pages in the table, and pg_class.relpages
.
Sources
This article follows the attribution requirements of Stack Overflow and is licensed under CC BY-SA 3.0.
Source: Stack Overflow
Solution | Source |
---|---|
Solution 1 |