'Optimizer is not using index but the query is faster
I have this query and a cluster B-tree index on l_shipdate:
select
sum(l_extendedprice*l_discount) as revenue
from
lineitem
where
l_shipdate>='01/01/1994'
and
l_shipdate<'01/01/1995'
and
l_discount between 0.04 and 0.07
and
l_quantity<24
and if I EXPLAIN ANALYZE it I get
"Finalize Aggregate (cost=1716219.02..1716219.03 rows=1 width=32) (actual time=9992.581..9992.976 rows=1 loops=1)"
" -> Gather (cost=1716218.80..1716219.01 rows=2 width=32) (actual time=9992.490..9992.970 rows=3 loops=1)"
" Workers Planned: 2"
" Workers Launched: 2"
" -> Partial Aggregate (cost=1715218.80..1715218.81 rows=1 width=32) (actual time=9989.613..9989.614 rows=1 loops=3)"
" -> Parallel Seq Scan on lineitem (cost=0.00..1712504.18 rows=542925 width=12) (actual time=0.711..9931.443 rows=443110 loops=3)"
" Filter: ((l_shipdate >= '1994-01-01'::date) AND (l_shipdate < '1995-01-01'::date) AND (l_discount >= 0.04) AND (l_discount <= 0.07) AND (l_quantity < 24))"
" Rows Removed by Filter: 17050621"
"Planning Time: 0.127 ms"
"Execution Time: 9993.006 ms"
If I drop the clustered B-tree index from l_shipdate I get this plan:
"Finalize Aggregate (cost=1716219.02..1716219.03 rows=1 width=32) (actual time=9621.221..9621.632 rows=1 loops=1)"
" -> Gather (cost=1716218.80..1716219.01 rows=2 width=32) (actual time=9621.122..9621.626 rows=3 loops=1)"
" Workers Planned: 2"
" Workers Launched: 2"
" -> Partial Aggregate (cost=1715218.80..1715218.81 rows=1 width=32) (actual time=9616.306..9616.307 rows=1 loops=3)"
" -> Parallel Seq Scan on lineitem (cost=0.00..1712504.18 rows=542925 width=12) (actual time=0.669..9556.219 rows=443110 loops=3)"
" Filter: ((l_shipdate >= '1994-01-01'::date) AND (l_shipdate < '1995-01-01'::date) AND (l_discount >= 0.04) AND (l_discount <= 0.07) AND (l_quantity < 24))"
" Rows Removed by Filter: 17050621"
"Planning Time: 0.750 ms"
"Execution Time: 9621.674 ms"
I noticed that the two solutions take the same time (around 10 seconds) and the plan is identical. This means the optimizer is not using the clustered b-tree index on l_shipdate on the first scenario?
EDIT
If I set enable_seqscan = OFF and re-run the query with EXPLAIN ANALYZE I get:
"Finalize Aggregate (cost=1811005.28..1811005.29 rows=1 width=32) (actual time=34118.857..34120.450 rows=1 loops=1)"
" -> Gather (cost=1811005.06..1811005.27 rows=2 width=32) (actual time=34118.442..34120.436 rows=3 loops=1)"
" Workers Planned: 2"
" Workers Launched: 2"
" -> Partial Aggregate (cost=1810005.06..1810005.07 rows=1 width=32) (actual time=34110.199..34110.199 rows=1 loops=3)"
" -> Parallel Bitmap Heap Scan on lineitem (cost=106009.94..1807290.43 rows=542925 width=12) (actual time=243.400..34016.240 rows=443110 loops=3)"
" Recheck Cond: ((l_shipdate >= '1994-01-01'::date) AND (l_shipdate < '1995-01-01'::date))"
" Rows Removed by Index Recheck: 12082549"
" Filter: ((l_discount >= 0.04) AND (l_discount <= 0.07) AND (l_quantity < 24))"
" Rows Removed by Filter: 2210062"
" Heap Blocks: exact=19826 lossy=340997"
" -> Bitmap Index Scan on l_shipdate_idx (cost=0.00..105684.18 rows=7889962 width=0) (actual time=243.006..243.006 rows=7959517 loops=1)"
" Index Cond: ((l_shipdate >= '1994-01-01'::date) AND (l_shipdate < '1995-01-01'::date))"
"Planning Time: 2.023 ms"
"Execution Time: 34120.876 ms"
- Does this mean the Optimizer is not using the index on
l_shipdatebecausel_shipdate>='01/01/1994'and l_shipdate<'01/01/1995'is not that selective as a condition? I guess the Optimizer prefers using the sequential scan onlineitemtable, instead.
Sources
This article follows the attribution requirements of Stack Overflow and is licensed under CC BY-SA 3.0.
Source: Stack Overflow
| Solution | Source |
|---|
