Efficient way of calculating overlap area in PostGIS

I am working on calculating the overlap of two tables/layers in a PostGIS database. One set is a grid of hexagons, and for each hexagon I want to calculate the fraction that overlaps with another set of polygons. The multipolygon set also has a few polygons that overlap each other, so I need to dissolve/union those first. I was previously doing this in FME, but it kept running out of memory for some of the larger polygons. That's why I want to do this in the database instead (and PostGIS should be very much capable of that).

Here is what I have so far; it works, and memory is no longer running out:

EXPLAIN ANALYZE
WITH rh_union AS (
    SELECT (ST_Dump(ST_Union(rh.geometry))).geom AS geometry
    FROM relevant_habitats rh
    WHERE rh.assessment_area_id = 1
)
SELECT h.receptor_id,
       SUM(ROUND((ST_Area(ST_Intersection(rhu.geometry, h.geometry)) / ST_Area(h.geometry))::numeric, 3)) AS frac_overlap
FROM rh_union rhu, hexagons h
WHERE h.zoom_level = 1
  AND ST_Intersects(rhu.geometry, h.geometry)
GROUP BY h.receptor_id;

So I first union the polygons (dissolving any overlaps) and break the result back into single polygons. Then I overlay the hexagons with those polygons, and finally sum the areas of all the small intersection pieces per hexagon.

Now, my question is:

  • is this an efficient way of doing this? Or would there be a better way?

(And a side question: is it correct to first ST_Union and then ST_Dump?)
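To illustrate the side question with a toy example (the geometries below are made up): ST_Union first dissolves any overlaps into one (multi)polygon, and ST_Dump then explodes that back into single-polygon rows.

```sql
-- Two overlapping squares are dissolved into one polygon by ST_Union,
-- then ST_Dump splits the resulting multipolygon into one row per polygon.
SELECT (ST_Dump(u)).geom
FROM (
    SELECT ST_Union(geom) AS u
    FROM (VALUES
        (ST_MakeEnvelope(0, 0, 2, 2)),     -- square A
        (ST_MakeEnvelope(1, 1, 3, 3)),     -- square B, overlaps A
        (ST_MakeEnvelope(10, 10, 11, 11))  -- square C, disjoint
    ) AS t(geom)
) AS s;
-- returns 2 rows: the dissolved A+B shape and the disjoint square C
```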

Update: EXPLAIN ANALYZE output for a single area:

"QUERY PLAN"
"GroupAggregate  (cost=1996736.77..15410052.20 rows=390140 width=36) (actual time=571.303..1137.657 rows=685 loops=1)"
"  Group Key: h.receptor_id"
"  ->  Sort  (cost=1996736.77..1998063.55 rows=530712 width=188) (actual time=571.090..620.379 rows=806 loops=1)"
"        Sort Key: h.receptor_id"
"        Sort Method: external merge  Disk: 42152kB"
"        ->  Nested Loop  (cost=55.53..1848314.51 rows=530712 width=188) (actual time=382.769..424.643 rows=806 loops=1)"
"              ->  Result  (cost=55.25..1321.51 rows=1000 width=32) (actual time=382.550..383.696 rows=65 loops=1)"
"                    ->  ProjectSet  (cost=55.25..61.51 rows=1000 width=32) (actual time=382.544..383.652 rows=65 loops=1)"
"                          ->  Aggregate  (cost=55.25..55.26 rows=1 width=32) (actual time=381.323..381.325 rows=1 loops=1)"
"                                ->  Index Scan using relevant_habitats_pkey on relevant_habitats rh  (cost=0.28..28.75 rows=12 width=130244) (actual time=0.026..0.048 rows=12 loops=1)"
"                                      Index Cond: (assessment_area_id = 94)"
"              ->  Index Scan using idx_hexagons_geometry_gist on hexagons h  (cost=0.29..1846.45 rows=53 width=156) (actual time=0.315..0.624 rows=12 loops=65)"
"                    Index Cond: (geometry && (((st_dump((st_union(rh.geometry))))).geom))"
"                    Filter: (((zoom_level)::integer = 1) AND st_intersects((((st_dump((st_union(rh.geometry))))).geom), geometry))"
"                    Rows Removed by Filter: 19"
"Planning Time: 0.390 ms"
"Execution Time: 1372.443 ms"

Update 2: the EXPLAIN ANALYZE output for the second statement (the SELECT h.receptor_id ... part), with the CTE replaced by a (temp) table:

"QUERY PLAN"
"GroupAggregate  (cost=2691484.47..20931829.74 rows=390140 width=36) (actual time=29.455..927.945 rows=685 loops=1)"
"  Group Key: h.receptor_id"
"  ->  Sort  (cost=2691484.47..2693288.89 rows=721768 width=188) (actual time=28.382..31.514 rows=806 loops=1)"
"        Sort Key: h.receptor_id"
"        Sort Method: quicksort  Memory: 336kB"
"        ->  Nested Loop  (cost=0.29..2488035.20 rows=721768 width=188) (actual time=0.189..27.852 rows=806 loops=1)"
"              ->  Seq Scan on rh_union rhu  (cost=0.00..23.60 rows=1360 width=32) (actual time=0.016..0.050 rows=65 loops=1)"
"              ->  Index Scan using idx_hexagons_geometry_gist on hexagons h  (cost=0.29..1828.89 rows=53 width=156) (actual time=0.258..0.398 rows=12 loops=65)"
"                    Index Cond: (geometry && rhu.geometry)"
"                    Filter: (((zoom_level)::integer = 1) AND st_intersects(rhu.geometry, geometry))"
"                    Rows Removed by Filter: 19"
"Planning Time: 0.481 ms"
"Execution Time: 928.583 ms"
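For reference, a sketch of how the CTE can be materialized as a temp table (the GiST index and ANALYZE lines are optional additions; the plan above still seq-scans the 65-row temp table, which is fine at this size):

```sql
-- Materialize the dissolved habitat polygons once, instead of in a CTE.
CREATE TEMP TABLE rh_union AS
SELECT (ST_Dump(ST_Union(rh.geometry))).geom AS geometry
FROM relevant_habitats rh
WHERE rh.assessment_area_id = 1;

CREATE INDEX ON rh_union USING gist (geometry);  -- supports the && / ST_Intersects probe
ANALYZE rh_union;                                -- refresh row estimates for the planner
```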


Solution 1:

You want a metric describing the extent to which polygons in one table overlap polygons in another. This query returns the ids of overlapping polygons within the same table, together with the overlap expressed as a percentage.

SELECT
    CONCAT(a.polyid, ' ', b.polyid) AS intersecting_polys,
    CONCAT(a.attribute_x, ' ', b.attribute_x) AS attribute,
    ROUND((100 * ST_Area(ST_Intersection(a.geom, b.geom)) / ST_Area(a.geom))::NUMERIC, 0) AS pc_overlap
FROM your_schema.your_table AS a
JOIN your_schema.your_table AS b
  ON a.geom && b.geom                        -- bounding boxes overlap (uses the spatial index)
 AND ST_Relate(a.geom, b.geom, '2********')  -- interiors share a 2-dimensional (area) intersection
WHERE a.polyid != b.polyid;

Note that ON (a.geom && b.geom AND ST_Relate(a.geom, b.geom, '2********')) is used instead of ST_Covers. You might want to experiment to see which is correct in your use case.
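A toy comparison of the two predicates (the geometries here are made up) shows the difference: the ST_Relate pattern '2********' is true for any area-on-area overlap, while ST_Covers is true only when the first geometry completely contains the second.

```sql
-- a partially overlaps b; c fully contains d.
SELECT
    ST_Relate(a, b, '2********') AS overlap_relate,  -- true: interiors share an area
    ST_Covers(a, b)              AS overlap_covers,  -- false: a does not fully cover b
    ST_Covers(c, d)              AS contain_covers   -- true: d lies entirely inside c
FROM (SELECT
    ST_MakeEnvelope(0, 0, 2, 2) AS a,
    ST_MakeEnvelope(1, 1, 3, 3) AS b,
    ST_MakeEnvelope(0, 0, 5, 5) AS c,
    ST_MakeEnvelope(1, 1, 2, 2) AS d
) t;
```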

Sources

This article follows the attribution requirements of Stack Overflow and is licensed under CC BY-SA 3.0.

Source: Stack Overflow

Solution Source
Solution 1 BJW