'How to extract the largest polygon in a raster?

I can read a rastre and exrec polygones likes: If I have this code to read a raster and shapefile:

  library(raster)
  library(geojsonsf)
  library(sf)
  library(exactextractr)
  r <- raster(matrix(rnorm(10*12), nrow=10), xmn = -180, xmx= 180, ymn = -90, ymx= 90)
  myurl <- "http://eric.clst.org/assets/wiki/uploads/Stuff/gz_2010_us_050_00_500k.json"
 geo <- readLines(myurl)
 geo <- paste0(geo, collapse = "")
 system.time({ sf <- geojson_sf(geo)})
#add crs information for the raster 'r'
 crs(r) <- 4326
 # extract the 'r' raster value for each polygon 'NAME' in 'sf'
 res <- do.call(rbind, exactextractr::exact_extract(r, sf, include_cols = 'NAME'))[-3]

It can be that several polygons are within one pixel and I need to extract not all polygons but only the largest polygon in a pixel.



Solution 1:[1]

Here's some code that does what I think you want which is to find the polygon that is largest for each of the raster cells. I have modified the code so rasters can be uniquely identified and then I use GEO_ID because NAME is not unique in the data (there are 31 Washingtons for example). I use dplyr to find the maximum coverage for each raster and mapview to view the results and convince myself that the code is working.

library(raster)
library(geojsonsf)
library(sf)
library(exactextractr)
library(mapview)
library(dplyr)
# Give the raster cells a unique identifier so we can use this to find which 
#  polygon is the maximum in a given raster
r <- raster(matrix(1:120, nrow=10), xmn = -180, xmx= 180, ymn = -90, ymx= 90)
crs(r) <- 4326
myurl <- "http://eric.clst.org/assets/wiki/uploads/Stuff/gz_2010_us_050_00_500k.json"
geo <- readLines(myurl)
geo <- paste0(geo, collapse = "")
sf <- geojson_sf(geo)

# Find the polygons that overlap with the raster cells
#  the coverage fraction says how much overlap so 
#  simply find the largest whilst grouping ny the id
#  of the raster
# Use GEO_ID because NAME has many duplicates
overlay <- do.call(rbind, exactextractr::exact_extract(r, sf, include_cols = c('GEO_ID', 'NAME')))
# Find the maximum in each raster
maximum_in_raster <- overlay %>% group_by(value) %>% top_n(1, coverage_fraction)

# Create a subset of polygons corresponding to the maxima
# This is where GEO_ID is important
maximum_polygons <- sf[sf$GEO_ID %in% maximum_in_raster$GEO_ID, 'NAME']

# Make a grid to display 
sf_grid <- st_make_grid(r, n=c(12,10))

# Use mapview to have a look at the results to convince ourselves that it's working
mapview(maximum_polygons, alpha.regions=1) + mapview(sf_grid, col.regions='white') 

Here's an example showing the maxima in the south west of the US.

enter image description here

The 4 polygons shown are Lake, Nye, San Bernardino and San Luis Obispo. The horizontal and vertical lines show the boundaries of the raster cells.

And to make the point about duplicate names, here is the code to count how many names appear in each raster cell.

counts <- overlay %>% count(value, NAME, sort = T)
head(counts)
  value       NAME  n
1    33 Washington 13
2    23    Lincoln 12
3    33   Franklin 12
4    23 Washington 10
5    23    Douglas  9
6    23      Grant  9

Which shows 13 Washingtons in cell 33.

Sources

This article follows the attribution requirements of Stack Overflow and is licensed under CC BY-SA 3.0.

Source: Stack Overflow

Solution Source
Solution 1