Rasterize polygons based on maximum overlap (using R packages terra or stars)

I have a question concerning rasterization of polygons by maximum overlap, i.e., assigning to each raster cell the value of the polygon that has the highest area overlap with that cell.

The real world exercise is to rasterize polygons of soil-IDs in R, in order to produce relatively low resolution maps of soil properties as model inputs.

The problem is that the rasterize() function of the terra package (and, similarly, st_rasterize() from stars) assigns the cell value from the polygon that contains the cell midpoint. If a raster cell overlaps multiple polygons, I would rather select the value of the polygon (soil-ID) that covers the largest area of the cell.

Here is a small self-contained example that visualizes my problem, using terra.

library(terra)

f <- system.file("ex/lux.shp", package="terra")
v <- vect(f)
r <- rast(v, ncols = 3, nrow = 3)
rcc <- vect(xyFromCell(r, cell = 1:ncell(r)))

x <- rasterize(v, r, field = "NAME_2")
plot(x)
lines(r, col = "light gray")
lines(v)
points(rcc)

[Figure: plot produced by the terra::rasterize reprex above]

Mostly, the polygons that contain the cell center also have the highest area share. However, in some cases (top row, 3rd cell), this is not so. The problem appears to get worse the bigger the cells are compared with the polygons. I could therefore start with a high-resolution raster and then resample to the desired (lower) resolution using an aggregation function (e.g. the mode). But maybe someone has a more efficient idea?

Thank you for your help!



Solution 1:[1]

You could do that by disaggregating the raster, rasterizing at the finer resolution, and aggregating back with the modal value:

#Example data
library(terra)    
f <- system.file("ex/lux.shp", package="terra")
v <- vect(f)
r <- rast(v, ncols = 3, nrow = 3)

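n <- 10
r <- disagg(r, n)              # split each cell into n x n sub-cells
r <- rasterize(v, r, "ID_2")   # rasterize at the finer resolution
x <- aggregate(r, n, "modal")  # most frequent ID per original cell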

plot(x)
lines(x)
lines(v, lwd=2)
text(v, col="red", halo=TRUE)
text(x, col="blue", halo=TRUE)
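To see why the modal value of the sub-cells approximates the maximum-overlap polygon, here is a minimal base-R sketch for a single unit cell. The two "polygons" are made up for illustration: A is a hypothetical horizontal band that contains the cell center, B is everything else in the cell.

```r
# One unit cell [0,1] x [0,1], split into n x n sub-cells (as disagg would do)
n <- 10
centres <- (seq_len(n) - 0.5) / n
sub <- expand.grid(x = centres, y = centres)

# Hypothetical polygons: A is the band 0.4 <= y <= 0.6, B is the rest
in_band <- function(y) y >= 0.4 & y <= 0.6
sub$poly <- ifelse(in_band(sub$y), "A", "B")

# Center-point rule: polygon that contains the cell center (0.5, 0.5)
centre_poly <- ifelse(in_band(0.5), "A", "B")

# Modal rule: polygon that holds the most sub-cell centers
counts <- table(sub$poly)
modal_poly <- names(counts)[which.max(counts)]

centre_poly  # "A", although A covers only about 20% of the cell
modal_poly   # "B"
```

A larger n gives a closer approximation of the true area shares, at the cost of a bigger intermediate raster.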


Another way, probably less efficient (especially if you have many IDs):

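# r must be the coarse target raster here, e.g. r <- rast(v, ncols = 3, nrows = 3)
# one cover-fraction layer per polygon; which.max() gives the layer (row of v) with the largest cover
z <- lapply(1:nrow(v), \(i) rasterize(v[i,], r, cover=TRUE))
z <- which.max(rast(z))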

But you could replace rasterize with exactextractr::coverage_fraction if you want very high precision.
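Note that which.max returns the index of the layer (that is, the row number in v), not an attribute such as ID_2. A minimal base-R sketch of mapping that index back to an ID, using a made-up matrix of cover fractions and hypothetical IDs rather than real terra output:

```r
# Made-up cover fractions: one row per cell, one column per polygon
cover <- rbind(
  c(0.38, 0.06, NA),
  c(0.49, 0.52, 0.10),
  c(NA,   NA,   NA)      # a cell that no polygon touches
)
ids <- c(11, 12, 13)     # hypothetical polygon IDs, in the same order as the columns

# Index of the polygon with the largest cover per cell (NA if nothing overlaps)
idx <- apply(cover, 1, function(w) {
  if (all(is.na(w))) NA_integer_ else which.max(w)
})

# Look up the ID that belongs to each index
cell_id <- ids[idx]
cell_id
```

With terra objects, the same lookup would use the polygon attribute (for example v$ID_2) in place of ids.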

Even less efficient, I suppose:

r <- rast(v, ncols = 3, nrow = 3)
values(r) <- 1:ncell(r)
# get weights
e <- extract(r, v, weights=TRUE)
e <- as.matrix(e)
head(e)
#    ID lyr.1 weight
#[1,]  1     1   0.38
#[2,]  1     2   0.49
#[3,]  2     2   0.06
#[4,]  2     4   0.05
#[5,]  2     5   0.52
#[6,]  2     6   0.06

# for each cell, find the polygon with the max weight (you can use dplyr or data.table instead)
x <- sapply(unique(e[,2]), function(i) { 
    d <- e[e[,2] == i, ,drop=FALSE]
    d[which.max(d[,3]), 2:1]
})

# remove values 
r <- rast(r)
# assign ID to cells
r[x[1,]] <- x[2,]
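The per-cell selection can also be written without sapply, by sorting and dropping duplicates. This is only a sketch in base R; the table below is made up and merely shaped like the output of extract(r, v, weights=TRUE), with the cell-number column renamed to cell for readability.

```r
# Toy weights table (values are illustrative, not real output)
e <- data.frame(
  ID     = c(1, 1, 2, 2, 2, 3),
  cell   = c(1, 2, 2, 4, 5, 5),
  weight = c(0.38, 0.49, 0.06, 0.05, 0.52, 0.30)
)

# Sort so that, within each cell, the largest weight comes first
e <- e[order(e$cell, -e$weight), ]

# Keep only the first (largest-weight) row of each cell
best <- e[!duplicated(e$cell), c("cell", "ID")]
rownames(best) <- NULL
best
```

The result could then be assigned in the same way as above, e.g. r[best$cell] <- best$ID.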

You could achieve the same using polygon intersection, but that does not scale well to large rasters:

r <- rast(v, ncols = 3, nrow = 3)
values(r) <- 1:9
v$ID <- 1:nrow(v)
i <- intersect(v[,"ID"], as.polygons(r))
i$area <- expanse(i)
i <- data.frame(i) 
x <- sapply(split(i, i[,2]), 
    \(x) { x[which.max(x[,3]), 2:1] |> unlist()}
)
r <- rast(r)
r[x[1,]] <- x[2,]

(perhaps not as elegant as st_join proposed by lovalery)

Sources

This article follows the attribution requirements of Stack Overflow and is licensed under CC BY-SA 3.0.

Source: Stack Overflow
