'The raster overlay function in R produces incorrect values for raster bricks

I'm trying to quantile match two RasterBricks. I am finding the percentile of value<=250 in old_data, which we'll call p. Then I'm finding the value at the p^th quantile of each cell in new_data. The overlay function seems to be giving me really weird results.

So first I find the percentiles of 250 in each cell of the RasterBrick old_data:

percentiles <- old_data %>% 
  stackApply(indices = 1, fun = function(x, ...) {sum(x<=250)/length(x)})
summary(percentiles)

>           index_1
> Min.    0.6702177
> 1st Qu. 0.7919252
> Median  0.9108949
> 3rd Qu. 0.9661281
> Max.    0.9950102
> NAs     0.0000000

plot(percentiles)

percentiles

This is all well and good, matches what I would expect. I am expecting the quantile-matched values in new_data to be around or below 250, and a few spot checks at random cells confirms that assumption:

x <- c(raster::extract(new_data, 1))
y <- raster::extract(percentiles, 1)
sort(x)[y*length(x)]
> [1] 245.0346

x <- c(raster::extract(new_data, 10))
y <- raster::extract(percentiles, 10)
sort(x)[y*length(x)]
> [1] 203.5514

x <- c(raster::extract(new_data, 50))
y <- raster::extract(percentiles, 50)
sort(x)[y*length(x)]
> [1] 224.4567

x <- c(raster::extract(new_data, 100))
y <- raster::extract(percentiles, 100)
sort(x)[y*length(x)]
> [1] 209.7806

But then when I use the overlay function to get quantiles across the whol extent of the raster, I'm getting numbers that are way wrong.

new_quantiles <- overlay(
  x = new_data, y = percentiles, 
  fun = function(x, y) {sort(x)[y*length(x)]})
summary(new_quantiles)

>            layer
> Min.    145.1617
> 1st Qu. 191.8718
> Median  281.7506
> 3rd Qu. 398.2013
> Max.    640.2073
> NAs       0.0000

plot(new_quantiles)

new quantiles

I don't understand. I should definitely not be getting values that high. I'm using the exact same formula, and the raster bricks are on identical grids so there's no issue with origin or resolution. What is going wrong?



Solution 1:[1]

Difficult to answer without some example data. For example (here using {terra}, the replacement of {raster}; but it would be mostly the same for {raster})

library(terra)
old <- rast(nl=10)
values(old) <- sample(500, size(old), replace=TRUE)
new <- rast(nl=5)
values(new) <- sample(500, size(new), replace=TRUE)

Your approach to compute the percentiles seems overly complex, as you can do:

p <- mean(old <= 250)
summary(p)

(It makes no sense to use stackApply (tapp in terra) with a single index for all layers; instead you can either use a direct arithmetic approach as above, or use calc (app in terra))

I do not understand what you are trying to achieve with overlay (what you mean with "get values everywhere"). Perhaps you can edit your question to make the code simpler and reproducible.

Without a reproducible example, all I can say is that the use of length in the function supplied to overlay is suspect. It may not be the length you expect.

Sources

This article follows the attribution requirements of Stack Overflow and is licensed under CC BY-SA 3.0.

Source: Stack Overflow

Solution Source
Solution 1