Python: how to compute the sum of a variable in a netCDF file within a radius of n closest pixels?

I have a netCDF file that looks like the following and is available here:

import xarray as xr
ds=xr.open_dataset('merged_new.nc')
ds


The dataset has a spatial resolution of 300 m. The variable Forest is binary (0 or 1).

For each pixel (point) I would like to count the sum of Forest within a radius of 30 km (i.e. a radius of ~100 pixels). I would like to apply a sort of moving window that allows me to assign to each pixel the sum of Forest within a 30 km radius.

Ideally I would like to add a variable to the dataset called ForestSum that is the sum of Forest in the surrounding.

Is there a way to do this?
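Since the pixels sit on a regular 300 m grid, one cheap way to approximate the 30 km radius is a square moving window of half-width 100 pixels, summed with an integral image. This is a minimal numpy-only sketch (not from the answer below), assuming `Forest` has already been loaded as a 2-D numpy array:

```python
import numpy as np

def box_sum(grid, radius_px):
    """Sum of `grid` over a (2*radius_px+1)^2 window centred on each pixel.

    Uses an integral image (2-D cumulative sum); cells outside the grid
    count as zero. For a 30 km radius on a 300 m grid, radius_px = 100.
    """
    k = 2 * radius_px + 1
    padded = np.pad(grid, radius_px, mode='constant')
    # Integral image with a leading zero row and column
    ii = np.zeros((padded.shape[0] + 1, padded.shape[1] + 1))
    ii[1:, 1:] = padded.cumsum(axis=0).cumsum(axis=1)
    # Window sum at (i, j) = ii[i+k, j+k] - ii[i, j+k] - ii[i+k, j] + ii[i, j]
    return ii[k:, k:] - ii[:-k, k:] - ii[k:, :-k] + ii[:-k, :-k]

forest = np.array([[0, 1], [1, 1]])
forest_sum = box_sum(forest, 1)  # window covers the whole 2x2 grid here
```

The result could then be attached back to the dataset with something like `ds['ForestSum'] = (('lat', 'lon'), box_sum(ds['Forest'].values, 100))`. Note the window is square rather than circular, so it slightly overcounts the corners compared with a true 30 km radius.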



Solution 1:[1]

Use BallTree from sklearn:

import xarray as xr
import pandas as pd
import numpy as np
from sklearn.neighbors import BallTree

ds = xr.open_dataset('merged_new.nc')

forest = ds['Forest'].values
latitude = ds['lat'].values
longitude = ds['lon'].values

# Flatten the 2-D grid into one row per pixel
df = pd.DataFrame(forest, index=latitude, columns=longitude) \
       .rename_axis(index='lat', columns='lon') \
       .stack().rename('forest').reset_index()

# The haversine metric expects [lat, lon] in radians
coords = np.radians(df[['lat', 'lon']])
tree = BallTree(coords, metric='haversine')

Now you can query the data. For example, for the first point:

# First point: lat -7.1125, lon -73.9875.
# r is in radians: 0.003 * 6371 km ~ 19 km; for 30 km use r = 30 / 6371 ~ 0.0047
>>> df.iloc[tree.query_radius([coords.iloc[0]], r=0.003)[0]]['forest'].sum()
3080

In theory you can process all points at once, but the process gets killed because there are too many records:

indices = tree.query_radius(coords, r=0.003)
df['sum'] = [df.iloc[i]['forest'].sum() for i in indices]

You can parallelize it with multiprocessing and process the data in slices.
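A minimal sketch of that slicing idea: query the tree in fixed-size chunks so only one chunk of neighbour lists is in memory at a time. The helper name and chunk size are my own choices, not from the answer, and the example runs on tiny synthetic data:

```python
import numpy as np
from sklearn.neighbors import BallTree

def radius_forest_sum(coords_rad, forest_vals, r, chunk_size=50_000):
    """Sum `forest_vals` over all neighbours within `r` radians of each point,
    querying the BallTree in chunks to keep memory bounded."""
    tree = BallTree(coords_rad, metric='haversine')
    out = np.empty(len(coords_rad))
    for start in range(0, len(coords_rad), chunk_size):
        stop = start + chunk_size
        # One array of neighbour indices per query point in this chunk
        neighbours = tree.query_radius(coords_rad[start:stop], r=r)
        for offset, idx in enumerate(neighbours):
            out[start + offset] = forest_vals[idx].sum()
    return out

# Tiny synthetic example: three nearby points and one far away
coords = np.radians([[0.0, 0.0], [0.0, 0.5], [0.5, 0.0], [10.0, 10.0]])
forest = np.array([1, 1, 0, 1])
sums = radius_forest_sum(coords, forest, r=0.02, chunk_size=2)
```

The per-chunk results could equally be handed to `multiprocessing.Pool`, since each chunk is independent; the final array can then be assigned back with `df['sum'] = sums`.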

Sources

This article follows the attribution requirements of Stack Overflow and is licensed under CC BY-SA 3.0.

Source: Stack Overflow

Solution 1: Corralien