'Python: how to compute the sum of a variabel in a netCDF file in a radius of n closest pixels?
I have a netCDF that looks like the following ad is available here:
import xarray as xr
ds=xr.open_dataset('merged_new.nc')
ds
The dataset has a spatial resolution of 300m. The variable Forest is a binary variable 0 or 1.
For each pixel (point) I would like to count the sum of Forest in a radius of 30km (i.e. ~300 closest pixels). I would like to do a sort of moving windows that allow me to assign to each pixel the sum of Forest in a radius of 30km.
Ideally I would like to add a variable to the dataset called ForestSum that is the sum of Forest in the surrounding.
Is there a way to do this?
Solution 1:[1]
Use BallTree from sklearn:
import xarray as xr
import pandas as pd
import numpy as np
from sklearn.neighbors import BallTree, DistanceMetric
ds = xr.open_dataset('merged_new.nc')
forest = ds['Forest'].values
latitude = ds['lat'].values
longitude = ds['lon'].values
df = pd.DataFrame(forest, latitude, longitude).rename_axis(index='lat', columns='lon') \
.stack().rename('forest').reset_index()
# Prepare data
coords = np.radians(df[['lat', 'lon']])
dist = DistanceMetric.get_metric('haversine')
tree = BallTree(coords, metric=dist)
Now you can query data. Suppose for the first point
# lat: -7.1125, lon: -73.9875
>>> df.iloc[tree.query_radius([coords.iloc[0]], r=0.003)[0]]['forest'].sum()
3080
In theory, you can do but the process is killed because there are too many records:
indices = tree.query_radius(coords, r=0.003)
df['sum'] = [df.iloc[i]['forest'].sum() for i in indices]
You can use a multiprocessing version and slice data.
Sources
This article follows the attribution requirements of Stack Overflow and is licensed under CC BY-SA 3.0.
Source: Stack Overflow
| Solution | Source |
|---|---|
| Solution 1 | Corralien |

