How to loop over a list of statistic types and apply them to an xarray.DataArray?
I need to compute a list of statistics over time on an xarray.DataArray and store them in an xarray.Dataset:
import xarray as xr
import numpy as np
import pandas as pd
np.random.seed(1234)
da = xr.DataArray(data=np.random.rand(4, 5, 10),
                  dims=["lon", "lat", "time"],
                  coords={"lon": np.random.uniform(low=-90, high=90, size=4),
                          "lat": np.random.uniform(low=-90, high=90, size=5),
                          "time": pd.date_range(start="2021-01-01", freq="D", periods=10)})
da
first = True
if first:
    first = False
    ds = da.min(dim=['time']).to_dataset(name="min")
else:
    ds = ds.merge(da.min(dim=['time']).to_dataset(name="min"))
ds = ds.merge(da.max(dim=['time']).to_dataset(name="max"))
ds = ds.merge(da.median(dim=['time']).to_dataset(name="median"))
ds = ds.merge(da.mean(dim=['time']).to_dataset(name="mean"))
ds = ds.merge(da.std(dim=['time']).to_dataset(name="std"))
ds
As I need to frequently change the statistics to apply, I tried to use a list of statistics and loop through it:
stats = ('min', 'max', 'median', 'mean', 'std')
first = True
for stat in stats:
    if first:
        first = False
        ds = da.vars()['stat'](dim=['time']).to_dataset(name=vars()['stat'])
    else:
        ds = ds.merge(da.vars()['stat'](dim=['time']).to_dataset(name=vars()['stat']))
But I get the error AttributeError: 'DataArray' object has no attribute 'vars' when trying to retrieve and apply the statistic.
Thanks for any hint you could provide.
Solution 1:[1]
Thanks to @michael-delgado for pointing out that functions themselves can be stored in a list. Here is the answer I came up with:
stats = [np.nanmin, np.nanmax, np.nanmedian, np.nanmean, np.nanstd]
first = True
for stat in stats:
    da_stat = xr.DataArray(stat(a=da, axis=2), dims=['lon', 'lat'])
    ds_stat = da_stat.assign_coords(lon=da.lon.values,
                                    lat=da.lat.values).to_dataset(name=stat.__name__)
    if first:
        first = False
        ds = ds_stat
    else:
        ds = ds.merge(ds_stat)
ds
The only downside of this solution is that I was not able to use the xarray.DataArray statistic methods and had to replace them with NumPy functions, which doubled the processing time.
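For what it's worth, the string-based loop from the question can probably be kept by looking each method up with Python's built-in getattr, which returns the bound method for a given name (so getattr(da, "min") is da.min). The following is only a minimal sketch, assuming the same da as in the question and that every name in stats is a reduction method of xarray.DataArray; it has not been timed against the NumPy version above.

```python
import numpy as np
import pandas as pd
import xarray as xr

# Same example data as in the question.
np.random.seed(1234)
da = xr.DataArray(
    data=np.random.rand(4, 5, 10),
    dims=["lon", "lat", "time"],
    coords={
        "lon": np.random.uniform(low=-90, high=90, size=4),
        "lat": np.random.uniform(low=-90, high=90, size=5),
        "time": pd.date_range(start="2021-01-01", freq="D", periods=10),
    },
)

# Names of DataArray reduction methods; edit this tuple to change the statistics.
stats = ("min", "max", "median", "mean", "std")

# getattr(da, stat) fetches the method by name; calling it reduces over "time".
# Building the Dataset from a dict avoids the "first" flag and repeated merges.
ds = xr.Dataset({stat: getattr(da, stat)(dim="time") for stat in stats})
```

Each variable in ds is named after its entry in stats and keeps the lon and lat coordinates of da, so no assign_coords step is needed.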
Sources
This article follows the attribution requirements of Stack Overflow and is licensed under CC BY-SA 3.0.
Source: Stack Overflow
| Solution | Source |
|---|---|
| Solution 1 | bchate |
