Use the xarray and numba packages to read data and calculate a climatology

To speed up calculations with xarray, I tried adding numba's guvectorize decorator to my functions, but I ran into several problems:

  1. If I write two functions, read_pr and day_clim, the input of day_clim is no longer an xarray object, because the guvectorize signature is set to float64[:], float64[:]. As a result, groupby does not work. I also tried xr.core.dataarray.DataArray[:], xr.core.dataarray.DataArray[:], but that raises NameError: name 'xr' is not defined.
  2. I would also like to apply @guvectorize to read_pr. However, guvectorize needs the types and shapes declared up front, and each named dimension must keep the same size wherever it appears, so an output dimension that appears in no input is rejected. For example,
    (m),(n),(n) -> (m,n)  # ok
    (n),() -> (m,n)  # error

The inputs to read_pr are strings and scalar numbers (shape: ()), while the output is an xarray object (type: <class 'xarray.core.dataarray.DataArray'>, shape: (l,m,n)).

Code:

from numba import float64, guvectorize
import numba
import numpy as np
import xarray as xr

path = '/data3/USERS/waynetsai/pyaos_wks_samples/data/'
fname = 'cmorph_sample.nc'

lats = -20
latn =  30
lon1 =  89
lon2 = 171
time1 = '2000-01-01'
time2 = '2020-12-31'


def read_pr(path, fname, time1, time2, lats, latn, lon1, lon2):
    with xr.open_dataset(path + fname) as pr_ds:
        pr = (pr_ds.sel(time=slice(time1,time2),
                               lat=slice(lats,latn),
                               lon=slice(lon1,lon2)).cmorph)
    return pr

pr = xr.apply_ufunc(read_pr, path, fname, time1, time2, lats, latn, lon1, lon2)

@guvectorize(
    "(float64[:], float64[:])",
    "(l,m,n) -> (l,m,n)"
)
def day_clim(pr):
    prGB = pr.groupby("time.day")
    prDayClim = prGB.mean("time")
    return prDayClim
prDayClim = xr.apply_ufunc(day_clim, pr)
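For comparison, here is a minimal sketch of the climatology step without numba, assuming the goal is a day-of-year mean. The synthetic array stands in for the CMORPH file, which is not reproduced here, and its coordinate values are made up. Since groupby relies on the labelled time coordinate, the function is called on the DataArray directly rather than through guvectorize or xr.apply_ufunc.

```python
import numpy as np
import pandas as pd
import xarray as xr

# Synthetic stand-in for the precipitation data (assumption: daily time axis).
time = pd.date_range("2000-01-01", "2001-12-31", freq="D")
lat = np.array([-10.0, 10.0])
lon = np.array([100.0, 120.0, 140.0])

# Each value equals its day of year, so the expected climatology is easy to check.
doy = np.asarray(time.dayofyear, dtype=np.float64)
values = np.broadcast_to(
    doy[:, None, None], (time.size, lat.size, lon.size)
).copy()

pr = xr.DataArray(
    values,
    coords={"time": time, "lat": lat, "lon": lon},
    dims=("time", "lat", "lon"),
    name="cmorph",
)


def day_clim(pr):
    """Average all time steps that share the same day of year."""
    return pr.groupby("time.dayofyear").mean("time")


pr_day_clim = day_clim(pr)
```

Swapping the grouping key to "time.day" groups by day of month instead, as in the code above.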

All suggestions are welcome!



Sources

This article follows the attribution requirements of Stack Overflow and is licensed under CC BY-SA 3.0.

Source: Stack Overflow
