How to specify a dtype of 'S20' in parallel processing with xr.apply_ufunc() in Python?

I opened a large dask-backed xarray dataset with the dimensions (time: 20, y: 50000, x: 100000). The variable 'var' that I want to use contains uint8 values. For each time step I want to convert the uint8 value to a letter via the dictionary d. For each location (y, x) I then want to concatenate the 20 letters into a single string of dtype 'S20'.

I am using xr.apply_ufunc() to run the code in parallel. Although I set output_dtypes to 'S20' in xr.apply_ufunc(), the output has a dtype of 'S1'. I tested this on a smaller subset of the data (y: 500, x: 1000) and noticed that if I load the variable into memory first and don't specify the dtype in xr.apply_ufunc(), the output has the required dtype of 'S20'. My entire dataset is too large to load into memory beforehand.

My question is: How do I specify output_dtypes correctly without loading the data into memory beforehand?

This is my code:

import xarray as xr
from dask.diagnostics import ProgressBar
from dask.distributed import Client
client = Client()

fp = 'myfile.nc'
ds = xr.open_dataset(fp, chunks={"y": 500, "x": 1000})
ds.close()
var = ds['var']

# Map each uint8 code to a single letter.
d = {
    0: 'A',
    1: 'B',
    2: 'C',
}

def ttrans(tarray):
    # tarray holds the 20 time steps of one (y, x) location;
    # concatenate their letters into one 20-character string.
    return ''.join(d[v] for v in tarray)

def pwrap(ds, dim=['time'], dask='parallelized'):
    with ProgressBar():
        res = xr.apply_ufunc(
            ttrans,
            ds,
            input_core_dims=[dim],
            vectorize=True,  # call ttrans once per (y, x) location
            dataset_fill_value='N',
            dask=dask,
            output_dtypes=['S20'],  # requested dtype; the result is 'S1' instead
        ).compute()
    return res

result = pwrap(ds=var, dim=['time'], dask='parallelized')
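
One possible workaround, sketched here under the assumption that the truncation comes from vectorize=True (np.vectorize seems to keep only the type character 'S' of 'S20', which NumPy allocates as 'S1'), is to avoid per-location Python calls and let a single NumPy function handle a whole block. The names LUT and ttrans_block are hypothetical and not part of the code above; the lookup table mirrors the dictionary d.

```python
import numpy as np

# Lookup table: index = uint8 code, entry = one-byte letter (mirrors dict d).
LUT = np.array([b"A", b"B", b"C"], dtype="S1")


def ttrans_block(tarray):
    """Map codes to letters and join them along the last axis.

    xr.apply_ufunc moves the core dimension ('time') to the last axis,
    so tarray is expected to have shape (..., n_time).
    Codes outside 0..2 raise IndexError.
    """
    n_time = tarray.shape[-1]
    # Fancy indexing yields shape (..., n_time) with dtype 'S1'.
    letters = np.ascontiguousarray(LUT[tarray])
    # Reinterpret each run of n_time single bytes as one fixed-width string,
    # then drop the trailing length-1 axis left over from the view.
    return letters.view(f"S{n_time}")[..., 0]


# Small in-memory check with shape (y, x, time) = (2, 2, 3).
demo = np.array(
    [[[0, 1, 2], [2, 2, 0]],
     [[1, 1, 1], [0, 0, 2]]],
    dtype=np.uint8,
)
out = ttrans_block(demo)
print(out.dtype, out.tolist())
```

With the dask-backed variable, this function could presumably be passed as xr.apply_ufunc(ttrans_block, var, input_core_dims=[['time']], dask='parallelized', output_dtypes=['S20']) without vectorize=True. That call is untested here and assumes 'time' sits in a single chunk, as it does with the chunks given to open_dataset above.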


Sources

This article follows the attribution requirements of Stack Overflow and is licensed under CC BY-SA 3.0.

Source: Stack Overflow