xarray chunk dataset PerformanceWarning: Slicing with an out-of-order index is generating more chunks
I am trying to run a simple calculation based on two big gridded datasets in xarray (around 5 GB altogether, daily data from 1850-2100). I keep running out of memory when I try it this way:
import xarray as xr

def readin(model):
    observed = xr.open_dataset(var_obs)
    model_sim = xr.open_dataset(var_sim)
    observed = observed.sel(time=slice('1989', '2010'))
    model_hist = model_sim.sel(time=slice('1989', '2010'))
    model_COR = model_sim
    return observed, model_hist, model_COR

def method(model):
    clim_obs = observed.groupby('time.day').mean(dim='time')
    clim_hist = model_hist.groupby('time.day').mean(dim='time')
    diff_scaling = clim_hist - clim_obs
    bc = model_COR.groupby('time.day') - diff_scaling
    bc[var] = bc[var].where(bc[var] > 0, 0)
    bc = bc.reset_coords('day', drop=True)

observed, model_hist, model_COR = readin('model')
method('model')
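For reference, here is a minimal self-contained sketch of the climatology step on synthetic data (the variable name tas and the tiny grid are made up for illustration). One thing worth double-checking: 'time.day' groups by day of *month* (1-31), whereas a per-calendar-day climatology is usually built with 'time.dayofyear':

```python
import numpy as np
import pandas as pd
import xarray as xr

# Synthetic stand-in for the observations (hypothetical variable and grid).
time = pd.date_range("1989-01-01", "1992-12-31", freq="D")
observed = xr.Dataset(
    {"tas": (("time", "lat", "lon"), np.random.rand(len(time), 4, 4))},
    coords={"time": time, "lat": np.arange(4), "lon": np.arange(4)},
)

# 'time.day' yields at most 31 groups (day of month);
# 'time.dayofyear' yields up to 366 groups (one per calendar day).
clim_day = observed.groupby("time.day").mean(dim="time")
clim_doy = observed.groupby("time.dayofyear").mean(dim="time")
print(clim_day.sizes["day"], clim_doy.sizes["dayofyear"])  # 31 366
```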
I tried to chunk the (full) model_COR to split up the memory:

model_COR.chunk(chunks={'lat': 20, 'lon': 20})
or across the time dimension
model_COR.chunk(chunks={'time': 8030})
but no matter what I tried, the result was

PerformanceWarning: Slicing with an out-of-order index is generating xxx times more chunks

which doesn't exactly sound like the outcome I want. Where am I going wrong here? Happy about any help!
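As a side note on the chunking calls above: .chunk() returns a new, lazily chunked object, so its result has to be reassigned, and in practice chunks= is often passed to open_dataset directly so nothing is loaded eagerly. A minimal sketch on synthetic data (assuming dask is installed; the variable name and sizes are made up):

```python
import numpy as np
import pandas as pd
import xarray as xr  # .chunk() requires dask to be installed

# Tiny synthetic stand-in for the real model file (hypothetical variable).
time = pd.date_range("1850-01-01", "1850-12-31", freq="D")
ds = xr.Dataset(
    {"tas": (("time", "lat", "lon"), np.random.rand(len(time), 40, 40))},
    coords={"time": time, "lat": np.arange(40), "lon": np.arange(40)},
)

# .chunk() is not in-place: without the reassignment below, the
# original dataset stays unchunked and memory use does not change.
ds = ds.chunk({"time": -1, "lat": 20, "lon": 20})  # -1 = one chunk along time
print(ds.chunks["lat"])  # (20, 20)
```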
Sources
This article follows the attribution requirements of Stack Overflow and is licensed under CC BY-SA 3.0.
Source: Stack Overflow
