Set temporary directory: open_mfdataset xarray, dask(?), python

I am opening multiple files with xarray.open_mfdataset and writing them back out as a single dataset. While doing so, my temporary directory runs out of space.

How do I change the path to the temporary directory?

My code looks something like:

    import xarray as xr

    with xr.open_mfdataset( my_list_of_filepaths ) as in_data:

        out_data = some_data_manipulation( in_data )

        out_data.to_netcdf( out_filepath )

I have tried:

    import dask
    import xarray as xr

    with dask.config.set({'temporary_directory': 'path_to_temp_dir'}):

        with xr.open_mfdataset( my_list_of_filepaths ) as in_data:

            out_data = some_data_manipulation( in_data )

            out_data.to_netcdf( out_filepath )

and setting the environment variables TMPDIR, TEMP and TMP to my desired temporary directory without success.
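
For reference, this is a minimal sketch of how to check which temporary directory Python itself resolves after TMPDIR is changed. It only covers the standard-library tempfile module; whether xarray or dask write to that location is an assumption, not something confirmed here. The helper name resolved_tempdir and the scratch directory are hypothetical.

```python
import os
import tempfile


def resolved_tempdir(candidate):
    """Point TMPDIR at `candidate` and return the directory tempfile then resolves."""
    # Note: this changes the environment of the current process.
    os.environ["TMPDIR"] = candidate
    # tempfile caches its first answer in tempfile.tempdir, so a variable set
    # after the first lookup is ignored until the cache is cleared.
    tempfile.tempdir = None
    return tempfile.gettempdir()


# Hypothetical scratch location. The directory must exist and be writable,
# otherwise tempfile falls back to another candidate.
scratch = tempfile.mkdtemp()
print(resolved_tempdir(scratch))
```

If the printed path is not the one requested, the variable was most likely set too late (after the first tempfile lookup) or points to a directory that is missing or not writable.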

Thank you!



Solution 1:[1]

    xr.open_mfdataset( my_list_of_filepaths )

I believe you must pass at least chunks={} to the above call for Dask to be invoked. To check whether a variable in the returned dataset wraps a Dask array or a normal in-memory one, inspect its data attribute:

    ds = xr.open_...
    ds.my_variable.data

This will show either concrete values (arrays of numbers) or a dask.array.Array(...).
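
The same check can be done programmatically. The sketch below is only an illustration: backing_of is a hypothetical helper, and it classifies an object by the module its type is defined in, so it needs neither dask nor numpy to be imported. It assumes Dask array types live in a module under the dask package.

```python
def backing_of(data):
    """Classify the object found behind a variable's .data attribute.

    Duck-typed on the defining module of the object's type, so this helper
    does not import dask or numpy itself.
    """
    top_level_module = type(data).__module__.split(".")[0]
    return "dask" if top_level_module == "dask" else "in-memory"


# A plain list stands in for an eagerly loaded array here; with a real
# dataset you would pass ds.my_variable.data instead.
print(backing_of([1.0, 2.0, 3.0]))
```

With xarray, the chunks attribute of a variable gives similar information: it is None when the variable is not backed by a Dask array.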

Sources

This article follows the attribution requirements of Stack Overflow and is licensed under CC BY-SA 3.0.

Source: Stack Overflow

Solution Source
Solution 1 mdurant