Set temporary directory: open_mfdataset xarray, dask(?), python
I am opening multiple files with xarray.open_mfdataset and storing them again as one dataset. While doing so, my temporary directory runs out of space.
How do I change the path to the temporary directory?
My code looks something like:
import xarray as xr

with xr.open_mfdataset( my_list_of_filepaths ) as in_data:
    out_data = some_data_manipulation( in_data )
    out_data.to_netcdf( out_filepath )
I have tried:
import dask
import xarray as xr

with dask.config.set({'temporary_directory': 'path_to_temp_dir'}):
    with xr.open_mfdataset( my_list_of_filepaths ) as in_data:
        out_data = some_data_manipulation( in_data )
        out_data.to_netcdf( out_filepath )
I have also tried setting the environment variables TMPDIR, TEMP and TMP to my desired temporary directory, without success.
Thank you!
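One thing worth ruling out when the environment variables seem to be ignored: Python's own tempfile module reads TMPDIR, TEMP and TMP only the first time it is used and caches the result in tempfile.tempdir, so setting them later inside an already running process has no effect on it. The sketch below uses only the standard library. The helper name point_tempfile_at is hypothetical, and it is an assumption that the component filling the directory goes through tempfile at all; a C library or a cluster scheduler may read its own setting instead.

```python
import os
import tempfile


def point_tempfile_at(path):
    """Make Python's tempfile module use `path` as its default directory.

    Hypothetical helper: it only affects code that asks the tempfile
    module where to put temporary files.
    """
    os.makedirs(path, exist_ok=True)
    # tempfile checks TMPDIR, TEMP and TMP (in that order), but only on
    # first use; afterwards the answer is cached in tempfile.tempdir.
    os.environ["TMPDIR"] = path
    # Clear the cached value so the environment is consulted again.
    tempfile.tempdir = None
    return tempfile.gettempdir()
```

Calling point_tempfile_at('path_to_temp_dir') before opening the files and then printing tempfile.gettempdir() shows whether the Python side of the process picked up the new location.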
Solution 1:[1]
xr.open_mfdataset( my_list_of_filepaths )
I believe you must pass at least chunks={} to the above call in order for dask to be invoked. You can check whether an array in your returned dataset is wrapping a Dask array or a normal one by checking its data attribute:
ds = xr.open_...
ds.my_variable.data
will either show concrete values (arrays of numbers) or dask.array.Array(...).
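As a minimal sketch of that check, the snippet below builds a toy dataset in place of the real open_mfdataset result (the variable name my_variable and the dimension x are placeholders) and compares a NumPy-backed variable with a chunked one. It assumes both xarray and dask are installed. The chunks attribute is a convenient shortcut: it is None when no dask array is involved.

```python
import numpy as np
import xarray as xr

# Toy dataset standing in for the result of xr.open_mfdataset(...).
ds = xr.Dataset({"my_variable": ("x", np.arange(6))})

# A NumPy-backed variable holds concrete values and reports no chunks.
print(type(ds.my_variable.data), ds.my_variable.chunks)

# After chunking (requires dask), the same variable wraps a lazy dask array.
chunked = ds.chunk({"x": 3})
print(type(chunked.my_variable.data), chunked.my_variable.chunks)
```

If the second form is what you see on your own dataset, dask is involved and its configuration is worth inspecting; if not, the dask settings are unlikely to matter.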
Sources
This article follows the attribution requirements of Stack Overflow and is licensed under CC BY-SA 3.0.
Source: Stack Overflow
| Solution | Source |
|---|---|
| Solution 1 | mdurant |
