'Issue with xarray.open_raterio() img file and multiprocessing Pool
I am trying to use mutliprocessing Pool.map() to speed up my code. In the function where I have computation occurring for each process I reference an xarray.DataArray that was opened using xarray.open_rasterio(). However, I receive errors similar to this:
rasterio.errors.RasterioIOError: Read or write failed. /net/home_stu/cfite/data/CDL/2019/2019_30m_cdls.img, band 1: IReadBlock failed at X offset 190, Y offset 115: Unable to open external data file: /net/home_stu/cfite/data/CDL/2019/
I assume this is some issue related to the same file being referenced simultaneously while another worker is opening it too? I use DataArray.sel() to select small portions of the raster grid that I work with since the entire .img file is way to big to load all at once. I have tried opening the .img file in the main code and then just referencing to it in my function, and I've tried opening/closing it in the function that is being passed to Pool.map() - and receive errors like this regardless. Is my file corrupted, or will I just not be able to work with this file using multiprocessing Pool? I am very new to working with multiprocessing, so any advice is appreciated. Here is an example of my code:
import pandas as pd
import xarray as xr
import numpy as np
from multiprocessing import Pool
def select_grid(x,y):
ds = xr.open_rasterio('myrasterfile.img') #opening large file with xarray
grid = ds.sel(x=slice(x,x+50), y=slice(y,y+50))
ds.close()
return grid
def myfunction(row):
x = row.x
y = row.y
mygrid = select_grid(x,y)
my_calculation = mygrid.sum() #example calculation, but really I am doing multiple calculations
my_calculation.to_csv('filename.csv')
with Pool(30) as p:
p.map(myfunction, list_of_df_rows)
Sources
This article follows the attribution requirements of Stack Overflow and is licensed under CC BY-SA 3.0.
Source: Stack Overflow
| Solution | Source |
|---|
