Python Multiprocessing for Docker Requests to Get Distance
I am using the OSRM backend in Docker to get the distance between two places. That part works fine; I don't have any problem there. However, it takes too long because my table has many rows (approx. 200k rows, about 30 minutes), and the row count will keep growing. I tried a few things to speed it up.
I switched from the mld algorithm to the ch algorithm, like this:

```
docker run --name tr_osrm -t -i -p 5000:5000 -v c:/osrm_tr:/data osrm/osrm-backend osrm-routed --algorithm ch /data/turkey-latest.osrm
```
This sped things up a little, but not enough.
I passed the threads parameter, like this:

```
docker run --name tr_osrm -t -i -p 5000:5000 -v c:/osrm_tr:/data osrm/osrm-backend osrm-routed --algorithm ch --threads=8 /data/turkey-latest.osrm
```
The running time didn't change.
I tried multiprocessing in Python to send requests to Docker:

```python
import os

import pandas as pd
import requests
from multiprocessing import Pool

def osrm_distance(index):
    row = df_order.loc[index]
    from_lat, from_lon = row['customer_coords'].split(',')
    to_lat, to_lon = row['store_coords'].split(',')
    # NOTE: OSRM expects coordinates in {longitude},{latitude} order
    url = (f"http://127.0.0.1:5002/route/v1/driving/"
           f"{from_lon},{from_lat};{to_lon},{to_lat}"
           f"?overview=false&alternatives=false")
    try:
        r = requests.get(url)
        r.raise_for_status()
        res = r.json()
        di = res['routes'][0]['distance'] / 1000   # km
        dur = res['routes'][0]['duration'] / 60    # minutes
    except (requests.RequestException, KeyError, IndexError):
        di, dur = -1, -1
    return pd.DataFrame({'index_no': [index],
                         'distance': [di],
                         'duration': [dur]})

if __name__ == "__main__":
    available_cpu = os.cpu_count() - 1
    loop_lst = list(df_order.index)
    with Pool(available_cpu) as p:
        results = p.map(osrm_distance, loop_lst)
    distances = pd.concat(results, ignore_index=True)
```
The osrm_distance function works well on its own, but when I use a multiprocessing Pool I get this error and no request reaches Docker:
```
The process has forked and you cannot use this CoreFoundation functionality safely. You MUST exec().
Break on __THE_PROCESS_HAS_FORKED_AND_YOU_CANNOT_USE_THIS_COREFOUNDATION_FUNCTIONALITY___YOU_MUST_EXEC__() to debug.
```
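For context (not part of the original question): that CoreFoundation message is a macOS fork-safety issue, and since each worker just waits on an HTTP response, this workload is I/O-bound, so a thread pool (which never forks) sidesteps the error entirely. A minimal sketch of that pattern, with hypothetical coordinate pairs and a stub in place of the real `requests.get` call so it runs without a server:

```python
from concurrent.futures import ThreadPoolExecutor

def build_route_url(from_coords, to_coords, host="http://127.0.0.1:5002"):
    """Build an OSRM route URL; note OSRM expects lon,lat order."""
    from_lat, from_lon = from_coords.split(',')
    to_lat, to_lon = to_coords.split(',')
    return (f"{host}/route/v1/driving/"
            f"{from_lon},{from_lat};{to_lon},{to_lat}"
            f"?overview=false&alternatives=false")

def fetch_distance(pair):
    # In real use this would be requests.get(build_route_url(*pair)).json();
    # a stub stands in here so the sketch runs without an OSRM server.
    return build_route_url(*pair)

# Hypothetical "lat,lon" strings, matching the DataFrame columns above
pairs = [("41.0,29.0", "39.9,32.8"), ("38.4,27.1", "36.9,30.7")]
with ThreadPoolExecutor(max_workers=8) as ex:
    urls = list(ex.map(fetch_distance, pairs))
```

Threads share the parent process, so nothing is forked; for CPU-bound work, multiprocessing with the "spawn" start method would be the alternative.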
Is there any way to get responses from Docker faster for big data? Like reducing the time to a quarter or less :)
Thank you in advance.
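One approach not tried above, which usually cuts request volume the most, is OSRM's table service: it returns a many-to-many matrix in a single HTTP call instead of one request per pair. A sketch of building such a request URL, with hypothetical (lon, lat) tuples; `annotations=distance,duration` asks OSRM for both values:

```python
def build_table_url(customers, stores, host="http://127.0.0.1:5000"):
    """One /table request covering len(customers) x len(stores) pairs.
    Coordinates are (lon, lat) tuples, in the order OSRM expects."""
    coords = customers + stores
    coord_str = ";".join(f"{lon},{lat}" for lon, lat in coords)
    # sources/destinations are semicolon-separated indices into coords
    sources = ";".join(str(i) for i in range(len(customers)))
    destinations = ";".join(str(i) for i in range(len(customers), len(coords)))
    return (f"{host}/table/v1/driving/{coord_str}"
            f"?sources={sources}&destinations={destinations}"
            f"&annotations=distance,duration")

url = build_table_url([(29.0, 41.0)], [(32.8, 39.9), (27.1, 38.4)])
```

Batching the 200k rows into chunks of a few hundred coordinates per table request would reduce 200k HTTP round-trips to a few hundred.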
Sources
This article follows the attribution requirements of Stack Overflow and is licensed under CC BY-SA 3.0.
Source: Stack Overflow
