Python Multiprocessing for Docker Requests to Get Distance

I am using the OSRM backend with Docker to get the distance between two places. It works fine; I don't have any problem there. However, it takes too long because my table has many rows (approx. 200k rows, ~30 minutes), and the row count will keep growing. I tried a few things to speed it up.

I changed the MLD algorithm to the CH algorithm, like this:

docker run --name tr_osrm -t -i -p 5000:5000 -v c:/osrm_tr:/data osrm/osrm-backend osrm-routed --algorithm ch /data/turkey-latest.osrm 
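For reference, as far as I understand, --algorithm ch requires the extract to be pre-processed with osrm-contract (MLD instead uses osrm-partition and osrm-customize). Roughly, with my local paths:

docker run -t -v c:/osrm_tr:/data osrm/osrm-backend osrm-extract -p /opt/car.lua /data/turkey-latest.osm.pbf
docker run -t -v c:/osrm_tr:/data osrm/osrm-backend osrm-contract /data/turkey-latest.osrm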

That sped it up a little, but not enough.

I passed the --threads parameter, like this:

docker run --name tr_osrm -t -i -p 5000:5000 -v c:/osrm_tr:/data osrm/osrm-backend osrm-routed --algorithm ch --threads=8 /data/turkey-latest.osrm 

The running time didn't change. (I suspect extra server threads don't help because my client only sends one request at a time.)
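Even without parallelism, I believe each requests.get call opens a fresh TCP connection, so reusing a single requests.Session should shave some per-request overhead. A minimal sketch (the shared session and the get_route helper are my own additions, not part of the code below):

import requests
from requests.adapters import HTTPAdapter

# One shared session so TCP connections to the OSRM server are reused
# instead of being re-established for every request.
session = requests.Session()
session.mount("http://", HTTPAdapter(pool_connections=16, pool_maxsize=16))

def get_route(url):
    # Hypothetical helper: takes the same URL format as osrm_distance below.
    return session.get(url).json()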

I tried multiprocessing in Python to send requests to Docker:

import os
from multiprocessing import Pool

import pandas as pd
import requests

def osrm_distance(index):
    tmp_df = df_order.loc[index]
    # Coordinates are stored as comma-separated strings; split once per column.
    from_lat, from_lon = tmp_df['customer_coords'].split(',')
    to_lat, to_lon = tmp_df['store_coords'].split(',')

    url = (f"http://127.0.0.1:5002/route/v1/driving/"
           f"{from_lat},{from_lon};{to_lat},{to_lon}"
           f"?overview=false&alternatives=false")
    try:
        res = requests.get(url).json()
        di = res['routes'][0]['distance'] / 1000   # metres -> kilometres
        dur = res['routes'][0]['duration'] / 60    # seconds -> minutes (currently unused)
    except Exception:
        di = -1   # request failed or no route was found

    return pd.DataFrame({'index_no': [index],
                         'distance': [di]})

if __name__ == "__main__":
    available_cpu = os.cpu_count() - 1
    loop_lst = list(df_order.index)
    with Pool(available_cpu) as p:
        # p.map returns a list of one-row DataFrames, one per index.
        distances = p.map(osrm_distance, loop_lst)
    distances = pd.concat(distances, ignore_index=True)

The osrm_distance function above works well on its own, but when I use a multiprocessing Pool I get the error below and no requests reach Docker:

The process has forked and you cannot use this CoreFoundation functionality safely. You MUST exec().
Break on __THE_PROCESS_HAS_FORKED_AND_YOU_CANNOT_USE_THIS_COREFOUNDATION_FUNCTIONALITY___YOU_MUST_EXEC__() to debug.
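From what I've read, this error comes from macOS: multiprocessing's default fork start method clashes with CoreFoundation's fork-safety check. A minimal sketch of the workaround I'd try, forcing the spawn start method (this assumes df_order and osrm_distance are defined at module level so freshly spawned workers can re-import them):

import multiprocessing as mp
import os

import pandas as pd

if __name__ == "__main__":
    # "spawn" starts fresh interpreter processes instead of forking,
    # which avoids the CoreFoundation fork-safety abort on macOS.
    ctx = mp.get_context("spawn")
    with ctx.Pool(os.cpu_count() - 1) as p:
        frames = p.map(osrm_distance, list(df_order.index))
    distances = pd.concat(frames, ignore_index=True)

Since each call is network-bound rather than CPU-bound, a concurrent.futures.ThreadPoolExecutor would also sidestep forking entirely and might be simpler.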

Is there any way to get responses from Docker faster for big data, ideally cutting the time to a quarter or less? :)

Thank you in advance.


