Multiprocessing API call using columns of a Pandas DataFrame

I've just started working with Python's multiprocessing library. I would like to make many API calls (GET) using the requests library. I have a Pandas DataFrame in which each row holds the arguments for one requests.get call.

Here is an example of the dataframe whose rows I want to starmap over:

import pandas as pd
d = {
    "companyId": ['1000','1005'],
    "headers": [{'Authorization': 'Bearer token1'},{'Authorization': 'Bearer token1'}],
    "employeeId": ['1500','1500'],
    "date": ['2022-01-01','2022-01-02']
}
df = pd.DataFrame(d)
df.head()

Code to make request:

import multiprocessing as mp
import requests

def get_data(df: pd.DataFrame):
    query: dict = {
        'companyId': df['companyId'].astype(str),
        'driverId': df['employeeId'].astype(str),
        'day': df['date'].astype(str)
    }
    resp = requests.get(url=df['url'], headers=df['headers'], params=query)
    return resp

if __name__ == "__main__":
    with mp.Pool(mp.cpu_count()) as p:
        res = list(p.starmap(get_data, zip(df.itertuples())))
        print(res)
        p.close()
        p.join()
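
(Editorial note on the call shape above: `zip` applied to a single iterable wraps each element in a one-element tuple, so `starmap(get_data, zip(df.itertuples()))` ends up calling `get_data` with a single namedtuple row, not a DataFrame. A small sketch illustrating what the worker actually receives:)

```python
import pandas as pd

df = pd.DataFrame({"a": [1, 2], "b": [3, 4]})

# zip() with one iterable yields 1-tuples, so starmap unpacks each
# 1-tuple and passes a single namedtuple (not a DataFrame) to the worker.
args = list(zip(df.itertuples()))
print(args[0])        # a 1-tuple containing a Pandas namedtuple
row = args[0][0]
print(row.a, row.b)   # fields are accessed by attribute, not df['col']
```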

However, I receive some errors I am trying to understand. Ultimately, I want to map the API function over each row of my Pandas dataframe in parallel. I would prefer to stick with the multiprocessing library, but I do not necessarily need Pandas here if there is a simpler, more native solution.



Sources

This article follows the attribution requirements of Stack Overflow and is licensed under CC BY-SA 3.0.

Source: Stack Overflow
