Multiprocessing API call using columns of Pandas DataFrame
I've just started working with Python's multiprocessing library. I would like to make many API calls (GET) using requests. I have a Pandas DataFrame in which each row holds the arguments I will pass to requests.get.
Here is an example of the dataframe I want to starmap to.
import pandas as pd

d = {
    "companyId": ['1000', '1005'],
    "headers": [{'Authorization': 'Bearer token1'}, {'Authorization': 'Bearer token1'}],
    "employeeId": ['1500', '1500'],
    "date": ['2022-01-01', '2022-01-02']
}
df = pd.DataFrame(d)
df.head()
Code to make request:
import multiprocessing as mp
import requests

def get_data(df: pd.DataFrame):
    query: dict = {
        'companyId': df['companyId'].astype(str),
        'driverId': df['employeeId'].astype(str),
        'day': df['date'].astype(str)
    }
    resp = requests.get(url=df['url'], headers=df['headers'], params=query)
    return resp

if __name__ == "__main__":
    with mp.Pool(mp.cpu_count()) as p:
        res = list(p.starmap(get_data, zip(df.itertuples())))
        print(res)
        p.close()
        p.join()
However, I receive some errors I am trying to understand. Ultimately, I want to map the API function over each row of my Pandas DataFrame in parallel. I would prefer to stick with the multiprocessing library, but I do not necessarily need Pandas here if there is a simpler, more native solution.
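For reference, one pattern that maps a worker over DataFrame rows is sketched below. It is a variation on the code above, with two deliberate changes: it uses a thread-based pool (multiprocessing.dummy.Pool), which suits I/O-bound HTTP calls and avoids pickling issues, and it uses map instead of starmap, since the worker takes a single row argument. BASE_URL is a hypothetical endpoint, and the live requests.get call is left commented out so the sketch runs as a dry run that just returns the prepared request arguments.

```python
import pandas as pd
from multiprocessing.dummy import Pool  # thread pool: a better fit for I/O-bound HTTP calls

BASE_URL = "https://api.example.com/data"  # hypothetical endpoint, replace with your API URL

def build_request(row):
    """Map one row (a namedtuple from df.itertuples) to requests.get keyword arguments."""
    return {
        "url": BASE_URL,
        "headers": row.headers,
        "params": {
            "companyId": str(row.companyId),
            "driverId": str(row.employeeId),
            "day": str(row.date),
        },
    }

def get_data(row):
    """Worker: one GET per row. Uncomment the requests lines to perform real calls."""
    kwargs = build_request(row)
    # import requests
    # return requests.get(**kwargs, timeout=10)
    return kwargs  # dry run: return the prepared arguments instead of a response

if __name__ == "__main__":
    df = pd.DataFrame({
        "companyId": ["1000", "1005"],
        "headers": [{"Authorization": "Bearer token1"}] * 2,
        "employeeId": ["1500", "1500"],
        "date": ["2022-01-01", "2022-01-02"],
    })
    # map (not starmap): each task receives exactly one argument, the row namedtuple
    with Pool(8) as pool:
        results = pool.map(get_data, df.itertuples(index=False))
    print(results[0]["params"])
```

Note that zip(df.itertuples()) in the original wraps each row in a 1-tuple for starmap, so the worker receives a namedtuple, not a DataFrame; accessing columns by attribute (row.companyId) rather than by key is what itertuples actually supports.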
Sources
This article follows the attribution requirements of Stack Overflow and is licensed under CC BY-SA 3.0.
Source: Stack Overflow
