'parallelizing a Python function?
I have a function that submits a search job to a REST API, waits for the API to respond, then downloads 2 sets of JSON data, converts the both JSON's into Pandas dataframes, and returns both dataframes. below is a very simplified version of the function(minus error handling, logging, data scrubbing, etc...)
def getdata(searchstring, url, uname, passwd):
headers = {'content-type': 'application/json'}
json_data = CreateJSONPayload(searchstring)
rPOST = requests.post(url, auth=(uname, passwd), data=json_data, headers=headers)
statusURL = (str(json.loads(rPOST.text)[u'link'][u'href']))
Processing = True
while Processing == True:
rGET = requests.get(statusURL, auth=(uname, passwd))
if rGET.status_code== 200:
url1 = url + "/dataset1"
url2 = url + "/dataset2"
rGET1 = requests.get(url1, auth=(uname, passwd))
rGET2 = requests.get(url2, auth=(uname, passwd))
dfData1 = pd.read_json(rGET1.text)
dfData2 = pd.read_json(rGET2.text)
Processing = False
elif StatusCode == "Other return code handling":
print("handle errors") # Not relevant to question.
else:
sleep(15)
return dfData1, dfData2
The function itself works as expected. However the API being called can take anywhere from a couple of minutes to an hour to return the data depending on the parameters I pass to it and I need to submit multiple searches to it, so I'd rather not submit each search one after the other.
What's the best way to parallelize the calling of a function like this so that I can submit multiple requests to it at the same time, wait for all calls of the function have returned data, and finally continue on with data processing in the script?
I also need to be able to throttle the requests too, as the API rate limits me to no more than 15 concurrent connections at a time.
Sources
This article follows the attribution requirements of Stack Overflow and is licensed under CC BY-SA 3.0.
Source: Stack Overflow
| Solution | Source |
|---|
