Loop Through URLs to Pull Data From an API (Python)
I have successfully created this code to pull in data from an API. I am pulling in games and some extra info.
It is limited to 20 records each time the script is run.
How can I change this to loop through multiple URLs?
The only thing that needs to change at the end of the URL is the page number.
import requests
import csv

url = "https://rawg-video-games-database.p.rapidapi.com/games?key=6ed342d0807f42f3ae9b2eafbd8410a9&page=1"

headers = {
    "X-RapidAPI-Host": "rawg-video-games-database.p.rapidapi.com",
    "X-RapidAPI-Key": "3aa825a480mshf29bc28a2e1bb23p13f777jsn9756efe984d7"
}

response = requests.request("GET", url, headers=headers, data={})
myjson = response.json()

ourdata = []
csvheader = ['ID', 'NAME', 'Rating', 'background_image']

for x in myjson['results']:
    listing = [x['id'], x['name'], x['rating'], x['background_image']]
    ourdata.append(listing)

with open('games.csv', 'w', encoding='UTF8', newline='') as f:
    writer = csv.writer(f)
    writer.writerow(csvheader)
    writer.writerows(ourdata)

print('done')
Solution 1:[1]
The number at the end is a "query parameter".
You can update it in a number of ways. The closest to what you have would be something like this:
import requests
import csv

pageNums = 10
url = "https://rawg-video-games-database.p.rapidapi.com/games?key=6ed342d0807f42f3ae9b2eafbd8410a9&page="

headers = {
    "X-RapidAPI-Host": "rawg-video-games-database.p.rapidapi.com",
    "X-RapidAPI-Key": "3aa825a480mshf29bc28a2e1bb23p13f777jsn9756efe984d7"
}

ourdata = []
csvheader = ['ID', 'NAME', 'Rating', 'background_image']

# pages are numbered from 1, so start the range there
for pageNum in range(1, pageNums + 1):
    response = requests.request("GET", url + str(pageNum), headers=headers, data={})
    myjson = response.json()
    for x in myjson['results']:
        listing = [x['id'], x['name'], x['rating'], x['background_image']]
        ourdata.append(listing)

with open('games.csv', 'w', encoding='UTF8', newline='') as f:
    writer = csv.writer(f)
    writer.writerow(csvheader)
    writer.writerows(ourdata)

print('done')
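Another of those "number of ways": requests can build the query string for you via its `params` argument, so you never concatenate page numbers onto the URL by hand. A minimal sketch (the key value is a placeholder, and the request is only prepared here, not actually sent):

```python
import requests

base_url = "https://rawg-video-games-database.p.rapidapi.com/games"

# prepare() builds the full URL requests would send, without touching the network
req = requests.Request(
    "GET",
    base_url,
    params={"key": "YOUR_KEY", "page": 3},  # "YOUR_KEY" is a placeholder
).prepare()

print(req.url)  # ...games?key=YOUR_KEY&page=3
```

In the real loop you would call `requests.get(base_url, headers=headers, params={"key": ..., "page": pageNum})` and requests handles the encoding for you.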
This does assume you know how many pages you expect to query. More likely you don't, and that information is either in the response itself, or else not provided at all, in which case the only way to know is to check `response.status_code`. That might look a little more like:
## ... set up vars as in the above example
## this replaces the outer `for` loop in the above example

done = False
pageNum = 1
while not done:
    response = requests.request("GET", url + str(pageNum), headers=headers, data={})
    # requests exposes the HTTP status as `status_code`; 4xx/5xx means stop
    if response.status_code >= 400:
        done = True
        continue
    myjson = response.json()
    for x in myjson['results']:
        listing = [x['id'], x['name'], x['rating'], x['background_image']]
        ourdata.append(listing)
    pageNum += 1  # without this, the loop would request the same page forever

# ... write the csv as before
I haven't run this code, so don't expect to be able to copy and paste it verbatim. There may be complications with how this particular API signals the last page -- I didn't check that, so consider this for educational purposes only.
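When the pagination info is "in the response itself", the common shape is a `next` field in the JSON body that is null on the last page. A sketch of that stopping rule, written as a plain function over already-decoded pages so it can be tested without calling the API -- the function name and structure are mine, not from the answer, and the assumption that this API returns a `next` field should be verified against its docs:

```python
def collect_pages(pages):
    """Gather rows from successive decoded JSON pages (e.g. response.json() values),
    stopping when a page no longer advertises a next page."""
    rows = []
    for myjson in pages:
        for x in myjson['results']:
            rows.append([x['id'], x['name'], x['rating'], x['background_image']])
        if not myjson.get('next'):  # None or missing means this was the last page
            break
    return rows

# usage with two fake pages, the second marked as last
fake_pages = [
    {'results': [{'id': 1, 'name': 'a', 'rating': 4.0, 'background_image': 'u'}],
     'next': 'https://example.com/page2'},
    {'results': [{'id': 2, 'name': 'b', 'rating': 3.0, 'background_image': 'v'}],
     'next': None},
]
print(len(collect_pages(fake_pages)))  # 2
```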
Sources
This article follows the attribution requirements of Stack Overflow and is licensed under CC BY-SA 3.0.
Source: Stack Overflow
| Solution | Source |
|---|---|
| Solution 1 | Ben |
