'Reduce API call from gspread library
I'm currently trying to get data using gspread API from a drive folder containing about 50 excel files, each containing about 10 sheets (about 500 sheets in total).
I want to get 3 specific columns for all files in all sheets and append it into a dataframe.
I got this code working :
scope = ["https://spreadsheets.google.com/feeds",'https://www.googleapis.com/auth/spreadsheets',"https://www.googleapis.com/auth/drive.file","https://www.googleapis.com/auth/drive"]
creds = ServiceAccountCredentials.from_json_keyfile_name("gspread/service_account.json",scope)
client = gs.authorize(creds)
file_list = client.list_spreadsheet_files()
file_list = list(map(itemgetter('name'),file_list))
for files in file_list:
file = client.open(files)
worksheet_list = file.worksheets()
print(files)
for sheet in worksheet_list[1:]:
print(sheet)
set_with_dataframe(sheet, df)
df = get_as_dataframe(sheet, parse_dates=True, usecols=[5, 7, 9], skiprows=1, header=None)
df.drop(df.index[10:100], axis=0, inplace=True)
print('Ajout de : ',df)
df_final = df_final.append(df)
df_final.fillna('', inplace=True)
df_final.reset_index(drop=True, inplace=True)
print(df_final)
The thing is I always get an error 429 (too much API calls) and I can't figured out how to reduced the number of calls. Even when using get_all_values() or get_all_records() functions it has to loops through all sheets.
Even when making a time.sleep(30) for each files.
If I'm not wrong I have about 22 calls for each files of 10 sheets (file = client.open(files) and worksheet_list = file.worksheets()) and 2 for each sheets ( set_with_dataframe() and get_as_dataframe()
I could make a time.sleep() for each worksheets instead but it would take very long (500+ sheets).
I could also change the pause time depending on the number of sheets (some files have more).
My questions are : 1) Is there function that could get all data from all sheets instead of having to loop through all sheets? 2) If not, is there a solution to reduce the calls without having to use pauses...?
If yes, it would be a solution and would reduce drastically the calls
Thanks in advance and nice evening, Alex
Solution 1:[1]
to answer your questions:
There will be soon in
gspreada method they returns all the data from all the sheets. See this issue hereAs of today the above solution is not available so in order to pull all data from a spreadsheet you can build your own request and use
gspreadto handle that request for you. Have a look at the library source code in theWorksheetclass in the method that return some values (likeget_rangeorgetorrange). You can easily build your request and request the full spreadsheet values.on the side, if you wish to reduce API calls you can do so by opening your spreadsheet files using their ID instead of listing all the files then looping over them. You will reduce the initial API call to the Drive API to list the files. The method
list_spreadsheet_filesis costly because the response is paginated, so it will make as many calls to get all the pages (i don't know the size of a page). Then the methodopenis costly too because it makes a new call tolist_spreadsheet_filesthen loops over the result to open the right file using it's ID. I would definitely start on that part of your code to reduce API calls.
Sources
This article follows the attribution requirements of Stack Overflow and is licensed under CC BY-SA 3.0.
Source: Stack Overflow
| Solution | Source |
|---|---|
| Solution 1 | Dharman |
