'Getting Google Spreadsheet CSV into A Pandas Dataframe
I uploaded a file to Google spreadsheets (to make a publically accessible example IPython Notebook, with data) I was using the file in it's native form could be read into a Pandas Dataframe. So now I use the following code to read the spreadsheet, works fine but just comes in as string,, and I'm not having any luck trying to get it back into a dataframe (you can get the data)
import requests
r = requests.get('https://docs.google.com/spreadsheet/ccc?key=0Ak1ecr7i0wotdGJmTURJRnZLYlV3M2daNTRubTdwTXc&output=csv')
data = r.content
The data ends up looking like: (1st row headers)
',City,region,Res_Comm,mkt_type,Quradate,National_exp,Alabama_exp,Sales_exp,Inventory_exp,Price_exp,Credit_exp\n0,Dothan,South_Central-Montgomery-Auburn-Wiregrass-Dothan,Residential,Rural,1/15/2010,2,2,3,2,3,3\n10,Foley,South_Mobile-Baldwin,Residential,Suburban_Urban,1/15/2010,4,4,4,4,4,3\n12,Birmingham,North_Central-Birmingham-Tuscaloosa-Anniston,Commercial,Suburban_Urban,1/15/2010,2,2,3,2,2,3\n
The native pandas code that brings in the disk resident file looks like:
df = pd.io.parsers.read_csv('/home/tom/Dropbox/Projects/annonallanswerswithmaster1012013.csv',index_col=0,parse_dates=['Quradate'])
A "clean" solution would be helpful to many to provide an easy way to share datasets for Pandas use! I tried a bunch of alternative with no success and I'm pretty sure I'm missing something obvious again.
Just a Update note The new Google spreadsheet has a different URL pattern Just use this in place of the URL in the above example and or the below answer and you should be fine here is an example:
https://docs.google.com/spreadsheets/d/177_dFZ0i-duGxLiyg6tnwNDKruAYE-_Dd8vAQziipJQ/export?format=csv&id
see solution below from @Max Ghenis which just used pd.read_csv, no need for StringIO or requests...
Solution 1:[1]
You can use read_csv() on a StringIO object:
from io import BytesIO
import requests
import pandas as pd
r = requests.get('https://docs.google.com/spreadsheet/ccc?key=0Ak1ecr7i0wotdGJmTURJRnZLYlV3M2daNTRubTdwTXc&output=csv')
data = r.content
In [10]: df = pd.read_csv(BytesIO(data), index_col=0,parse_dates=['Quradate'])
In [11]: df.head()
Out[11]:
City region Res_Comm \
0 Dothan South_Central-Montgomery-Auburn-Wiregrass-Dothan Residential
10 Foley South_Mobile-Baldwin Residential
12 Birmingham North_Central-Birmingham-Tuscaloosa-Anniston Commercial
38 Brent North_Central-Birmingham-Tuscaloosa-Anniston Residential
44 Athens North_Huntsville-Decatur-Florence Residential
mkt_type Quradate National_exp Alabama_exp Sales_exp \
0 Rural 2010-01-15 00:00:00 2 2 3
10 Suburban_Urban 2010-01-15 00:00:00 4 4 4
12 Suburban_Urban 2010-01-15 00:00:00 2 2 3
38 Rural 2010-01-15 00:00:00 3 3 3
44 Suburban_Urban 2010-01-15 00:00:00 4 5 4
Inventory_exp Price_exp Credit_exp
0 2 3 3
10 4 4 3
12 2 2 3
38 3 3 2
44 4 4 4
Solution 2:[2]
Seems to work for me without the StringIO:
test = pd.read_csv('https://docs.google.com/spreadsheets/d/' +
'0Ak1ecr7i0wotdGJmTURJRnZLYlV3M2daNTRubTdwTXc' +
'/export?gid=0&format=csv',
# Set first column as rownames in data frame
index_col=0,
# Parse column values to datetime
parse_dates=['Quradate']
)
test.head(5) # Same result as @TomAugspurger
BTW, including the ?gid= enables importing different sheets, find the gid in the URL.
Solution 3:[3]
Open the specific sheet you want in your browser. Make sure it's at least viewable by anyone with the link. Copy and paste the URL. You'll get something like https://docs.google.com/spreadsheets/d/BLAHBLAHBLAH/edit#gid=NUMBER.
sheet_url = 'https://docs.google.com/spreadsheets/d/BLAHBLAHBLAH/edit#gid=NUMBER'
First we turn that into a CSV export URL, like https://docs.google.com/spreadsheets/d/BLAHBLAHBLAH/export?format=csv&gid=NUMBER:
csv_export_url = sheet_url.replace('/edit#gid=', '/export?format=csv&gid=')
Then we pass it to pd.read_csv, which can take a URL.
df = pd.read_csv(csv_export_url)
This will break if Google changes its API (it seems undocumented), and may give unhelpful errors if a network failure occurs.
Solution 4:[4]
My approach is a bit different. I just used pandas.Dataframe() but obviously needed to install and import gspread. And it worked fine!
gsheet = gs.open("Name")
Sheet_name ="today"
wsheet = gsheet.worksheet(Sheet_name)
dataframe = pd.DataFrame(wsheet.get_all_records())
Solution 5:[5]
I have been using the following utils and it worked so far:
def load_from_gspreadsheet(sheet_name, key):
url = 'https://docs.google.com/spreadsheets/d/{key}/gviz/tq?tqx=out:csv&sheet={sheet_name}&headers=1'.format(
key=key, sheet_name=sheet_name.replace(' ', '%20'))
log.info('Loading google spreadsheet from {}'.format(url))
df = pd.read_csv(url)
return df.drop([col for col in df.columns if col.startswith('Unnamed')], axis=1)
You must specify the sheet_name and the key. The key is the string you get from the url in the following path: https://docs.google.com/spreadsheets/d/{key}/edit/.
You can change the value of headers if you have more than one row for the column names but I am not sure if it still work with multi-headers.
It may brake if Google will change their APIs.
Also please bear in mind that your spreadsheet must be public, everyone with the link can read it.
Solution 6:[6]
If the csv file was shared via drive and not via spreadsheet then the below change to the url would work
#Derive the id from the google drive shareable link.
#For the file at hand the link is as below
#<https://drive.google.com/open?id=1-tjNjMP6w0RUV4GhJWw08ql3wYwsNU69>
file_id='1-tjNjMP6w0RUV4GhJWw08ql3wYwsNU69'
link='https://drive.google.com/uc?export=download&id={FILE_ID}'
csv_url=link.format(FILE_ID=file_id)
#The final url would be as below:-
#csv_url='https://drive.google.com/uc?export=download&id=1-tjNjMP6w0RUV4GhJWw08ql3wYwsNU69'
df = pd.read_csv(csv_url)
And the dataframe would be (if you just ran the above code)
a b c d
0 0 1 2 3
1 4 5 6 7
2 8 9 10 11
3 12 13 14 15
See working code here.
Solution 7:[7]
In Google Sheets file go to File > Publish to the web > Select .csv (see screenshot) > Copy link
Code
import pandas as pd
path = 'https://docs.google.com/spreadsheets/d/e/2PACX-1vSvmELTzIjfSmX8GuV3HE2qomN3uRyvPX8RDzpw77JH33DUbj1bjech7H6NYPArvpZFux0DdJ5L5TKy/pub?output=csv'
data = pd.read_csv(path)
print(data)
Solution 8:[8]
First
- Click on File
- Select Publish to the web tab
- Select which sheet you want as CSV(in case of multiple sheet) and also change the format from webpage to Comma-separated values
- Click on publish
- Copy the link eg: https://docs.google.com/spreadsheets/d/e/{}/pub?gid=0&single=true&output=csv
import pandas as pd
pd.read_csv("https://docs.google.com/spreadsheets/d/e/{}/pub?gid=0&single=true&output=csv")
Solution 9:[9]
This works for me.
import pandas as pd
#Create a public URL
#https://docs.google.com/spreadsheets/d/0Ak1ecr7i0wotdGJmTURJRnZLYlV3M2daNTRubTdwTXc/edit?usp=sharing
#get spreadsheets key from url
gsheetkey = "0Ak1ecr7i0wotdGJmTURJRnZLYlV3M2daNTRubTdwTXc"
#sheet name
sheet_name = 'Sheet 1'
url=f'https://docs.google.com/spreadsheet/ccc?key={gsheetkey}&output=xlsx'
df = pd.read_excel(url,sheet_name=sheet_name)
print(df)
Solution 10:[10]
Straight to the point:
- Get your google URL
https://docs.google.com/spreadsheets/d/ this is your sheet ID number/edit#gid=This will be your tab name, it will be a number. Each tab has its own
I like to make a function(not making here) so I separate my variables
- Sheet_id = "Place your sheet ID here"
- Sheet_name = "Place your sheet # here"
the next URL is the tricky party:
- url = f"https://docs.google.com/spreadsheets/d/ {Sheet_id}/export?gid= {Sheet_name}&format=csv"
Then just read it in
df = pd.csv(url)
That's it. If you need to select a different row as a header you can do this
df = pd.csv(url, header=1)
Sources
This article follows the attribution requirements of Stack Overflow and is licensed under CC BY-SA 3.0.
Source: Stack Overflow
