Google Colab: Read .xlsx file from GitHub with pandas
From Google Colab, I am trying to create a DataFrame from an .xlsx file I have in a GitHub repo. As the URL I use the permalink from GitHub; the repo is public and my account is connected to Colab. The read fails with this error:
XLRDError: Unsupported format, or corrupt file: Expected BOF record; found b'\n\n\n\n\n\n<!'
Thank you in advance for your help!
Solution 1:[1]
Maybe the problem is due to the URL that you are using.
Try this to see what requests.get actually returns:
url = "https://github.com/your-user-name/your-repo-name/blob/main/data/raw/your-file-name.xlsx"
import requests
from pprint import pprint
response = requests.get(url)
pprint(response.content)
It is an HTML page: GitHub's web view of the file, not the spreadsheet itself. That is why the parser finds `<!` where it expects an Excel header.
There are a couple of things you can do to solve this. This Medium post here might be useful.
However, one simple thing is to use a raw URL like the example below:
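A quick way to confirm this without reading the whole dump is to look at the first few bytes of the response. The sketch below is only an illustration: `classify_payload` is a hypothetical helper name, and it relies on the fact that .xlsx files are ZIP archives (they start with `PK\x03\x04`), while legacy .xls files start with the OLE2 signature.

```python
def classify_payload(content: bytes) -> str:
    """Guess what was downloaded by inspecting its leading bytes."""
    if content.startswith(b"PK\x03\x04"):
        return "zip"    # .xlsx workbooks are ZIP archives
    if content.startswith(b"\xd0\xcf\x11\xe0"):
        return "ole2"   # legacy .xls workbooks use the OLE2 container
    # Ignore leading whitespace/newlines, as in the error message above
    head = content.lstrip()[:16].lower()
    if head.startswith(b"<!doctype") or head.startswith(b"<html"):
        return "html"   # a web page, not a spreadsheet
    return "unknown"


# The bytes quoted in the XLRDError look like the start of an HTML page
print(classify_payload(b"\n\n\n\n\n\n<!DOCTYPE html>"))  # prints: html
```

With the snippet above, calling `classify_payload(response.content)` before `pd.read_excel` would show whether the URL returned a workbook or a web page.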
https://raw.githubusercontent.com/your-username/name-of-the-repository/master/name-of-the-file.xlsx
I've already tried this and it works.
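import requests
import pandas as pd
url = "https://raw.githubusercontent.com/your-username/name-of-the-repository/master/name-of-the-file.xlsx"
# Download the raw file bytes
response = requests.get(url)
response.raise_for_status()  # stop early on a 404 or other HTTP error
dest = 'local-file.xlsx'
# Save the bytes locally, then let pandas read the saved workbook
with open(dest, 'wb') as file:
    file.write(response.content)
frame = pd.read_excel(dest)
frame.head()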
Conclusion: change your URL.
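If you start from a GitHub permalink, the URL change can be done in code. This is a minimal sketch, assuming the link has the usual `github.com/<user>/<repo>/blob/<branch>/<path>` shape; `to_raw_url` is a hypothetical helper name, not part of requests or pandas.

```python
from urllib.parse import urlsplit


def to_raw_url(blob_url: str) -> str:
    """Turn a github.com 'blob' URL into its raw.githubusercontent.com form."""
    parts = urlsplit(blob_url)
    segments = parts.path.strip("/").split("/")
    # Expected layout: user / repo / "blob" / branch / path to the file
    if parts.netloc != "github.com" or len(segments) < 5 or segments[2] != "blob":
        raise ValueError(f"not a GitHub blob URL: {blob_url}")
    user, repo, _blob, *rest = segments
    # The raw host uses the same path with the "blob" segment removed
    return "https://raw.githubusercontent.com/" + "/".join([user, repo, *rest])


url = "https://github.com/your-user-name/your-repo-name/blob/main/data/raw/your-file-name.xlsx"
print(to_raw_url(url))
```

Percent-encoded characters such as `%20` are left as they are, so the result can be passed straight to `requests.get`.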
Solution 2:[2]
Please use the link from the "View raw" button. For my file I use the URL below:
import pandas as pd
url = 'https://github.com/mehadisaki/Sales-Forecasting-model-development-/blob/main/TV%20Delivery_2016-2022.xlsx?raw=true'
db = pd.read_excel(url)
Solution 3:[3]
With Google Colab, one thing you could do is use the wget command, like this:
!wget "https://raw.githubusercontent.com/your-username/name-of-the-repository/master/name-of-the-file.xlsx"
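After the download, the file still has to be read with pandas. The sketch below assumes wget saved the file in the current working directory under the last segment of the URL path, which is its default behaviour for a URL without a query string; `downloaded_name` and `read_downloaded_excel` are hypothetical helper names.

```python
import posixpath
from urllib.parse import urlsplit


def downloaded_name(url: str) -> str:
    """Return the last path segment of the URL, used here as the local filename."""
    return posixpath.basename(urlsplit(url).path)


def read_downloaded_excel(url: str):
    """Read the workbook that a previous wget call saved in the working directory."""
    import pandas as pd  # imported here so the name helper works without pandas
    return pd.read_excel(downloaded_name(url))


url = "https://raw.githubusercontent.com/your-username/name-of-the-repository/master/name-of-the-file.xlsx"
print(downloaded_name(url))  # prints: name-of-the-file.xlsx
```

In a Colab cell that runs after the wget line, `frame = read_downloaded_excel(url)` would then load the saved workbook.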
Sources
This article follows the attribution requirements of Stack Overflow and is licensed under CC BY-SA 3.0.
Source: Stack Overflow
| Solution | Source |
|---|---|
| Solution 1 | Giliard Godoi |
| Solution 2 | Freddy Mcloughlan |
| Solution 3 | Giliard Godoi |