Google Colab: Read .xlsx file from GitHub with pandas

From Google Colab, I am trying to create a DataFrame from an .xlsx file that I have in a GitHub repo. As the URL I use the permalink from GitHub. The repo is public and my account is connected to Colab. I get this error:

XLRDError: Unsupported format, or corrupt file: Expected BOF record; found b'\n\n\n\n\n\n<!'

Thank you in advance for your help!



Solution 1:[1]

The problem is probably the URL you are using.

Try this to see what requests.get returns for it:

url = "https://github.com/your-user-name/your-repo-name/blob/main/data/raw/your-file-name.xlsx"

import requests
from pprint import pprint

response = requests.get(url)
pprint(response.content)

The response is an HTML page: GitHub's file viewer, not the spreadsheet itself. That is why the parser reports <! where it expected a BOF record.
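The same check can be done without reading the dump by eye. The sketch below is only an illustration, and the helper names looks_like_xlsx and looks_like_html are hypothetical. It relies on the fact that an .xlsx file is a ZIP archive and so starts with the bytes PK\x03\x04, while an HTML page starts (after optional whitespace) with a tag such as <!DOCTYPE html>.

```python
def looks_like_xlsx(content: bytes) -> bool:
    """Return True if the bytes start with the ZIP local-file signature.

    An .xlsx workbook is a ZIP container, so a real one begins with PK\\x03\\x04.
    """
    return content[:4] == b"PK\x03\x04"


def looks_like_html(content: bytes) -> bool:
    """Return True if the bytes look like an HTML page.

    Leading whitespace is skipped because GitHub's page starts with newlines,
    as seen in the error message (b'\\n\\n\\n\\n\\n\\n<!').
    """
    head = content[:512].lstrip().lower()
    return head.startswith(b"<!doctype html") or head.startswith(b"<html")


# Hypothetical sample payloads, standing in for response.content
html_payload = b"\n\n\n\n\n\n<!DOCTYPE html>\n<html><body>GitHub</body></html>"
xlsx_payload = b"PK\x03\x04" + b"\x00" * 16

print(looks_like_html(html_payload), looks_like_xlsx(html_payload))
print(looks_like_html(xlsx_payload), looks_like_xlsx(xlsx_payload))
```

Passing response.content from the snippet above to looks_like_html should confirm that the blob URL returns a web page rather than a workbook.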

There are a couple of ways to solve this. This Medium post might be useful.

However, the simplest fix is to use a raw URL like the example below:

https://raw.githubusercontent.com/your-username/name-of-the-repository/master/name-of-the-file.xlsx

I've already tried this and it works.

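import requests
import pandas as pd

url = "https://raw.githubusercontent.com/your-username/name-of-the-repository/master/name-of-the-file.xlsx"

# Download the raw file bytes; fail early on a 404 or other HTTP error
response = requests.get(url)
response.raise_for_status()

# Save the bytes to a local file in the Colab runtime
dest = 'local-file.xlsx'

with open(dest, 'wb') as file:
    file.write(response.content)

# Read the saved workbook into a DataFrame
frame = pd.read_excel(dest)

frame.head()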
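If you only have the blob permalink, the change from a github.com/.../blob/... URL to a raw.githubusercontent.com URL can be done in code. This is a minimal sketch, assuming the URL has the usual shape user/repo/blob/ref/path. The function name to_raw_url is hypothetical and not part of any library.

```python
from urllib.parse import urlsplit


def to_raw_url(blob_url: str) -> str:
    """Convert a GitHub 'blob' page URL into a raw.githubusercontent.com URL."""
    parts = urlsplit(blob_url)
    segments = parts.path.strip("/").split("/")

    # Expected path shape: <user>/<repo>/blob/<ref>/<path to file>
    if parts.netloc != "github.com" or len(segments) < 5 or segments[2] != "blob":
        raise ValueError(f"not a GitHub blob URL: {blob_url}")

    # Drop the 'blob' segment and keep everything else in order
    user, repo, _, *rest = segments
    return "https://raw.githubusercontent.com/" + "/".join([user, repo, *rest])


blob = "https://github.com/your-user-name/your-repo-name/blob/main/data/raw/your-file-name.xlsx"
print(to_raw_url(blob))
```

The result can then be passed to requests.get as in the code above.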

Conclusion: change your URL.

Solution 2:[2]

Use the link from the "View raw" button. For my file, I use the URL below:

import pandas as pd

url = 'https://github.com/mehadisaki/Sales-Forecasting-model-development-/blob/main/TV%20Delivery_2016-2022.xlsx?raw=true'
db = pd.read_excel(url)
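The only difference from the blob permalink is the raw=true query parameter at the end of the URL. As a rough sketch, it can be appended with the standard library. The helper name with_raw_param and the example URL are hypothetical.

```python
from urllib.parse import parse_qsl, urlencode, urlsplit, urlunsplit


def with_raw_param(blob_url: str) -> str:
    """Return the same URL with raw=true set in its query string."""
    parts = urlsplit(blob_url)

    # Keep any existing query parameters and set (or overwrite) 'raw'
    query = dict(parse_qsl(parts.query))
    query["raw"] = "true"

    return urlunsplit(parts._replace(query=urlencode(query)))


# Hypothetical permalink; the percent-encoded space in the path is left untouched
print(with_raw_param("https://github.com/user/repo/blob/main/My%20File.xlsx"))
```

The returned string can be passed to pd.read_excel in the same way as the URL above.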

Solution 3:[3]

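In Google Colab you can also download the file with the wget shell command:

!wget "https://raw.githubusercontent.com/your-username/name-of-the-repository/master/name-of-the-file.xlsx"

wget saves the file in the current working directory, so you can then read it with pd.read_excel("name-of-the-file.xlsx").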

Sources

This article follows the attribution requirements of Stack Overflow and is licensed under CC BY-SA 3.0.

Source: Stack Overflow

Solution    Source
Solution 1  Giliard Godoi
Solution 2  Freddy Mcloughlan
Solution 3  Giliard Godoi