Read ZipFile from URL into StringIO and parse with pandas.read_csv
I'm trying to read ZipFile data from a URL and, via StringIO, parse the data inside the ZipFile as CSV using pandas.read_csv:
r = req.get("http://seanlahman.com/files/database/lahman-csv_2014-02-14.zip").content
file = ZipFile(StringIO(r))
salaries_csv = file.open("Salaries.csv")
salaries = pd.read_csv(salaries_csv)
The last line gave me an error:
CParserError: Error tokenizing data. C error: Calling read(nbytes) on source failed. Try engine='python'.
However, if I try using
salaries = pd.read_csv(file.open("Salaries.csv"))
it works.
So I was wondering what I am missing here.
file.open should return a ZipExtFile object, and since read_csv takes only a string or a file handle / StringIO as input, why does the last line work?
Solution 1:[1]
A few changes to @firelynx's answer for Python 3.5:
from io import BytesIO
from urllib.request import urlopen
from zipfile import ZipFile
import pandas as pd
# Download the archive as raw bytes
r = urlopen("http://seanlahman.com/files/database/lahman-csv_2014-02-14.zip").read()
# The download is bytes, so wrap it in BytesIO (not StringIO) for ZipFile
file = ZipFile(BytesIO(r))
salaries_csv = file.open("Salaries.csv")
salaries = pd.read_csv(salaries_csv)
print(salaries)
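To see why the answer swaps StringIO for BytesIO, here is a minimal offline sketch. It builds a tiny zip archive in memory instead of downloading one, and it parses the member with the standard-library csv module in place of pandas so that it runs without a network connection or third-party packages. The sample row and the variable names are made up for illustration and do not come from the Lahman data.

```python
import csv
import io
from zipfile import ZipFile

# Build a small zip archive in memory; this stands in for the bytes
# that urlopen(...).read() would return (hypothetical sample data).
buffer = io.BytesIO()
with ZipFile(buffer, "w") as archive:
    archive.writestr("Salaries.csv", "yearID,playerID,salary\n1985,sample01,100000\n")
zip_bytes = buffer.getvalue()

# In Python 3, StringIO only accepts str, so raw zip bytes are rejected.
try:
    io.StringIO(zip_bytes)
    stringio_accepts_bytes = True
except TypeError:
    stringio_accepts_bytes = False

# BytesIO wraps the bytes in a seekable file-like object that ZipFile can read.
with ZipFile(io.BytesIO(zip_bytes)) as archive:
    with archive.open("Salaries.csv") as member:
        # The member yields bytes; decode it to text before handing it to csv.
        text = io.TextIOWrapper(member, encoding="utf-8", newline="")
        rows = list(csv.DictReader(text))

print(stringio_accepts_bytes)
print(rows)
```

With pandas installed, the binary handle returned by archive.open("Salaries.csv") can be passed to pd.read_csv directly, as in the answer above, so the TextIOWrapper step is only needed for the csv module.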
Sources
This article follows the attribution requirements of Stack Overflow and is licensed under CC BY-SA 3.0.
Source: Stack Overflow
| Solution | Source |
|---|---|
| Solution 1 | BBSysDyn |
