Skipping unspecified blank rows when importing xls in Python
I have xls files whose format I cannot control. Each has a varying number of data rows, followed by several blank rows at the bottom. They import fine with pd.read_excel() if I manually delete the blank rows first, but I want to run this without having to open the files outside of Python. Does anyone have any ideas?
To reiterate: I don't want to manipulate the files non-programmatically, and the row where the blank rows start varies from file to file and from day to day.
Edit for additional info: if I don't delete the blank rows first, xlrd.open_workbook() and pd.read_excel() both fail with the error message "CompDocError: Workbook: size exceeds expected 414720 bytes; corrupt?". My ultimate aim is to automate loading these files from an inbox straight into a data ETL pipeline, but for now I would settle for being able to import them without manipulating them first. It's a daily process (or will be when I get it working).
The data in the Excel report is populated to row 1200 (for example; this changes, so it can't be hardcoded anywhere), but the spreadsheet has 65536 rows. As far as I can see there is nothing in rows 1201-65536, but the import seems to disagree.
Thanks!
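One possible approach, offered as a sketch rather than a confirmed fix: newer xlrd releases (2.0 and later, as far as I know) accept an ignore_workbook_corruption=True argument to xlrd.open_workbook(), which skips the check that raises CompDocError. Assuming that lets the file open, the trailing blank rows can then be trimmed in code instead of by hand. The names read_xls_rows and trim_trailing_blank_rows, and the choice of the first sheet, are hypothetical and only for illustration.

```python
def is_blank(cell):
    # xlrd reports empty cells as empty strings; None and
    # whitespace-only strings are treated as blank as well.
    return cell is None or str(cell).strip() == ""


def trim_trailing_blank_rows(rows):
    """Return the rows with every all-blank row at the END removed.

    Blank rows in the middle of the data are kept, because only the
    padding at the bottom of the sheet is unwanted.
    """
    rows = list(rows)
    while rows and all(is_blank(cell) for cell in rows[-1]):
        rows.pop()
    return rows


def read_xls_rows(path, sheet_index=0):
    """Hypothetical helper: read one sheet of an .xls file as a list of rows."""
    # Imported here so that the trimming helper above has no dependencies.
    import xlrd

    # Assumption: the installed xlrd supports ignore_workbook_corruption.
    book = xlrd.open_workbook(path, ignore_workbook_corruption=True)
    sheet = book.sheet_by_index(sheet_index)
    return trim_trailing_blank_rows(
        sheet.row_values(i) for i in range(sheet.nrows)
    )
```

If the first row holds the column headers, the result could be handed to pandas with something like pd.DataFrame(rows[1:], columns=rows[0]). Since the flag suppresses a corruption check, it seems sensible to validate the loaded data before it goes into the pipeline.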
Sources
This article follows the attribution requirements of Stack Overflow and is licensed under CC BY-SA 3.0.
Source: Stack Overflow