'Unable to read_excel using pandas on CentOS Stream 9 VM: zipfile.BadZipFile: Bad magic number for file header
I've been running a script for several months now where I read and concat several excel exports using the following code:
files = os.listdir(os.path.abspath('exports/'))
for file in files:
if file.startswith('ap_statistics_') and file.endswith('.xlsx'):
excel_list.append(pd.read_excel('exports/' + file, sheet_name='Access Points'))
df = pd.concat(excel_list, axis=0, ignore_index=True)
This has worked just fine until this Saturday when I uploaded new exports to the CentOS Stream 9 VM where I have a cronjob running the script every hour.
Now I always get this error:
Traceback (most recent call last):
File "/root/projects/beacon_check_v8/main.py", line 310, in <module>
ap_check()
File "/root/projects/beacon_check_v8/main.py", line 260, in ap_check
siteaps_result = getaps()
File "/root/projects/beacon_check_v8/main.py", line 30, in getaps
excel_list.append(pd.read_excel('exports/' + file, sheet_name='Access Points'))
File "/root/projects/beacon_check_v8/venv/lib64/python3.9/site-packages/pandas/util/_decorators.py", line 311, in wrapper
return func(*args, **kwargs)
File "/root/projects/beacon_check_v8/venv/lib64/python3.9/site-packages/pandas/io/excel/_base.py", line 457, in read_excel
io = ExcelFile(io, storage_options=storage_options, engine=engine)
File "/root/projects/beacon_check_v8/venv/lib64/python3.9/site-packages/pandas/io/excel/_base.py", line 1419, in __init__
self._reader = self._engines[engine](self._io, storage_options=storage_options)
File "/root/projects/beacon_check_v8/venv/lib64/python3.9/site-packages/pandas/io/excel/_openpyxl.py", line 525, in __init__
super().__init__(filepath_or_buffer, storage_options=storage_options)
File "/root/projects/beacon_check_v8/venv/lib64/python3.9/site-packages/pandas/io/excel/_base.py", line 518, in __init__
self.book = self.load_workbook(self.handles.handle)
File "/root/projects/beacon_check_v8/venv/lib64/python3.9/site-packages/pandas/io/excel/_openpyxl.py", line 536, in load_workbook
return load_workbook(
File "/root/projects/beacon_check_v8/venv/lib64/python3.9/site-packages/openpyxl/reader/excel.py", line 317, in load_workbook
reader.read()
File "/root/projects/beacon_check_v8/venv/lib64/python3.9/site-packages/openpyxl/reader/excel.py", line 277, in read
self.read_strings()
File "/root/projects/beacon_check_v8/venv/lib64/python3.9/site-packages/openpyxl/reader/excel.py", line 143, in read_strings
with self.archive.open(strings_path,) as src:
File "/usr/lib64/python3.9/zipfile.py", line 1523, in open
raise BadZipFile("Bad magic number for file header")
zipfile.BadZipFile: Bad magic number for file header
I develop on my Windows 10 notebook using PyCharm with a Python 3.9 venv, same as on the VM, where the script continued to work just fine.
When researching online all I found was that sometimes .pyc files can cause issues so I created a completely new venv on the VM, installed all libraries (netmiko, pandas, openpyxl, etc.) and tried running the script again before and after deleting all .pyc files in the directory but no luck.
I have extracted the Excel file header using the following code:
with open('exports/' + file, 'rb') as myexcel:
print(myexcel.read(4))
Unfortunately it comes back as the same values on both my Windows venv as well as the CentOS venv: b'PK\x03\x04'
I don't know if this header value is correct or not but I can read the files on my Windows notebook just fine using pandas or excel.
Any help would be greatly appreciated.
Solution 1:[1]
The issue was actually the program I used to transfer the files between my notebook and the VM, WinSCP. I don't know why or how this caused the error but I was able to fix it by transferring directly over pscp.
Sources
This article follows the attribution requirements of Stack Overflow and is licensed under CC BY-SA 3.0.
Source: Stack Overflow
| Solution | Source |
|---|---|
| Solution 1 | sate |
