Read docx file error [closed]

import docx2txt

my_text=docx2txt.process("file1.docx")
print(my_text)

When I try to read the .docx file with this code, I get the following error:

  File "/usr/lib/python3.5/zipfile.py", line 1093, in _RealGetContents
    raise BadZipFile("File is not a zip file")
zipfile.BadZipFile: File is not a zip file


Solution 1:[1]

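As @cowbert mentioned in the comments, your file is most likely corrupted, or it is not really a .docx file (for example, an older .doc file renamed to .docx). A .docx file is a zip archive, and this error means the file could not be opened as one. Your code itself is correct. You can also use textract, which supports .docx files: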

import textract
text = textract.process("path/to/file.extension")

textract is built on top of several Python packages and other libraries. Installing it also installs its dependencies, including docx2txt.
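
To find out whether the file itself is the problem, you could check it before parsing. The following is a minimal sketch using only the standard library; the helper name looks_like_docx is hypothetical, and it assumes the document body sits at the conventional member name word/document.xml:

```python
import zipfile


def looks_like_docx(path):
    """Return True if `path` is a zip archive containing word/document.xml.

    Hypothetical helper for illustration, not part of docx2txt or textract.
    """
    # A .docx file is a zip container; a file that is not a valid zip
    # archive is what makes zipfile raise BadZipFile.
    if not zipfile.is_zipfile(path):
        return False
    with zipfile.ZipFile(path) as archive:
        # Assumption: the main document part uses the conventional name.
        return "word/document.xml" in archive.namelist()


if __name__ == "__main__":
    # "file1.docx" is the file name from the question.
    if looks_like_docx("file1.docx"):
        print("File looks like a valid .docx archive")
    else:
        print("File is missing, corrupted, or not a .docx file")
```

If the check fails, re-downloading the file or re-saving it from a word processor as .docx is probably the next thing to try, since no parser built on zipfile will be able to open it.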

Sources

This article follows the attribution requirements of Stack Overflow and is licensed under CC BY-SA 3.0.

Source: Stack Overflow

Solution Source
Solution 1 micstr