'Writing a script to find certain lines/string in multiple documents
I have a folder with multiple files (.doc and .docx). For the sake of this question I want to primarily deal with the .doc files unless for of these file types and be accounted for in the code.
I'm writing a code to read the folder and identify the .doc files. The objective is to output the paragraph 3, 4, and 7. I'm not sure why but python is reading each paragraph from a different spot in each file. I'm thinking maybe there are spacing/formatting inconsistencies that I wasn't aware of initially. To work around the formatting issue, I was thinking I could define the strings I want outputted. But I'm not sure how to do that. I tried to take add a string in the code but that didn't work.
How can I modify my code to be able to account for finding the strings that I want?
Original Code
doc = ''
for file in glob.glob(r'folderpathway*.docx'):
doc = docx.Document(file)
print (doc.paragraphs[3].text)
print (doc.paragraphs[4].text)
print (doc.paragraphs[7].text)
Code to account for the formatting issues
doc = ''
for file in glob.glob(r'folderpathway*.docx'):
doc = docx.Document(file)
print (doc.paragraphs["Substance Number"].text)
TypeError: list indices must be integers or slices, not str
Sources
This article follows the attribution requirements of Stack Overflow and is licensed under CC BY-SA 3.0.
Source: Stack Overflow
Solution | Source |
---|