Python: for loop only iterates once (inside a with statement)
I am trying to open a zip file, iterate through the PDFs inside it, and scrape a certain portion of the text from each PDF. I am using the following code:
```python
import re
import zipfile

import pdftotext


def get_text(part):
    # Create path
    path = f'C:\\Users\\user\\Data\\Part_{part}.zip'
    with zipfile.ZipFile(path) as data:
        listdata = data.namelist()
        onlypdfs = [k for k in listdata if '_2018' in k or '_2019' in k or '_2020' in k or '_2021' in k or '_2022' in k]
        for file in onlypdfs:
            with data.open(file, "r") as f:
                # Get the pdf
                pdffile = pdftotext.PDF(f)
                text = ("\n\n".join(pdffile))
            # Remove the newline characters
            text = text.replace('\r\n', ' ')
            text = text.replace('\r', ' ')
            text = text.replace('\n', ' ')
            text = text.replace('\x0c', ' ')
            # Get the text that will talk about what I want
            try:
                text2 = re.findall(r'FEES (.+?) Types', text, re.IGNORECASE)[-1]
            except IndexError:
                text2 = 'PROBLEM'
            # Return the file name and the text
            return file, text2
```
Then I run:
```python
info = []
for i in range(1, 2):
    info.append(get_text(i))
info
```
My output contains only the first file and its text, even though there are 4 PDFs in the zip file. Ultimately I want to iterate through 30+ zip files, but I can't get it working for even one. I've seen this question asked before, but those solutions didn't fit my problem. Is it something to do with the with statement?
Solution 1:[1]
You need to process all the files and store each result as you iterate, returning only once the loop has finished. One way to do this is to collect them in a list of tuples:
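```python
file_list = []
for file in onlypdfs:
    ...  # open the PDF and extract text2 as in the question
    file_list.append((file, text2))
# Return after the loop so every file is processed first
return file_list
```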
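Putting this together, a fuller version of the fixed function might look like the sketch below. It is an illustration under a few assumptions, not the answerer's code: it takes the zip path directly instead of a part number, it adds a hypothetical `extract` parameter (defaulting to a helper built on the question's `pdftotext.PDF` call) so the PDF step can be swapped out, and it replaces the chain of `replace()` calls with a single `re.sub(r"\s+", " ", text)`, which also collapses runs of whitespace into one space.

```python
import re
import zipfile

# Same year filter as the question, written as a tuple
YEARS = ("_2018", "_2019", "_2020", "_2021", "_2022")


def pdf_to_text(fileobj):
    """Extract text from a binary PDF file object with pdftotext."""
    import pdftotext  # third-party; imported only when PDFs are actually read

    return "\n\n".join(pdftotext.PDF(fileobj))


def get_text(path, extract=pdf_to_text):
    """Return a list of (file name, FEES snippet) tuples, one per matching file."""
    file_list = []
    with zipfile.ZipFile(path) as data:
        onlypdfs = [k for k in data.namelist() if any(y in k for y in YEARS)]
        for file in onlypdfs:
            with data.open(file, "r") as f:
                text = extract(f)
            # Turn newlines, carriage returns and form feeds (\x0c) into single spaces
            text = re.sub(r"\s+", " ", text)
            matches = re.findall(r"FEES (.+?) Types", text, re.IGNORECASE)
            text2 = matches[-1] if matches else "PROBLEM"
            file_list.append((file, text2))
    # The return sits after the loop, so every matching file is processed
    return file_list
```

With the question's path scheme you would call it as `get_text(f'C:\\Users\\user\\Data\\Part_{part}.zip')`.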
You could then use it like this:
```python
info = []
for i in range(1, 2):
    file_texts = get_text(i)  # list of (file, text) tuples for this zip
    for file_text in file_texts:
        info.append(file_text)
print(info)
```
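For the 30+ archives, one option is to discover the files instead of hardcoding a `range()`. This sketch assumes every archive sits in one folder and is named `Part_<number>.zip`, and that `read_zip` (a hypothetical parameter) is a function that takes a zip path and returns a list of `(file, text)` tuples. The paths are sorted numerically so that `Part_10` comes after `Part_2`.

```python
from pathlib import Path


def collect_all(directory, read_zip):
    """Call read_zip on each Part_<n>.zip in directory and flatten the results."""
    zip_paths = sorted(
        Path(directory).glob("Part_*.zip"),
        # Sort by the number after "Part_" so Part_10 follows Part_2
        key=lambda p: int(p.stem.split("_")[1]),
    )
    info = []
    for zip_path in zip_paths:
        # extend() adds each (file, text) tuple instead of nesting lists
        info.extend(read_zip(zip_path))
    return info
```

For example: `info = collect_all(r'C:\Users\user\Data', read_zip)`. Using `extend()` has the same effect as the inner `for`/`append` loop.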
Solution 2:[2]
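The `return file, text2` statement runs during the first iteration of the for loop, and `return` exits the whole function immediately, so the remaining PDFs are never read. The `with` statements are not the cause; they simply close the PDF and the zip file on the way out.
The solution is to collect each `(file, text2)` pair in a list inside the loop and move the return statement outside (after) the for loop. Moving the `return` alone is not enough: the function would then return only the last file.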
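To see the difference without any PDFs, here is a minimal sketch with hypothetical function names: `first_only` mirrors the question's structure, `last_only` shows what happens if the `return` is merely dedented, and `all_items` collects every result before returning.

```python
def first_only(items):
    for item in items:
        # return ends the whole function on the first pass
        return item.upper()


def last_only(items):
    for item in items:
        result = item.upper()
    # Dedenting the return without collecting keeps only the last value
    return result


def all_items(items):
    results = []
    for item in items:
        results.append(item.upper())
    # Return after the loop, once every item has been processed
    return results


files = ["a.pdf", "b.pdf", "c.pdf", "d.pdf"]
print(first_only(files))  # A.PDF
print(last_only(files))   # D.PDF
print(all_items(files))   # ['A.PDF', 'B.PDF', 'C.PDF', 'D.PDF']
```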
Sources
This article follows the attribution requirements of Stack Overflow and is licensed under CC BY-SA 3.0.
Source: Stack Overflow
| Solution | Source |
|---|---|
| Solution 1 | |
| Solution 2 | Luke |
