Regular Expression in Python - for loop/specific output
Is there a way in which the function below can:
- look through multiple files
- print the actual email ([email protected]) in EACH file
The for loop seems to be limited to the pages of ONE file. How can it go through all 15 files and print the email found in each one?
Ideally without needing a list to match against, like the one below:
emails = ["[email protected]", "[email protected]", "ug{}[email protected]"]
I am using the below function to find emails throughout multiple files, but I am only getting results from one file.
for k in range(1,15):
    # open the pdf file
    object = PyPDF2.PdfFileReader("C:/my_path/file%s.pdf"%(k))
pattern = r"\"?([-a-zA-Z0-9.`?{}]+@\w+\.\w+)\"?"
NumPages = object.getNumPages()
for i in range(0, NumPages):
    PageObj = object.getPage(i)
    print("this is page " + str(i))
    Text = PageObj.extractText()
    for subText in pattern.findall(Text):
        print(subText)
The output I am looking for:
file1: [email protected]
file2: [email protected]
...
Solution 1:[1]
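The loop over the files ends at the line that declares the pattern variable, so only the last file opened gets searched. Compile the pattern once before the loop and keep the page handling inside it: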
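import re
import PyPDF2

# compile the pattern once, before the loop over the files
pattern = re.compile(r"\"?([-a-zA-Z0-9.`?{}]+@\w+\.\w+)\"?")

# range(1, 15) covers file1.pdf to file14.pdf; use range(1, 16) for 15 files
for k in range(1, 15):
    # open the pdf file (PdfFileReader, getPage and extractText are the pre-3.0 PyPDF2 names)
    object = PyPDF2.PdfFileReader("C:/my_path/file%s.pdf" % k)
    # everything below stays inside the file loop, so it runs for each file
    for i in range(object.getNumPages()):
        PageObj = object.getPage(i)
        print("this is page " + str(i))
        Text = PageObj.extractText()
        for subText in pattern.findall(Text):
            print(subText)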
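To get the "file1: email" style of output asked for in the question, the matches can be collected per file instead of being printed page by page. The following is only a sketch under stated assumptions: find_emails and report are hypothetical helper names, and the sample dictionary stands in for the page texts that would come from extractText() on each PDF.

```python
import re

# Same pattern as above, compiled once.
EMAIL_PATTERN = re.compile(r"\"?([-a-zA-Z0-9.`?{}]+@\w+\.\w+)\"?")


def find_emails(page_texts):
    """Return the unique emails found in the page texts, in order of first appearance."""
    found = []
    for text in page_texts:
        for email in EMAIL_PATTERN.findall(text):
            if email not in found:
                found.append(email)
    return found


def report(pages_by_file):
    """Build one 'name: email, email' line per file."""
    lines = []
    for name, pages in pages_by_file.items():
        lines.append(f"{name}: {', '.join(find_emails(pages))}")
    return lines


# Hypothetical stand-in for the text extracted from two PDFs (one string per page).
sample = {
    "file1": ["Contact: alice@example.com", "page two, no address"],
    "file2": ["Write to bob@example.org or alice@example.com"],
}

for line in report(sample):
    print(line)
```

In the real script, the list of page texts for each file would be built inside the file loop and passed to find_emails, so that every file contributes its own line.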
By the way, I would change the pattern offhand to r"\"?([-a-zA-Z0-9.`?{}]+?@\w+?\.\w+?)\"?"
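Before swapping patterns, it may help to compare the two side by side. The sketch below assumes the lazy variant is meant to carry the same backslash escapes as the original pattern; the sample text and addresses are made up for illustration.

```python
import re

# Original (greedy) pattern from the question.
greedy = re.compile(r"\"?([-a-zA-Z0-9.`?{}]+@\w+\.\w+)\"?")
# Lazy variant, assuming the escapes match the original pattern.
lazy = re.compile(r"\"?([-a-zA-Z0-9.`?{}]+?@\w+?\.\w+?)\"?")

# Made-up sample text with one quoted and one unquoted address.
text = 'Reach "alice@example.com" or bob@example.org today'

greedy_matches = greedy.findall(text)
lazy_matches = lazy.findall(text)

print(greedy_matches)  # ['alice@example.com', 'bob@example.org']
print(lazy_matches)    # ['alice@example.c', 'bob@example.o']
```

With this sample, the lazy form stops after the first character of the top-level domain: a trailing \w+? followed only by an optional quote is satisfied by a single character. For this task the greedy original may therefore be the safer choice.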
Sources
This article follows the attribution requirements of Stack Overflow and is licensed under CC BY-SA 3.0.
Source: Stack Overflow
| Solution | Source |
|---|---|
| Solution 1 | Vovin |
