Regular Expression in Python - for loop/specific output

Is there a way in which the function below can:

  1. look through multiple files
  2. print the actual email ([email protected]) in EACH file

The for loop seems to be limited based on the number of pages in ONE file. How can it consider all 15 files, and print out the email?

I would like to do this without needing a list to match against, such as the one below:

emails = ["[email protected]", "[email protected]", "ug{}[email protected]"]

I am using the below function to find emails throughout multiple files, but I am only getting results from one file.

for k in range(1,15):
    # open the pdf file
    object = PyPDF2.PdfFileReader("C:/my_path/file%s.pdf"%(k))

pattern = r"\"?([-a-zA-Z0-9.`?{}]+@\w+\.\w+)\"?" 
        NumPages = object.getNumPages()

        
        for i in range(0, NumPages):
            PageObj = object.getPage(i)
            print("this is page " + str(i)) 
            Text = PageObj.extractText() 
            
        for subText in pattern.findall(Text):
            print(subText)

The output I am looking for:

file1: [email protected]
file2: [email protected]
.
.
.
etc


Solution 1:[1]

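This happens because the unindented pattern = ... line ends the body of the for k loop, so the page loop and the search below it are no longer part of that loop and only ever see one file. In addition, pattern is a plain string, which has no findall method, and range(1,15) stops at 14 rather than 15. Compile the pattern once before the loop and keep everything else indented inside it: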

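import re
import PyPDF2  # PdfFileReader/getPage/extractText are the pre-3.0 PyPDF2 names used in the question

# Compile the pattern once, before the loop
pattern = re.compile(r"\"?([-a-zA-Z0-9.`?{}]+@\w+\.\w+)\"?")

for k in range(1, 16):  # file1.pdf through file15.pdf
    # open the pdf file
    reader = PyPDF2.PdfFileReader("C:/my_path/file%s.pdf" % k)

    # the page loop and the search both stay inside the file loop
    for i in range(reader.getNumPages()):
        text = reader.getPage(i).extractText()

        for email in pattern.findall(text):
            print("file%s: %s" % (k, email))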
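
If the goal is one line per file in the form shown in the question, it may help to separate the regex step from the PDF reading. The following is only a minimal sketch, not the answer's original code: find_emails and format_results are hypothetical helper names, and the page texts are plain made-up strings standing in for whatever extractText() returns.

```python
import re

# Same pattern as in the answer, compiled once.
EMAIL_PATTERN = re.compile(r"\"?([-a-zA-Z0-9.`?{}]+@\w+\.\w+)\"?")


def find_emails(page_texts):
    """Return every email found across a file's page texts, in order."""
    found = []
    for text in page_texts:
        found.extend(EMAIL_PATTERN.findall(text))
    return found


def format_results(texts_by_file):
    """Build 'name: email' lines from a {file name: [page text, ...]} mapping."""
    lines = []
    for name, page_texts in texts_by_file.items():
        for email in find_emails(page_texts):
            lines.append("%s: %s" % (name, email))
    return lines


# Made-up sample text standing in for extracted PDF pages.
sample = {
    "file1": ["Contact: alice@example.com", "no address on this page"],
    "file2": ['Write to "bob@example.org" for details'],
}
for line in format_results(sample):
    print(line)
```

With real PDFs, the lists of page texts would be filled inside the file loop, one list per file, before calling format_results.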

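By the way, I would tweak the pattern to r"\"?([-a-zA-Z0-9.`?{}]+?@\w+?\.\w+)\"?". The last \w+ is left greedy on purpose: a lazy \w+? at the very end of the group would stop after the first character of the top-level domain.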
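
Before swapping patterns, it is worth checking how lazy quantifiers (+?) behave at the end of a pattern: a lazy \w+? followed only by an optional quote is already satisfied by a single character. The small check below is a sketch using a made-up address; the names greedy and lazy are just labels for the two variants.

```python
import re

text = "mail joe@example.com today"  # made-up sample text

# Final \w+ is greedy: it consumes the whole top-level domain.
greedy = re.compile(r"\"?([-a-zA-Z0-9.`?{}]+@\w+\.\w+)\"?")
# Every quantifier is lazy, including the final \w+?.
lazy = re.compile(r"\"?([-a-zA-Z0-9.`?{}]+?@\w+?\.\w+?)\"?")

greedy_matches = greedy.findall(text)
lazy_matches = lazy.findall(text)

print(greedy_matches)  # ['joe@example.com']
print(lazy_matches)    # ['joe@example.c']
```

The lazy quantifiers before the @ and before the dot do not change the result here, because the pattern still has to reach those literal characters; only the trailing one truncates the match.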

Sources

This article follows the attribution requirements of Stack Overflow and is licensed under CC BY-SA 3.0.

Source: Stack Overflow

Solution Source
Solution 1 Vovin