How to glob two patterns with pathlib?

I want to find files with two different extensions: .jl and .jsonlines. I use

from pathlib import Path
p1 = Path("/path/to/dir").joinpath().glob("*.jl")
p2 = Path("/path/to/dir").joinpath().glob("*.jsonlines")

but I want p1 and p2 as one variable, not two. Should I merge p1 and p2 in the first place? Are there other ways to concatenate glob patterns?



Solution 1:[1]

Try this:

from os.path import join
from glob import glob

files = []
for ext in ('*.jl', '*.jsonlines'):
    files.extend(glob(join("path/to/dir", ext)))

print(files)

Solution 2:[2]

from pathlib import Path

exts = [".jl", ".jsonlines"]
mainpath = "/path/to/dir"

# Same directory

files = [p for p in Path(mainpath).iterdir() if p.suffix in exts]

# Recursive

files = [p for p in Path(mainpath).rglob('*') if p.suffix in exts]

# 'files' is already a list of Path objects; to convert them to strings:

file_names = [str(p) for p in files]
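
A small sketch of how the suffix test above behaves on edge cases: `.suffix` returns only the last dot-separated component, and the comparison is case-sensitive unless you normalize it. The helper name `has_wanted_suffix` is hypothetical, and lowercasing the suffix is an assumption that case-insensitive matching is wanted.

```python
from pathlib import PurePath

exts = {".jl", ".jsonlines"}


def has_wanted_suffix(path, wanted=exts):
    """Hypothetical helper: True if the last suffix of `path` is in `wanted`."""
    # .suffix is only the final component, e.g. ".bak" for "data.jl.bak".
    # lower() is an assumption: it makes "data.JL" count as a match.
    return PurePath(path).suffix.lower() in wanted
```

With this helper, "data.JL" and "data.tar.jl" are accepted, while "data.jl.bak" is not.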

Solution 3:[3]

If you're OK with installing a package, check out wcmatch. Its wcmatch.pathlib module provides a drop-in Path class whose glob() accepts a list of patterns, so you can run multiple matches in one go:

from wcmatch.pathlib import Path
paths = Path('path/to/dir').glob(['*.jl', '*.jsonlines'])

Solution 4:[4]

Inspired by @aditi's answer, I came up with this:

from pathlib import Path
from itertools import chain

exts = ["*.jl", "*.jsonlines"]
mainpath = "/path/to/dir"

# Lazily chain one glob generator per pattern into a single iterator
files = chain.from_iterable(Path(mainpath).glob(ext) for ext in exts)
print(list(files))
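
The chaining idea above could also be wrapped in a reusable generator. This is a minimal sketch: the name `glob_many` is hypothetical (it is not part of pathlib), and it assumes a file that matches more than one pattern should be returned only once.

```python
from itertools import chain
from pathlib import Path


def glob_many(directory, patterns):
    """Yield each path in `directory` that matches any of `patterns`, once.

    `glob_many` is a hypothetical helper name, not a pathlib API.
    """
    root = Path(directory)
    seen = set()
    # chain.from_iterable lazily concatenates one glob generator per pattern.
    for path in chain.from_iterable(root.glob(p) for p in patterns):
        if path not in seen:  # a file may match more than one pattern
            seen.add(path)
            yield path
```

For example, `files = sorted(glob_many("/path/to/dir", ["*.jl", "*.jsonlines"]))` gives a single sorted list of Path objects.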

Solution 5:[5]

Depending on your application, calling glob once per pattern can be inefficient, because it has to loop over all files in the directory multiple times (once for each extension/pattern).

In your example you are only matching the extension in one folder, so a simple solution could be:

from pathlib import Path

folder = Path("/path/to/dir")
extensions = {".jl", ".jsonlines"}
files = [file for file in folder.iterdir() if file.suffix in extensions]

This can be turned into a function if you use it a lot.
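
One way the suffix filter above might look as a function. This is a sketch, not a definitive implementation: the name `files_with_suffix` is hypothetical, and the `is_file()` check is an added assumption (the original one-liner would also return a directory named, say, "backup.jl").

```python
from pathlib import Path


def files_with_suffix(folder, extensions):
    """Return files directly inside `folder` whose suffix is in `extensions`."""
    wanted = set(extensions)  # set membership is a constant-time lookup
    return [
        p for p in Path(folder).iterdir()
        # is_file() is an assumption: it skips directories with a matching name
        if p.is_file() and p.suffix in wanted
    ]
```

It would be called as `files_with_suffix("/path/to/dir", {".jl", ".jsonlines"})`, and the directory is scanned only once regardless of how many extensions are given.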

However, if you want to be able to match glob patterns rather than extensions, you should use the match() method:

from pathlib import Path

folder = Path("/path/to/dir")
patterns = ("*.jl", "*.jsonlines")

files = [f for f in folder.iterdir() if any(f.match(p) for p in patterns)]

This last one is both convenient and efficient, since the directory is scanned only once. You can improve efficiency slightly by placing the most common patterns first in patterns, because any() short-circuits on the first match.
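
To illustrate what match() adds over a plain suffix test, here is a small sketch using pure paths, so nothing needs to exist on disk. The pattern "data_*.jsonlines", the file names, and the helper name `matches_any` are made up for the example.

```python
from pathlib import PurePosixPath

# Hypothetical patterns: the second one constrains the file name, not just the suffix.
patterns = ("*.jl", "data_*.jsonlines")


def matches_any(path, patterns):
    """True if `path` matches at least one glob pattern."""
    # any() stops evaluating as soon as one pattern matches.
    return any(path.match(p) for p in patterns)


print(matches_any(PurePosixPath("/path/to/dir/run.jl"), patterns))             # True
print(matches_any(PurePosixPath("/path/to/dir/data_01.jsonlines"), patterns))  # True
print(matches_any(PurePosixPath("/path/to/dir/other.jsonlines"), patterns))    # False
```

A relative pattern is matched against the right-hand end of the path, which is why the directory part does not need to appear in the pattern.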

Solution 6:[6]

from pathlib import Path

keep = [".jl", ".jsonlines"]
files = [p for p in Path().rglob("*") if p.suffix in keep]

Sources

This article follows the attribution requirements of Stack Overflow and is licensed under CC BY-SA 3.0.

Source: Stack Overflow

Solution Source
Solution 1 Aditi
Solution 2
Solution 3 Ciprian Tomoiagă
Solution 4 Gmosy Gnaq
Solution 5
Solution 6 0-_-0