How to glob two patterns with pathlib?
I want to find two types of files with two different extensions: .jl and .jsonlines. I use
from pathlib import Path
p1 = Path("/path/to/dir").joinpath().glob("*.jl")
p2 = Path("/path/to/dir").joinpath().glob("*.jsonlines")
but I want p1 and p2 as one variable, not two. Should I merge p1 and p2 in the first place? Are there other ways to concatenate glob patterns?
Solution 1:[1]
Try this:
from os.path import join
from glob import glob
files = []
for ext in ('*.jl', '*.jsonlines'):
    files.extend(glob(join("path/to/dir", ext)))
print(files)
Solution 2:[2]
from pathlib import Path
exts = [".jl", ".jsonlines"]
mainpath = "/path/to/dir"
# Same directory
files = [p for p in Path(mainpath).iterdir() if p.suffix in exts]
# Recursive
files = [p for p in Path(mainpath).rglob('*') if p.suffix in exts]
# 'files' is already a list of Path objects; to get plain strings:
files = [str(p) for p in files]
Solution 3:[3]
If you're OK with installing a package, check out wcmatch. Its wcmatch.pathlib module is a drop-in replacement for pathlib whose glob() accepts a list of patterns, so you can run multiple matches in one go:
from wcmatch.pathlib import Path
paths = Path('path/to/dir').glob(['*.jl', '*.jsonlines'])
Solution 4:[4]
Inspired by @aditi's answer, I came up with this:
from pathlib import Path
from itertools import chain
exts = ["*.jl", "*.jsonlines"]
mainpath = "/path/to/dir"
P = []
for i in exts:
    p = Path(mainpath).joinpath().glob(i)
    P = chain(P, p)
print(list(P))
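The same chaining idea can be written without the explicit loop by using itertools.chain.from_iterable. This is only a minimal sketch: glob_many is a hypothetical helper name chosen here, not something pathlib provides, and a file that matches more than one pattern would be yielded once per matching pattern.

```python
from itertools import chain
from pathlib import Path


def glob_many(folder, patterns):
    """Lazily yield paths in `folder` that match any of `patterns`.

    `glob_many` is a hypothetical helper name, not part of pathlib.
    The directory is globbed once per pattern, so a file matching
    several patterns is yielded several times.
    """
    folder = Path(folder)
    return chain.from_iterable(folder.glob(pattern) for pattern in patterns)
```

Usage would look like files = list(glob_many("/path/to/dir", ["*.jl", "*.jsonlines"])).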
Solution 5:[5]
Depending on your application, the solutions that glob once per pattern can be inefficient, because they scan the directory multiple times (once for each extension or pattern).
In your example you are only matching the extension in one folder, so a simple solution could be:
from pathlib import Path
folder = Path("/path/to/dir")
extensions = {".jl", ".jsonlines"}
files = [file for file in folder.iterdir() if file.suffix in extensions]
This can be turned into a function if you use it a lot.
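One possible shape for such a function is sketched below. The name files_with_suffixes is hypothetical, and the is_file() check is an extra assumption that the snippet above does not make: it skips directories that happen to end in a wanted suffix.

```python
from pathlib import Path


def files_with_suffixes(folder, suffixes):
    """Return files directly inside `folder` whose suffix is in `suffixes`.

    `files_with_suffixes` is a hypothetical name chosen for this sketch.
    Suffixes must include the leading dot, e.g. ".jl". Only the last
    suffix is compared, so "data.tar.gz" has the suffix ".gz".
    """
    wanted = set(suffixes)  # set membership keeps each lookup cheap
    return [
        path
        for path in Path(folder).iterdir()  # a single pass over the directory
        if path.is_file() and path.suffix in wanted
    ]
```

Usage would look like files = files_with_suffixes("/path/to/dir", {".jl", ".jsonlines"}).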
However, if you want to be able to match glob patterns rather than extensions, you should use the match() method:
from pathlib import Path
folder = Path("/path/to/dir")
patterns = ("*.jl", "*.jsonlines")
files = [f for f in folder.iterdir() if any(f.match(p) for p in patterns)]
This last one is both convenient and efficient. You can improve efficiency further by placing the most common patterns at the beginning of the patterns tuple, since any() short-circuits on the first match.
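A side effect of the single pass with any() is that each file appears at most once, even when several patterns match it. The sketch below wraps the approach in a function to show this; files_matching is a hypothetical name, and the extra "a*" pattern is only there for illustration.

```python
from pathlib import Path


def files_matching(folder, patterns):
    """Return entries directly inside `folder` that match any glob pattern.

    `files_matching` is a hypothetical name chosen for this sketch.
    Each entry is tested once, and any() stops at the first pattern
    that matches, so overlapping patterns do not produce duplicates.
    """
    patterns = tuple(patterns)  # allow the patterns to be re-iterated per entry
    return [
        path
        for path in Path(folder).iterdir()
        if any(path.match(pattern) for pattern in patterns)
    ]
```

For example, with the patterns ("*.jl", "a*"), a file named a.jl matches both but is listed only once.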
Solution 6:[6]
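from pathlib import Path
# Recursively search the current directory and keep files with these suffixes
keep = [".jl", ".jsonlines"]
files = [p for p in Path().rglob("*") if p.suffix in keep]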
Sources
This article follows the attribution requirements of Stack Overflow and is licensed under CC BY-SA 3.0.
Source: Stack Overflow
| Solution | Source |
|---|---|
| Solution 1 | Aditi |
| Solution 2 | |
| Solution 3 | Ciprian Tomoiagă |
| Solution 4 | Gmosy Gnaq |
| Solution 5 | |
| Solution 6 | 0-_-0 |
