Pattern to exclude specific files

I'm trying to write a regex that lists all .jpg files in a directory except for a few specific files (fixed names, not a pattern).

So I wrote this (Python):

"^(?!358097_sat!823133_sat!140860_sat).*jpg$"
"^(?!358097_sat|823133_sat|140860_sat).*jpg$"

I want to list all JPEG files except for:

  • 358097_sat
  • 823133_sat
  • 140860_sat

But it fails with an error saying that no files matched the pattern.

Here is the complete string and error message:

No files matched pattern: ../input/dataset/valid/^(?!358097_sat!823133_sat!140860_sat).*jpg$

I'm actually passing this regex to a tf-function:

tf.data.Dataset.list_files(dataset_path + val_data + "^(?!358097_sat|823133_sat|140860_sat).*jpg$", seed=SEED)
# dataset_path = "../input/dataset/"
# val_data = "valid/"

Complete error:

InvalidArgumentError: Expected 'tf.Tensor(False, shape=(), dtype=bool)' to be true. Summarized data: b'No files matched pattern: ../input/dataset/valid/^(?!358097_sat|823133_sat|140860_sat).*jpg$'

Here is the function reference: https://www.tensorflow.org/api_docs/python/tf/data/Dataset#list_files



Solution 1:[1]

The static method list_files expects a string or a list of strings containing glob patterns, not regular expressions, so your regex is read as a glob and matches no file. See also filename matching.

Filename matching using globs does not have a way to negate a match. So you will have to write a custom function to do that.
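
To see why the pattern from the question finds nothing, here is a small sketch that reads the same string once as a glob and once as a regular expression. It uses Python's fnmatch module only as a stand-in for glob semantics (TensorFlow has its own matcher, so this is an analogy rather than its exact behaviour), and the file names are made up for illustration.

```python
import fnmatch
import re

# The pattern from the question, unchanged.
pattern = "^(?!358097_sat|823133_sat|140860_sat).*jpg$"

# Hypothetical file names, for illustration only.
names = ["358097_sat.jpg", "100034_sat.jpg"]

# Read as a glob: only *, ? and [...] are special, so a name would have to
# start with the literal characters "^(" to match.
glob_hits = fnmatch.filter(names, pattern)

# Read as a regular expression: the negative lookahead rejects the excluded name.
regex = re.compile(pattern)
regex_hits = [n for n in names if regex.match(n)]

print(glob_hits)
print(regex_hits)
```

The regex itself does what you intended; it is simply being handed to something that does not interpret regex syntax.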

You could use e.g. glob.glob() to generate a list of JPEG files, and then filter out the ones that match your strings.

import glob

ignore = ("358097_sat", "823133_sat", "140860_sat")

# Keep each .jpg whose path contains none of the ignored names.
files = [f for f in glob.glob("*.jpg") if not any(j in f for j in ignore)]
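
A slightly stricter variant of the same idea, as a sketch: it globs inside a given directory and compares the base name without its extension exactly, so a file like 1358097_sat.jpg would not be dropped by accident. The helper name list_jpgs_except is hypothetical, not part of any library.

```python
import glob
import os


def list_jpgs_except(directory, ignore):
    """Return the sorted .jpg paths in `directory` whose base name
    (without extension) is not listed in `ignore`."""
    skip = set(ignore)
    paths = glob.glob(os.path.join(directory, "*.jpg"))
    return sorted(
        p for p in paths
        if os.path.splitext(os.path.basename(p))[0] not in skip
    )
```

With the paths from the question this could be called as list_jpgs_except("../input/dataset/valid/", ignore). Since list_files also accepts a list of strings, the resulting list could probably be passed to it in place of the single pattern, assuming none of the file names contain wildcard characters.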

Sources

This article follows the attribution requirements of Stack Overflow and is licensed under CC BY-SA 3.0.

Source: Stack Overflow

Solution Source
Solution 1