Regex not operator

Is there a NOT operator in regexes? For example, in this string: "(2001) (asdf) (dasd1123_asd 21.01.2011 zqge)(dzqge) name (20019)"

I want to delete every group matching \([0-9a-zA-Z _\.\-:]*\), except the one that is a year: (2001).

So the result should be: (2001) name.

NOTE: something like \((?![\d]){4}[0-9a-zA-Z _\.\-:]*\) does not work for me (somehow the (20019) group also matches...)



Solution 1:[1]

Not as such, but you can usually work around it with one of these forms:

  • a negated character class, [^abc], which matches a single character that is not a, b or c,
  • a negative lookahead, a(?!b), which matches a not followed by b,
  • a negative lookbehind, (?<!a)b, which matches b not preceded by a.
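Applied to the question, a negative lookahead can express "a parenthesized group that is NOT exactly four digits". The sketch below is not part of the original answer; it assumes Java, and the class name RemoveNonYearGroups, the method keepYearGroups and the whitespace cleanup are hypothetical. Putting {4} inside the lookahead, rather than after it as in the question, is what makes it test for four digits followed by the closing parenthesis.

```java
public class RemoveNonYearGroups {
    // Matches "(" + allowed characters + ")" unless the group is exactly four digits.
    // The negative lookahead (?!\d{4}\)) is the "NOT" part: it fails when "(" is
    // immediately followed by four digits and ")".
    static final String NON_YEAR_GROUP = "\\((?!\\d{4}\\))[0-9a-zA-Z _.:-]*\\)";

    static String keepYearGroups(String input) {
        return input.replaceAll(NON_YEAR_GROUP, "")
                    .replaceAll("\\s+", " ") // collapse the spaces left between removed groups
                    .trim();
    }

    public static void main(String[] args) {
        String subject = "(2001) (asdf) (dasd1123_asd 21.01.2011 zqge)(dzqge) name (20019)";
        System.out.println(keepYearGroups(subject)); // prints "(2001) name"
    }
}
```

Because the lookahead requires the ")" right after the four digits, (20019) is still treated as a non-year group and removed.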

Solution 2:[2]

You could capture the year inside (2001) and replace the whole string with that capture.

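public class YearExtractor {
    // Replaces the whole input with the four digits captured inside "(dddd)".
    public static String extractYearString(String input) {
        return input.replaceAll(".*\\(([0-9]{4})\\).*", "$1");
    }

    public static void main(String[] args) {
        String subject = "(2001) (asdf) (dasd1123_asd 21.01.2011 zqge)(dzqge) name (20019)";
        String result = extractYearString(subject);
        System.out.println(result); // <-- "2001"
    }
}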

The pattern .*\(([0-9]{4})\).* means:

  • .* match anything
  • \( match a ( character
  • ( begin capture
  • [0-9]{4} exactly four digits
  • ) end capture
  • \) match a ) character
  • .* anything (rest of string)
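Because the leading .* is greedy, replaceAll with this pattern keeps the last (dddd) group when the input contains several. If the first one is wanted, Pattern and Matcher.find() can pick it out directly. This variant is an assumption, not part of the original answer; the class FirstYearFinder and the method firstYear are hypothetical names.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

public class FirstYearFinder {
    // Same "(dddd)" pattern, without the surrounding .* wrappers.
    private static final Pattern YEAR_GROUP = Pattern.compile("\\(([0-9]{4})\\)");

    // Returns the digits of the first "(dddd)" group, or null if there is none.
    static String firstYear(String input) {
        Matcher m = YEAR_GROUP.matcher(input);
        return m.find() ? m.group(1) : null;
    }

    public static void main(String[] args) {
        System.out.println(firstYear("(2001) (asdf) (dasd1123_asd 21.01.2011 zqge)(dzqge) name (20019)")); // 2001
        System.out.println(firstYear("(1999) x (2005)")); // 1999 (the greedy replaceAll version gives 2005)
    }
}
```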

Solution 3:[3]

Here is an alternative:

(\(\d{4}\))((?:\s*\([0-9a-zA-Z _\.\-:]*\))*)([^()]*)(( ?\([0-9a-zA-Z _\.\-:]*\))*)

Each repeated pattern is wrapped in a single capturing group using the construction ((?:pattern)*), where the inner group is non-capturing. This keeps the numbering of the groups of interest under control.

Then replacing the match with \1\3 gives the desired result.
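A minimal sketch of this answer in Java, assuming Java is the target (the class KeepYearAndName and the method keepYearAndText are hypothetical): the backslashes must be doubled in a Java string literal, and Java writes the back-references in the replacement as $1$3 rather than \1\3. Group 3 also captures the space before the trailing (20019), which is why the result is trimmed.

```java
public class KeepYearAndName {
    // Group 1: the "(dddd)" year; group 2: the parenthesized groups right after it;
    // group 3: the free text; group 4: any trailing parenthesized groups.
    static final String PATTERN =
        "(\\(\\d{4}\\))((?:\\s*\\([0-9a-zA-Z _.:-]*\\))*)([^()]*)(( ?\\([0-9a-zA-Z _.:-]*\\))*)";

    static String keepYearAndText(String input) {
        // Keep only groups 1 and 3, then drop the space group 3 picked up at its end.
        return input.replaceAll(PATTERN, "$1$3").trim();
    }

    public static void main(String[] args) {
        String subject = "(2001) (asdf) (dasd1123_asd 21.01.2011 zqge)(dzqge) name (20019)";
        System.out.println(keepYearAndText(subject)); // prints "(2001) name"
    }
}
```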

Sources

This article follows the attribution requirements of Stack Overflow and is licensed under CC BY-SA 3.0.

Source: Stack Overflow

[1] Solution 1
[2] Solution 2: birgersp
[3] Solution 3: lalebarde