Regex to match only the first occurrence of four numbers in a line

I would like to sort thousands of bibliographic entries via regex. Every entry is built like this:

Lastname, Firstname. 1900. Title etc.

Now I need a RegEx to match 1900. This works:

[0-9]{4}

Unfortunately, some titles include more than one four-digit group, for example:

Lastname, Firstname. 1900. Title: 1920-1930. etc.

But I want to match only the first four-digit group (i.e. 1900, but not 1920 or 1930).

Any help would be appreciated!



Solution 1:[1]

Use this:

(^|\.)\s*([0-9]{4})\s*(\.|$)

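As a minimal sketch of how this pattern could be applied, assuming Python's re module (the question does not name a language), the year sits in capture group 2, between the two period delimiters. The helper name year_field is made up for illustration.

```python
import re

# Pattern from Solution 1; group 2 holds the four-digit field.
YEAR_BETWEEN_PERIODS = re.compile(r"(^|\.)\s*([0-9]{4})\s*(\.|$)")

def year_field(entry):
    """Return the first four-digit field delimited by periods, or None."""
    # search() stops at the first match, so later groups are ignored.
    match = YEAR_BETWEEN_PERIODS.search(entry)
    return match.group(2) if match else None

print(year_field("Lastname, Firstname. 1900. Title etc."))              # 1900
print(year_field("Lastname, Firstname. 1900. Title: 1920-1930. etc."))  # 1900
```

The range 1920-1930 is skipped because 1920 follows a colon rather than a period, and 1930 follows a hyphen.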

Solution 2:[2]

Simply use this:

\b\d{4}\b

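It matches any standalone group of exactly four digits. To get only the first occurrence, take just the first match, i.e. do not run a global (find-all) search.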
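The difference between a first-match search and a find-all search can be sketched as follows, assuming Python's re module (the answer itself does not name a language). The variable names are chosen for illustration only.

```python
import re

# Pattern from Solution 2: exactly four digits bounded by word boundaries.
STANDALONE_FOUR_DIGITS = re.compile(r"\b\d{4}\b")

entry = "Lastname, Firstname. 1900. Title: 1920-1930. etc."

# findall() returns every four-digit group in the entry.
all_groups = STANDALONE_FOUR_DIGITS.findall(entry)

# search() stops at the first match, which is what the question asks for.
first = STANDALONE_FOUR_DIGITS.search(entry)
first_group = first.group(0) if first else None

print(all_groups)   # ['1900', '1920', '1930']
print(first_group)  # 1900
```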

Solution 3:[3]

With this regex, you get only the first four-digit number in the text, regardless of parentheses.

^[^\d]*(\d{4})

Explanation:

The regex has two parts. The first part:

  • ^ matches at the start of the string.
  • [^\d]* matches any run of non-digit characters.

Combined, these two match every non-digit character up to the first digit.

The second part:

  • () creates a capturing group, and
  • \d{4} matches exactly four digits.

Together, the two parts ensure that only the first four digits are captured in the group.
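The two parts described above can be tried out with a short sketch, assuming Python's re module and a few made-up entries in the format from the question. The helper first_year and the sample list are hypothetical, and sorting by the captured year is only one possible use.

```python
import re

# Pattern from Solution 3; group 1 holds the first four digits.
FIRST_FOUR_DIGITS = re.compile(r"^[^\d]*(\d{4})")

def first_year(entry):
    """Return the first four digits in entry as a string, or None."""
    match = FIRST_FOUR_DIGITS.match(entry)
    return match.group(1) if match else None

# Made-up sample entries in the format from the question.
entries = [
    "Lastname, Firstname. 1950. Title: 1920-1930. etc.",
    "Lastname, Firstname. (1900). Title etc.",
    "Lastname, Firstname. 1925. Title etc.",
]

# Sort by the captured year; entries without a match sort first.
sorted_entries = sorted(entries, key=lambda e: first_year(e) or "")

for entry in sorted_entries:
    print(first_year(entry), entry)
```

Note that the pattern requires the first run of digits to contain at least four digits. An entry whose first digit belongs to a shorter number (for example "8th") yields no match at all.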

Sources

This article follows the attribution requirements of Stack Overflow and is licensed under CC BY-SA 3.0.

Source: Stack Overflow

Solution Source
Solution 1
Solution 2
Solution 3 stanimirsp