Python regex to match integers but not floats

I need a Python regular expression to match integers but not floats from a string input.

The following regex uses a negative lookahead and a negative lookbehind to make sure that a number is neither preceded nor followed by a '.'.

(?<!\.)[0-9]+(?!\.)

It works only when each float has a single digit on either side of the decimal point, e.g.:

import re
int_regex = re.compile(r"(?<!\.)[0-9]+(?!\.)")
str_int_list = int_regex.findall(text)

Correct when there is no more than one digit on each side of the decimal point:
"1 + 2 + 3.0 + .4 + 5. + 66 + 777" --> ['1', '2', '66', '777']

Incorrectly matches the '1' of '12.3' and the '5' of '.45':
"12.3 + .45 + 678" --> ['1', '5', '678']

The problem appears to be that the [0-9]+ in the middle of the regex is not greedy enough.

I tried adding digit matches to the lookahead and lookbehind, but ran into Python's 'lookbehinds need to be a constant-length' error.

Any suggestions as to how to match only whole integers and no floats at all would be really appreciated.



Solution 1:[1]

Simply add \d to the lookahead and lookbehind patterns:

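import re

# Original pattern: only rejects a '.' immediately before or after the digits.
int_regex = re.compile(r"(?<!\.)[0-9]+(?!\.)")
# Fixed pattern: also rejects an adjacent digit, so partial matches are excluded.
re2 = re.compile(r"(?<![\.\d])[0-9]+(?![\.\d])")

text = "1 + 2 + 3.0 + .4 + 5. - .45 + 66 + 777 - 12.3"
print("int_regex:", int_regex.findall(text))
print("re2      :", re2.findall(text))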

int_regex: ['1', '2', '5', '66', '777', '1']
re2      : ['1', '2', '66', '777']

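The lookahead and lookbehind patterns define a number boundary (much like \b defines a word boundary), and the only thing allowed inside the number is digits. The original pattern fails not because [0-9]+ is insufficiently greedy but because it backtracks: on '12.3' it first takes '12', the lookahead rejects the following '.', and the engine gives back the '2' and matches '1' instead. Rejecting an adjacent digit as well as a '.' rules out such partial matches.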
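As a quick check of the number-boundary idea, here is a minimal sketch that wraps the pattern in a helper and runs it on the failing input from the question. The names INT_ONLY and find_integers are made up for illustration and are not part of the original answer.

```python
import re

# Hypothetical helper (not from the original answer): the same
# "number boundary" pattern, compiled once for reuse.
# Each lookaround is a single character class, so the lookbehind
# stays fixed-width, which Python's re module requires.
INT_ONLY = re.compile(r"(?<![.\d])[0-9]+(?![.\d])")


def find_integers(text):
    """Return digit runs that touch neither a '.' nor another digit."""
    return INT_ONLY.findall(text)


# The input that the original pattern got wrong.
print(find_integers("12.3 + .45 + 678"))                  # ['678']
print(find_integers("1 + 2 + 3.0 + .4 + 5. + 66 + 777"))  # ['1', '2', '66', '777']
```

One consequence of this design: an integer directly followed by a period, such as the 5 in "I have 5.", is skipped, consistent with the question treating '5.' as a float.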

Sources

This article follows the attribution requirements of Stack Overflow and is licensed under CC BY-SA 3.0.

Source: Stack Overflow

Solution Source
Solution 1 ErikR