Python regex to match integers but not floats
I need a Python regular expression to match integers but not floats from a string input.
The following regex uses a negative lookahead and a negative lookbehind to make sure that a number is neither preceded nor followed by a '.'.
(?<!\.)[0-9]+(?!\.)
It works only when each float has a single digit on either side of the decimal point, e.g.:
import re
int_regex = re.compile(r"(?<!\.)[0-9]+(?!\.)")
str_int_list = int_regex.findall(text)
The result is correct when each float has no more than one digit on either side of the decimal point:
"1 + 2 + 3.0 + .4 + 5. + 66 + 777" --> ['1', '2', '66', '777']
Incorrectly matches the '1' of '12.3' and the '5' of '.45':
"12.3 + .45 + 678" --> ['1', '5', '678']
The problem appears to be that the [0-9]+ in the middle of the regex is not greedy enough.
I tried adding digit matches to the lookahead and lookbehind, but ran into Python's "look-behind requires fixed-width pattern" error.
Any suggestions as to how to match only whole integers and no floats at all would be really appreciated.
Solution 1:[1]
Simply add \d to the lookahead and lookbehind patterns:
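import re
# Original pattern: only rejects a '.' directly before or after the match.
int_regex = re.compile(r"(?<!\.)[0-9]+(?!\.)")
# Also reject adjacent digits, so a match cannot be a fragment of a longer number.
re2 = re.compile(r"(?<![\.\d])[0-9]+(?![\.\d])")
text = "1 + 2 + 3.0 + .4 + 5. - .45 + 66 + 777 - 12.3"
print("int_regex:", int_regex.findall(text))
print("re2 :", re2.findall(text))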
int_regex: ['1', '2', '5', '66', '777', '1']
re2 : ['1', '2', '66', '777']
The lookahead/behind patterns define a number boundary (much like \b defines a word boundary) and the only thing you are allowing in the number is digits.
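As a minimal sketch of the number-boundary idea, the pattern can be wrapped in a small helper and run against both strings from the question. The names `INT_PATTERN` and `find_integers` are hypothetical, chosen only for this illustration. Because each lookaround still checks exactly one character, the lookbehind stays fixed-width. And because a digit is now forbidden on either side, the engine can no longer backtrack to a shorter run such as the '1' of '12.3'.

```python
import re

# Hypothetical names for illustration; the pattern is the one from the answer.
# (?<![\.\d])  the character before the match is neither '.' nor a digit
# [0-9]+       one or more digits
# (?![\.\d])   the character after the match is neither '.' nor a digit
INT_PATTERN = re.compile(r"(?<![\.\d])[0-9]+(?![\.\d])")

def find_integers(text):
    """Return digit runs that are not adjacent to a '.' or another digit."""
    return INT_PATTERN.findall(text)

print(find_integers("1 + 2 + 3.0 + .4 + 5. + 66 + 777"))  # ['1', '2', '66', '777']
print(find_integers("12.3 + .45 + 678"))                   # ['678']
```

Note that this treats any digit run touching a '.' as part of a float, so an integer followed by a sentence-ending period (as in "I have 5.") is also skipped, and signs are not captured.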
Sources
This article follows the attribution requirements of Stack Overflow and is licensed under CC BY-SA 3.0.
Source: Stack Overflow
| Solution | Source |
|---|---|
| Solution 1 | ErikR |
