Python: ICD-10 RegEx
Goal: create a regex that matches ICD-10 codes.
Format
- Compulsory start: Letter, Digit, (either Letter or Digit)
- Optional end: a . followed by up to 4 Letters or Digits
I have most of the first half:
r'[A-Z][0-9][0-9]'
The second half I'm stuck on:
([a-z]|[0-9]){1,4}$
If the optional end is present, it must start with a dot (.).
Examples: .0 or .A9 or .A9A9 or .ZZZZ or .9999 etc.
Note: I know some ICD-10 codes don't go beyond a certain number/letter, but I am fine with that.
Solution 1:[1]
You can use
^[A-Z][0-9][A-Z0-9](?:\.[A-Z0-9]{1,4})?$
Details:
- ^ - start of string anchor
- [A-Z] - an uppercase ASCII letter
- [0-9] - an ASCII-only digit
- [A-Z0-9] - an uppercase ASCII letter or an ASCII digit
- (?:\.[A-Z0-9]{1,4})? - an optional sequence of
  - \. - a dot
  - [A-Z0-9]{1,4} - one to four occurrences of an uppercase ASCII letter or an ASCII digit
- $ - end of string anchor (or \Z can be used here, too).
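As a quick sanity check, the anchored pattern can be run against a handful of strings. This is only a sketch: the helper name `is_icd10_shaped` and the sample strings are made up for illustration, and they test the shape described in the question, not membership in the real ICD-10 code list.

```python
import re

# Anchored pattern from the answer above.
ICD10_RX = re.compile(r'^[A-Z][0-9][A-Z0-9](?:\.[A-Z0-9]{1,4})?$')

def is_icd10_shaped(code: str) -> bool:
    """Hypothetical helper: True if `code` has the ICD-10 shape."""
    return ICD10_RX.match(code) is not None

# Illustrative samples only; not checked against the official code list.
valid = ['A00', 'A0Z', 'A00.0', 'A00.A9', 'A00.A9A9', 'A00.ZZZZ', 'A00.9999']
invalid = ['A0', 'a00', 'AA0', 'A00.', 'A00.12345', 'A000', 'A00.a9']

for code in valid + invalid:
    print(f'{code!r}: {is_icd10_shaped(code)}')
```

The strings in `valid` should all be accepted and those in `invalid` rejected: a trailing dot with nothing after it, more than four characters after the dot, and lowercase letters all fail.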
In Python code, you can use the following to validate string input:
import re

icd10_rx = re.compile(r'[A-Z][0-9][A-Z0-9](?:\.[A-Z0-9]{1,4})?')
text = 'A00.1'  # example input, replace with the string to validate
if icd10_rx.fullmatch(text):
    print(f'{text} is valid!')
Note the anchors are left out because Pattern.fullmatch (same as re.fullmatch) requires a full string match.
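One practical difference between the anchoring styles shows up with a trailing newline: in Python, `$` also matches just before a newline at the very end of the string, while `\Z` and `fullmatch` do not. A minimal sketch, with the variable names (`BODY`, `text`, and so on) chosen here for illustration:

```python
import re

# Unanchored pattern body from the answer above.
BODY = r'[A-Z][0-9][A-Z0-9](?:\.[A-Z0-9]{1,4})?'

text = 'A00.1\n'  # a code followed by a stray newline, e.g. read from a file

# `$` matches at the end of the string OR just before a final newline.
with_dollar = re.match('^' + BODY + '$', text) is not None

# `\Z` matches only at the very end of the string.
with_slash_z = re.match('^' + BODY + r'\Z', text) is not None

# fullmatch requires the whole string, newline included, to match.
with_fullmatch = re.fullmatch(BODY, text) is not None

print(with_dollar, with_slash_z, with_fullmatch)
```

If input may carry trailing whitespace, stripping it first (for example with `text.strip()`) and then using `fullmatch` keeps the validation strict.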
Sources
This article follows the attribution requirements of Stack Overflow and is licensed under CC BY-SA 3.0.
Source: Stack Overflow
| Solution | Source |
|---|---|
| Solution 1 | Wiktor Stribiżew |
