String search for range of ICD codes
I want to search in Stata for C00-D49 and flag them as Neoplasms.
I could do
gen neo =1 if strmatch(diagnosis, "C*")
But I am unsure how to limit the string search so that it only goes up to D49.
Also, I need to flag O00-O9A as Pregnancy.
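I can do the following as well (using replace, because neo already exists after the first gen):
replace neo = 1 if strmatch(diagnosis, "D0*")
replace neo = 1 if strmatch(diagnosis, "D1*")
replace neo = 1 if strmatch(diagnosis, "D2*")
replace neo = 1 if strmatch(diagnosis, "D3*")
replace neo = 1 if strmatch(diagnosis, "D4*")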
But is there a way to perform a string match for a given range?
Solution 1:[1]
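As I understand it, ICD codes are organized in alphabetical order, so you do not need to search the strings at all. Stata compares strings character by character, so you can test directly whether each code falls within the range. Using a strict upper bound just past the range ("D50" for neoplasms, "P" for pregnancy) also catches subcodes such as D49.512 and the non-numeric O9A. Like this: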
* Example generated by -dataex-. For more info, type help dataex
clear
input str7 diagnosis
"ABB"
"A12"
"C34"
"D49.512"
"O02"
"Q34"
"C00.2"
end
gen neoplasm = (diagnosis >= "C00" & diagnosis < "D50")
gen pregnancy = (diagnosis >= "O00" & diagnosis < "P")
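If the codes may arrive in mixed case or with stray spaces, the same idea can be applied to a normalized three-character category instead of the raw string. The following is only a sketch under that assumption: the variable names code3, neoplasm2, pregnancy2 and neoplasm3 are made up for illustration, and the regexm() line is an alternative pattern-based approach, not part of the answer above. It assumes every value is either a valid ICD-10 code or clearly outside the ranges of interest.

```stata
* Illustrative sketch: variable names below are hypothetical
clear
input str7 diagnosis
"A12"
"C00.2"
"c34"
"D49.512"
"D50"
"O9A.1"
"P00"
end

* Normalize case and spacing, then keep the three-character category
gen code3 = substr(upper(trim(diagnosis)), 1, 3)

* inrange() is inclusive at both ends and accepts string arguments,
* so the bounds can be written exactly as the ranges are stated
gen neoplasm2  = inrange(code3, "C00", "D49")
gen pregnancy2 = inrange(code3, "O00", "O9A")

* Pattern-based alternative: C followed by anything, or D0 through D4
gen neoplasm3 = regexm(code3, "^(C|D[0-4])")

list
```

Truncating to three characters is what makes the inclusive upper bound safe: "D49.512" would sort after "D49" as a full string, but its category "D49" falls inside the range. Unlike the range comparison, the regexm() pattern matches any category beginning with C, so it relies on the data containing only well-formed codes.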
Sources
This article follows the attribution requirements of Stack Overflow and is licensed under CC BY-SA 3.0.
Source: Stack Overflow
| Solution | Source |
|---|---|
| Solution 1 | TheIceBear |
