String search for range of ICD codes

I want to search in Stata for ICD codes in the range C00-D49 and flag them as neoplasms.

I could do

gen neo =1 if strmatch(diagnosis, "C*")

But I am unsure how to limit the string search so that it stops at D49.

Also, I need to flag O00-O9A as Pregnancy.

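I can also add the following (using replace, since neo already exists, and including D0* for D00-D09):

replace neo = 1 if strmatch(diagnosis, "D0*")

replace neo = 1 if strmatch(diagnosis, "D1*")

replace neo = 1 if strmatch(diagnosis, "D2*")

replace neo = 1 if strmatch(diagnosis, "D3*")

replace neo = 1 if strmatch(diagnosis, "D4*")

But is there a way to perform a string match over a given range?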



Solution 1:[1]

As I understand it, ICD codes are organized so that they sort alphabetically, and Stata compares strings character by character, so "D49.512" sorts before "D50". You therefore do not need any pattern matching; just compare the strings directly, like this:

* Example generated by -dataex-. For more info, type help dataex
clear
input str7 diagnosis
"ABB"
"A12"
"C34"
"D49.512"
"O02"
"Q34"
"C00.2"
end

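* Upper bounds are exclusive so that subcodes such as "D49.512" are included
gen neoplasm  = (diagnosis >= "C00" & diagnosis < "D50")
* Every O code sorts before "P", which covers O00 through O9A
gen pregnancy = (diagnosis >= "O00" & diagnosis < "P")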
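
If you would rather write the bounds exactly as they appear in the range labels (C00-D49, O00-O9A), one possible variation is to compare only the three-character category. This is a minimal sketch, not part of the original answer: it assumes the codes are stored in upper case with no leading spaces, and cat3 and neo_re are hypothetical variable names introduced only for illustration. It relies on inrange() accepting string arguments, and it uses regexm() as a cross-check on the neoplasm flag.

```stata
* Sketch: flag ICD-10 ranges using the three-character category only
clear
input str7 diagnosis
"A12"
"C34"
"D49.512"
"D50.0"
"O02"
"O9A.1"
"Q34"
end

* cat3 (hypothetical helper) holds the category, e.g. "D49" for "D49.512"
gen cat3 = substr(diagnosis, 1, 3)

* inrange() is inclusive at both ends and works on strings
gen byte neoplasm  = inrange(cat3, "C00", "D49")
gen byte pregnancy = inrange(cat3, "O00", "O9A")

* Cross-check: C followed by any digit, or D followed by 0-4
gen byte neo_re = regexm(diagnosis, "^(C[0-9]|D[0-4])")
assert neoplasm == neo_re

list, clean
```

Both flags are 0/1 rather than 1/missing, which is usually easier to tabulate. If the data may contain lower-case codes, wrapping diagnosis in upper() before taking the substring would be one way to handle that.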

Sources

This article follows the attribution requirements of Stack Overflow and is licensed under CC BY-SA 3.0.

Source: Stack Overflow

Solution Source
Solution 1 TheIceBear