Recoding one variable into another by its leading digits?

Let's say I have this df:

x <- c(1, 1, 2, 2, 2, 2, 3, 3, 3)

y <- c(1110, 1399, 2334, 2566, 2777, 2888, 3000, 3213, 3566)

df <- data.frame(x, y)

I need to create a column z that recodes y based on the first digit, the first two digits, or even the first three digits it starts with.

For example, let's say I want any y value starting with 1 to become 1. That's easy, but values from 20xx to 23xx need to be recoded as 2, 24xx to 29xx as 3, 30xx to 34xx as 4, and 35xx to 39xx as 5.

So my result column z should look like this:

>df$z
[1] 1 1 2 3 3 3 4 4 5

I'm not using case_when or ifelse because chaining "greater than", "less than" or "equal to" conditions wouldn't be practical in this case: my real df has 322,00 cases and 462 unique 4-digit codes in var y.

Or maybe I just don't know how to use those commands in this particular case.

Thanks for helping.



Solution 1:[1]

We could use findInterval on the first two digits of y:

# y %/% 100 keeps the first two digits; left.open = TRUE makes each interval (lower, upper]
findInterval(df$y %/% 100, c(20, 23, 29, 34),
             left.open = TRUE) + 1

-output

[1] 1 1 2 3 3 3 4 4 5
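
To see why this works, the call can be split into its two steps. This is only a sketch of the same idea using the sample values from the question; the names prefix, idx and z are made up for illustration.

```r
# Sample y values from the question
y <- c(1110, 1399, 2334, 2566, 2777, 2888, 3000, 3213, 3566)

# Step 1: integer division by 100 keeps the first two digits of a 4-digit code
prefix <- y %/% 100
prefix
# 11 13 23 25 27 28 30 32 35

# Step 2: with left.open = TRUE the intervals are (20, 23], (23, 29], (29, 34], ...
# findInterval() returns 0 for anything at or below the first break,
# so adding 1 turns the interval index into the codes 1 to 5
idx <- findInterval(prefix, c(20, 23, 29, 34), left.open = TRUE)
z <- idx + 1
z
# 1 1 2 3 3 3 4 4 5

# Boundary check: a prefix of exactly 20 lands in the first interval
findInterval(20, c(20, 23, 29, 34), left.open = TRUE) + 1
# 1
findInterval(20, c(19, 23, 29, 34), left.open = TRUE) + 1
# 2
```

Note the boundary case: with a first break of 20, a code such as 20xx comes out as 1. The sample data has no 20xx values, so the output is unaffected, but if 20xx should map to 2 as described in the question, a first break of 19 is probably what is needed.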

data

df <- structure(list(x = c(1, 1, 2, 2, 2, 2, 3, 3, 3), y = c(1110, 
1399, 2334, 2566, 2777, 2888, 3000, 3213, 3566)),
 class = "data.frame", row.names = c(NA, 
-9L))
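
For a larger recoding scheme, the break points and the codes they map to could be kept in two vectors and the result stored as a new column. The following is a minimal sketch, not part of the original answer; upper_prefix and new_code are hypothetical names, and the first break is assumed to be 19 so that 20xx falls into group 2.

```r
df <- data.frame(
  x = c(1, 1, 2, 2, 2, 2, 3, 3, 3),
  y = c(1110, 1399, 2334, 2566, 2777, 2888, 3000, 3213, 3566)
)

# Hypothetical lookup vectors: the highest two-digit prefix of each group
# (except the last one) and the code assigned to each group
upper_prefix <- c(19, 23, 29, 34)
new_code     <- c(1, 2, 3, 4, 5)

# Interval index (1-based after adding 1) for each row, then look up its code
idx  <- findInterval(df$y %/% 100, upper_prefix, left.open = TRUE) + 1
df$z <- new_code[idx]
df$z
# 1 1 2 3 3 3 4 4 5
```

Because findInterval is vectorised, the same call should scale to the full data; only the two lookup vectors need to grow with the number of groups. new_code must always have one more element than upper_prefix.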

Sources

This article follows the attribution requirements of Stack Overflow and is licensed under CC BY-SA 3.0.

Source: Stack Overflow

Solution Source
Solution 1 akrun