Apply cut function to all the columns of a dataframe

I have a data frame that is composed of 10 continuous variables:

dat <- data.frame(replicate(10, sample(0:10, 15, replace = TRUE)))

Let's say I want to bin one of the columns by width, so the lowest 1/3 of values would be low, the middle 1/3 of values would be medium, etc.

break_point <- sort(dat$X1)[round(1 * length(dat$X1)/3)]
break_point1 <- sort(dat$X1)[round(2 * length(dat$X1)/3)]

dat$X1 <- cut(dat$X1, breaks = c(-Inf, break_point, break_point1, Inf), labels = c("low", "medium", "high"))

How can I compute this bin for all the columns at the same time?

dat[1:length(dat)] <- lapply(dat[1:length(dat)], cut(breaks = c(-Inf, break_point, break_point1, Inf), labels = c("low", "medium", "high")))

This is what I have, but it doesn't work. It fails with:

Error in cut.default(breaks = c(-Inf, break_point, break_point1, Inf), : argument "x" is missing, with no default



Solution 1:[1]

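lapply() expects a function, but cut(breaks = ...) is a call that is evaluated immediately without x. Wrap it in an anonymous (lambda) function so each column is passed as x: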

dat[] <- lapply(dat, function(x) cut(x,
    breaks = c(-Inf, break_point, break_point1, Inf),
    labels = c("low", "medium", "high")))

Or pass the function itself (cut rather than a call cut(...)) and supply the remaining arguments by name; lapply() forwards them to cut:

dat[] <- lapply(dat, cut,
    breaks = c(-Inf, break_point, break_point1, Inf),
    labels = c("low", "medium", "high"))
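Note that break_point and break_point1 in the question are computed from X1 only, so both calls above apply X1's cut points to every column. If each column should instead be split at its own thirds, a minimal sketch might look like the following. The helper name bin_tertiles and the small data frame are hypothetical, and the sketch assumes every column is still numeric (that is, X1 has not already been converted to a factor).

```r
# Illustrative data, not from the question
dat <- data.frame(
  X1 = c(0, 3, 5, 7, 10, 2, 8, 1, 9),
  X2 = c(10, 20, 30, 40, 50, 60, 70, 80, 90)
)

# Hypothetical helper: bin one numeric vector at its own 1/3 and 2/3
# order statistics, using the same rule as the question.
# Caveat: cut() raises an error if the two cut points coincide,
# which can happen when a column has many tied values.
bin_tertiles <- function(x) {
  s <- sort(x)
  n <- length(x)
  lower <- s[round(1 * n / 3)]
  upper <- s[round(2 * n / 3)]
  cut(x, breaks = c(-Inf, lower, upper, Inf),
      labels = c("low", "medium", "high"))
}

# dat[] keeps the data frame structure while replacing every column
dat[] <- lapply(dat, bin_tertiles)
str(dat)
```

Because the cut points are recomputed inside the helper, "low" in one column and "low" in another no longer refer to the same numeric range.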

Solution 2:[2]

Try santoku::chop_equally():

library(santoku)
dat[] <- apply(dat, 2, santoku::chop_equally, groups = 3,
    labels = c("low", "medium", "high"))

      X1       X2       X3       X4       ......
 [1,] "low"    "high"   "low"    "medium" ......
 [2,] "high"   "high"   "low"    "low"    ......
 [3,] "high"   "high"   "high"   "low"    ......
 ......

Note that this creates separate breakpoints for each column, based on the quantiles of the column. If you always want the same breakpoints, just do

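breaks <- quantile(as.matrix(dat), 0:3/3)
# include.lowest = TRUE keeps the overall minimum in the first bin instead of turning it into NA
dat[] <- apply(dat, 2, cut, breaks = breaks, include.lowest = TRUE)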

Also, you said you wanted to chop by width (intervals of equal width), but your example chops by quantiles (roughly equal numbers of values in each interval). If you want equal-width intervals, use santoku::chop_evenly().
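For comparison, base R's cut() also produces equal-width bins when breaks is a single number rather than a vector of cut points. This is a swapped-in base R alternative to santoku, shown as a minimal sketch on made-up data to make the width-versus-quantile difference visible:

```r
# Illustrative data, not from the question
dat <- data.frame(X1 = c(0, 1, 2, 3, 10), X2 = c(5, 6, 7, 8, 9))

# breaks = 3 splits each column's own range into 3 intervals of equal width
dat[] <- lapply(dat, cut, breaks = 3, labels = c("low", "medium", "high"))

# X1 ranges from 0 to 10, so the bins are roughly 0-3.3, 3.3-6.7 and 6.7-10;
# four values land in "low", none in "medium", one in "high"
table(dat$X1)
```

With equal-width bins the group sizes can be very uneven, whereas quantile-based bins keep the counts roughly balanced but let the interval widths vary.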

Sources

This article follows the attribution requirements of Stack Overflow and is licensed under CC BY-SA 3.0.

Source: Stack Overflow

Solution Source
Solution 1 akrun
Solution 2