'What is the fastest way to add new column based on dataframe entries in specific columns

So I have this dataframe

#   Name Comp1 Con2 Vis3 Tra4 Pred5 Adap6
# 1   A1     x <NA> <NA> <NA>  <NA>  <NA>
# 2   A2  <NA>    x <NA> <NA>  <NA>  <NA>
# 3   B1  <NA> <NA>    x <NA>  <NA>  <NA>
# 4   B2  <NA> <NA> <NA> <NA>     x  <NA>
# 5   B3  <NA> <NA> <NA>    x  <NA>  <NA>
# 6   D2  <NA> <NA> <NA> <NA>  <NA>     x
# 7   F6  <NA> <NA> <NA> <NA>     x  <NA>

I want to add a column to databackend that displays a value from 1 to 6 based on which column the "x" is on databackend. So the additional column would look like this

#   Name Comp1 Con2 Vis3 Tra4 Pred5 Adap6 stage
# 1   A1     x <NA> <NA> <NA>  <NA>  <NA>     1
# 2   A2  <NA>    x <NA> <NA>  <NA>  <NA>     2
# 3   B1  <NA> <NA>    x <NA>  <NA>  <NA>     3
# 4   B2  <NA> <NA> <NA> <NA>     x  <NA>     5
# 5   B3  <NA> <NA> <NA>    x  <NA>  <NA>     4
# 6   D2  <NA> <NA> <NA> <NA>  <NA>     x     6
# 7   F6  <NA> <NA> <NA> <NA>     x  <NA>     5

since my dataframe is very large in the original script, I am looking for the fastest (automatic) way to do this. I´ve tried using the for loop but it takes too long.

data

databackend <- structure(list(Name = c("A1", "A2", "B1", "B2", "B3", "D2", "F6"
), Comp1 = c("x", NA, NA, NA, NA, NA, NA), Con2 = c(NA, "x", 
NA, NA, NA, NA, NA), Vis3 = c(NA, NA, "x", NA, NA, NA, NA), Tra4 = c(NA, 
NA, NA, NA, "x", NA, NA), Pred5 = c(NA, NA, NA, "x", NA, NA, 
"x"), Adap6 = c(NA, NA, NA, NA, NA, "x", NA), stage = c(1, 2, 
3, 5, 4, 6, 5)), row.names = c(NA, -7L), class = "data.frame")


Solution 1:[1]

Relatively simple

> tmp=which(databackend[,-1]=="x",arr.ind=T)
> tmp[order(tmp[,"row"]),"col"]
[1] 1 2 3 5 4 6 5

Solution 2:[2]

Using which and apply:

apply(databackend[-1], 1, \(x) which(x == "x"))
#[1] 1 2 3 5 4 6 5

A benchmark, max.col is the fastest:

microbenchmark::microbenchmark(
  apply = apply(databackend[-1], 1, \(x) which(x == "x")),
  which = {tmp=which(databackend[,-1]=="x",arr.ind=T)
  tmp[order(tmp[,"row"]),"col"]},
  max.col = max.col(!is.na(databackend[-1]))
)

Unit: microseconds
    expr   min     lq    mean median     uq    max neval
   apply 149.4 165.95 232.308 196.20 216.95 2882.4   100
   which 118.9 144.35 184.684 158.10 190.45  907.0   100
 max.col  51.5  73.00  88.302  79.45  94.40  326.1   100

Solution 3:[3]

We can try

> rowSums(col(databackend[-1])*(!is.na(databackend[-1])))
[1] 1 2 3 5 4 6 5

Sources

This article follows the attribution requirements of Stack Overflow and is licensed under CC BY-SA 3.0.

Source: Stack Overflow

Solution Source
Solution 1
Solution 2 Maël
Solution 3 ThomasIsCoding