'What is the fastest way to add new column based on dataframe entries in specific columns
So I have this dataframe
# Name Comp1 Con2 Vis3 Tra4 Pred5 Adap6
# 1 A1 x <NA> <NA> <NA> <NA> <NA>
# 2 A2 <NA> x <NA> <NA> <NA> <NA>
# 3 B1 <NA> <NA> x <NA> <NA> <NA>
# 4 B2 <NA> <NA> <NA> <NA> x <NA>
# 5 B3 <NA> <NA> <NA> x <NA> <NA>
# 6 D2 <NA> <NA> <NA> <NA> <NA> x
# 7 F6 <NA> <NA> <NA> <NA> x <NA>
I want to add a column to databackend that displays a value from 1 to 6 based on which column the "x" is on databackend. So the additional column would look like this
# Name Comp1 Con2 Vis3 Tra4 Pred5 Adap6 stage
# 1 A1 x <NA> <NA> <NA> <NA> <NA> 1
# 2 A2 <NA> x <NA> <NA> <NA> <NA> 2
# 3 B1 <NA> <NA> x <NA> <NA> <NA> 3
# 4 B2 <NA> <NA> <NA> <NA> x <NA> 5
# 5 B3 <NA> <NA> <NA> x <NA> <NA> 4
# 6 D2 <NA> <NA> <NA> <NA> <NA> x 6
# 7 F6 <NA> <NA> <NA> <NA> x <NA> 5
since my dataframe is very large in the original script, I am looking for the fastest (automatic) way to do this. I´ve tried using the for loop but it takes too long.
data
databackend <- structure(list(Name = c("A1", "A2", "B1", "B2", "B3", "D2", "F6"
), Comp1 = c("x", NA, NA, NA, NA, NA, NA), Con2 = c(NA, "x",
NA, NA, NA, NA, NA), Vis3 = c(NA, NA, "x", NA, NA, NA, NA), Tra4 = c(NA,
NA, NA, NA, "x", NA, NA), Pred5 = c(NA, NA, NA, "x", NA, NA,
"x"), Adap6 = c(NA, NA, NA, NA, NA, "x", NA), stage = c(1, 2,
3, 5, 4, 6, 5)), row.names = c(NA, -7L), class = "data.frame")
Solution 1:[1]
Relatively simple
> tmp=which(databackend[,-1]=="x",arr.ind=T)
> tmp[order(tmp[,"row"]),"col"]
[1] 1 2 3 5 4 6 5
Solution 2:[2]
Using which and apply:
apply(databackend[-1], 1, \(x) which(x == "x"))
#[1] 1 2 3 5 4 6 5
A benchmark, max.col is the fastest:
microbenchmark::microbenchmark(
apply = apply(databackend[-1], 1, \(x) which(x == "x")),
which = {tmp=which(databackend[,-1]=="x",arr.ind=T)
tmp[order(tmp[,"row"]),"col"]},
max.col = max.col(!is.na(databackend[-1]))
)
Unit: microseconds
expr min lq mean median uq max neval
apply 149.4 165.95 232.308 196.20 216.95 2882.4 100
which 118.9 144.35 184.684 158.10 190.45 907.0 100
max.col 51.5 73.00 88.302 79.45 94.40 326.1 100
Solution 3:[3]
We can try
> rowSums(col(databackend[-1])*(!is.na(databackend[-1])))
[1] 1 2 3 5 4 6 5
Sources
This article follows the attribution requirements of Stack Overflow and is licensed under CC BY-SA 3.0.
Source: Stack Overflow
| Solution | Source |
|---|---|
| Solution 1 | |
| Solution 2 | Maël |
| Solution 3 | ThomasIsCoding |
