Smooth way to calculate index based on several variable comparisons in base R
Example data to copy
df <- data.frame(
AA = c(100, 200, 300, 400),
X1 = c(2, 1, 3, 1),
X2 = c(1, 3, 4, 1)
)
Based on the index of AA and its values, I would like to calculate, for every row, the sum of indicators given by the condition df$AA[i] > df[df$X1[i], c('AA')] (shown here for X1), across a varying number of variables.
My probably naive approach is a for-loop, which works perfectly for a fixed number of variables (columns), in the given example X1 and X2. My problem is that I do not know the number of variables beforehand. Theoretically, any number 1, 2, 3, ... is possible.
for (i in 1:nrow(df)) {
df$index[i] <- sum(df$AA[i] > df[df$X1[i], c('AA')],
df$AA[i] > df[df$X2[i], c('AA')])
}
Which gives the desired output for a fixed number of variables X1, X2:
df
#> AA X1 X2 index
#> 1 100 2 1 0
#> 2 200 1 3 1
#> 3 300 3 4 0
#> 4 400 1 1 2
Is there a smooth base R approach which translates my approach to a flexible number of variables X1, ..., Xn?
Note, the reason why I am interested in a base R approach is my aim to extend an existing package, which is fully written in base R. So I would like to keep it like that.
Loops or *apply-family approaches are both very welcome.
I am aware of the fact that operations on dataframes are often considered to be slower. Since all variables AA, X1, ... are of the same length, a solution which does not rely on a dataframe structure would also be great!
Created on 2022-04-06 by the reprex package (v2.0.1)
Solution 1:[1]
You don't need to loop through rows. You can use Reduce.
Reduce(`+`, lapply(df[-1], function(x) df$AA > df$AA[x]))
#> [1] 0 1 0 2
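As a minimal sketch, the Reduce idea can be wrapped in a small function that takes the values and the index vectors separately, so it does not depend on the data frame layout. The name count_greater and the rule "every column except AA is an index column" are assumptions made for illustration; they are not part of the original answer.

```r
# Hypothetical helper: for each position i, count how many of the
# referenced values aa[x[i]] are strictly smaller than aa[i].
count_greater <- function(aa, idx_cols) {
  # idx_cols: a list (or data frame) of index vectors, each as long as aa
  # as.integer() keeps the result type stable when there is only one column
  as.integer(Reduce(`+`, lapply(idx_cols, function(x) aa > aa[x])))
}

df <- data.frame(
  AA = c(100, 200, 300, 400),
  X1 = c(2, 1, 3, 1),
  X2 = c(1, 3, 4, 1)
)

# Assumption: every column other than AA holds row indices
x_cols <- setdiff(names(df), "AA")
index <- count_greater(df$AA, df[x_cols])
index
#> [1] 0 1 0 2
```

Because the function only needs a numeric vector and a list of index vectors, the same call should work with plain vectors, e.g. count_greater(AA, list(X1, X2)).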
Solution 2:[2]
Does this correspond to what you're looking for?
df$index <- apply(df, 1, function(x){sum(x[1] > df$AA[x[-1]])})
This assumes that AA is column 1 and that all other columns are your Xi.
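Note that apply() treats every column after the first as an index column, so a leftover index column from an earlier run would be picked up as well. A possible variant, sketched under the assumption that the index columns are named X1, X2, ... (the name pattern is a guess based on the example data), selects those columns explicitly and loops over rows with vapply:

```r
df <- data.frame(
  AA = c(100, 200, 300, 400),
  X1 = c(2, 1, 3, 1),
  X2 = c(1, 3, 4, 1)
)

# Assumption: index columns are exactly those named X followed by digits
x_cols <- grep("^X[0-9]+$", names(df), value = TRUE)
idx <- as.matrix(df[x_cols])

# For each row i, compare AA[i] with the AA values that row i points to
df$index <- vapply(
  seq_len(nrow(df)),
  function(i) sum(df$AA[i] > df$AA[idx[i, ]]),
  integer(1)
)
df$index
#> [1] 0 1 0 2
```

Running this a second time gives the same result, because the index column does not match the name pattern and is therefore ignored.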
Solution 3:[3]
The following one-liner works because df is a data frame, so mapply iterates over its columns:
df$index <- rowSums( # To sum over a non-specified number of columns
mapply(
df[,- which(names(df) == "AA"), drop = FALSE], # Everything except AA; drop = FALSE keeps a data frame even with a single Xi column
df[,"AA", drop = FALSE], # Only AA, but in a data-frame
FUN = function(index, aa) aa[index] < aa)) # Compare
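Since the question also asks for something that does not rely on a data frame, here is a minimal sketch using a plain vector and an index matrix. The object names aa, idx and looked_up are illustrative, and the approach assumes all index vectors have the same length as aa.

```r
aa  <- c(100, 200, 300, 400)
idx <- cbind(X1 = c(2, 1, 3, 1),
             X2 = c(1, 3, 4, 1))   # one column per index variable

# Look up the referenced values for every entry of idx (column-major order)
# and restore the matrix shape: looked_up[i, j] is aa[idx[i, j]]
looked_up <- matrix(aa[as.vector(idx)], nrow = nrow(idx))

# aa is recycled down each column, so row i is compared against aa[i]
index <- rowSums(aa > looked_up)
index
#> [1] 0 1 0 2
```

Adding another index variable only means adding a column to idx; the lookup and the rowSums call stay the same.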
Sources
This article follows the attribution requirements of Stack Overflow and is licensed under CC BY-SA 3.0.
Source: Stack Overflow
| Solution | Source |
|---|---|
| Solution 1 | Allan Cameron |
| Solution 2 | Valkyr |
| Solution 3 | |
