Smooth way to calculate index based on several variable comparisons in base R

Example data to copy

df <- data.frame(
  AA = c(100, 200, 300, 400), 
  X1 = c(2, 1, 3, 1),
  X2 = c(1, 3, 4, 1)
)

Based on the index of AA and its values, I would like to calculate, for every row, the sum of indicators of the condition df$AA[i] > df[df$X1[i], c('AA')] (shown here for X1) over a varying number of variables.

My probably naive approach is to use a for-loop, which works perfectly for a fixed number of variables (columns), in the given example X1, X2. My problem is that I do not know the number of variables beforehand. Theoretically, any number 1, 2, 3, ... is possible.

for (i in 1:nrow(df)) {
  df$index[i] <- sum(df$AA[i] > df[df$X1[i], c('AA')],
                     df$AA[i] > df[df$X2[i], c('AA')])
}

This gives the desired output for a fixed number of variables X1, X2:

df
#>    AA X1 X2 index
#> 1 100  2  1     0
#> 2 200  1  3     1
#> 3 300  3  4     0
#> 4 400  1  1     2

Is there a smooth base R approach which translates my approach to a flexible number of variables X1, ..., Xn?

Note, the reason why I am interested in a base R approach is my aim to extend an existing package, which is fully written in base R. So I would like to keep it like that. Loops or *apply-family approaches are both very welcome. I am aware of the fact that operations on dataframes are often considered to be slower. Since all variables AA, X1, ... are of the same length, a solution which does not rely on a dataframe structure would also be great!

^{Created on 2022-04-06 by the reprex package (v2.0.1)}

r dataframe

Solution 1:^[1]

You don't need to loop through rows. You can use Reduce.

Reduce(`+`, lapply(df[-1], function(x) df$AA > df$AA[x]))
#> [1] 0 1 0 2
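Since the question also asks for something that does not rely on a data frame, the same idea can be sketched on plain vectors. The helper name count_greater and its arguments aa and idx are hypothetical, introduced only for illustration. Passing an integer init to Reduce is an addition to the answer above, assumed here so that the result is still an integer count when only one index vector is supplied.

```r
# Hypothetical helper (not from the original answer):
#   aa  - numeric vector of values
#   idx - list of index vectors, each the same length as aa
count_greater <- function(aa, idx) {
  # Start from a vector of integer zeros so that a single-element
  # (or empty) list still returns integer counts.
  Reduce(`+`, lapply(idx, function(i) aa > aa[i]), init = integer(length(aa)))
}

aa  <- c(100, 200, 300, 400)
idx <- list(X1 = c(2, 1, 3, 1), X2 = c(1, 3, 4, 1))

count_greater(aa, idx)
#> [1] 0 1 0 2
```

With a data frame, the same call would be count_greater(df$AA, df[-1]), assuming every column after the first holds row indices.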

Solution 2:^[2]

Does this correspond to what you're looking for?

df$index <- apply(df, 1, function(x){sum(x[1] > df$AA[x[-1]])})

This assumes that AA is column 1 and that the Xi are all the other columns.
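Because apply() passes every column of df to the function, a previously added column such as index would also be treated as row indices. A minimal sketch that avoids this selects the index columns by name and does the lookup in one step. The pattern "^X[0-9]+$" is an assumption about how the Xi columns are named, and xcols and lookup are illustrative names.

```r
df <- data.frame(
  AA = c(100, 200, 300, 400),
  X1 = c(2, 1, 3, 1),
  X2 = c(1, 3, 4, 1)
)

# Assumption: the index columns are exactly those named X1, X2, ..., Xn
xcols <- grep("^X[0-9]+$", names(df), value = TRUE)

# Look up AA for every index at once; reshape to one column per Xi
lookup <- matrix(df$AA[as.matrix(df[xcols])], nrow = nrow(df))

# df$AA is recycled down each column, so row i is compared with AA[i]
df$index <- rowSums(df$AA > lookup)

df$index
#> [1] 0 1 0 2
```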

Solution 3:^[3]

The following one-liner works precisely because df is a data frame:

df$index <- rowSums( # To sum over a non-specified number of columns
  mapply(
    df[, -which(names(df) == "AA"), drop = FALSE], # Everything except AA, kept as a data frame even with a single Xi
    df[,"AA", drop = FALSE],         # Only AA, but in a data-frame
    FUN = function(index, aa) aa[index] < aa)) # Compare
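For checking any of the vectorised answers, the loop from the question can be extended to an arbitrary number of columns. This is only a sketch: it assumes that every column except AA holds row indices, and xcols and refs are illustrative names.

```r
df <- data.frame(
  AA = c(100, 200, 300, 400),
  X1 = c(2, 1, 3, 1),
  X2 = c(1, 3, 4, 1)
)

# Assumption: every column except AA holds row indices
xcols <- setdiff(names(df), "AA")

index <- integer(nrow(df))
for (i in seq_len(nrow(df))) {
  # Row indices referenced by row i, across all index columns
  refs <- unlist(df[i, xcols], use.names = FALSE)
  # Count how many referenced AA values are smaller than AA[i]
  index[i] <- sum(df$AA[i] > df$AA[refs])
}

index
#> [1] 0 1 0 2
```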

Sources

This article follows the attribution requirements of Stack Overflow and is licensed under CC BY-SA 3.0.

Source: Stack Overflow

Solution	Source
Solution 1	Allan Cameron
Solution 2	Valkyr
Solution 3
