Counting split rules in decision trees in R

I'm trying to count each unique split rule from a data frame of decision trees in R. For example, if I have a data frame containing 4 trees like the one shown below:

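# One row per node, listed depth-first within each tree.
# NA marks a leaf (no split); each (iter, num) combination is one tree.
df <- data.frame(
  var = c('x10', NA, NA,                      # iter 1, num 1
          'x10', NA, 'x7', NA, NA,            # iter 1, num 2
          'x5', 'x2', NA, NA, 'x9', NA, NA,   # iter 2, num 1
          'x5', NA, NA),                      # iter 2, num 2
  num = c(1,1,1,
          2,2,2,2,2,
          1,1,1,1,1,1,1,
          2,2,2),
  iter = c(rep(1, 8), rep(2, 10))
)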

> df
    var num iter
1   x10   1    1
2  <NA>   1    1
3  <NA>   1    1
4   x10   2    1
5  <NA>   2    1
6    x7   2    1
7  <NA>   2    1
8  <NA>   2    1
9    x5   1    2
10   x2   1    2
11 <NA>   1    2
12 <NA>   1    2
13   x9   1    2
14 <NA>   1    2
15 <NA>   1    2
16   x5   2    2
17 <NA>   2    2
18 <NA>   2    2

The var column contains the variable name used in each split rule (NA marks a leaf, i.e. a node with no split), and the rows of each tree are listed in depth-first order. So, for example, the 4 trees created from that data would look like this:

[Image: decision trees]

I'm trying to find a way to return the count of each pair of variables used in consecutive split rules (a split and a split directly below it), grouped by iter. For example, in the 2nd tree (i.e., num == 2, iter == 1), x7 splits one of the child nodes of the x10 split. So, the pair x10:x7 appears 1 time when iter == 1.
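Because each tree is listed depth-first (pre-order) and NA marks a leaf, the parent of every node can be recovered with a stack of splits that are still waiting for children. The sketch below assumes every non-NA split is binary (exactly two children); node_parents is a hypothetical helper name, not part of any package.

```r
# Hypothetical helper: given one tree's var column in depth-first
# (pre-order) order, return the row index of each node's parent
# (NA for the root). Assumes every non-NA var is a binary split and
# every NA is a leaf.
node_parents <- function(vars) {
  parent <- rep(NA_integer_, length(vars))
  pending <- integer(0)   # splits still waiting for children
  needed <- integer(0)    # children each pending split still expects
  for (i in seq_along(vars)) {
    if (length(pending) > 0) {
      top <- length(pending)
      parent[i] <- pending[top]         # node i is a child of the newest pending split
      needed[top] <- needed[top] - 1L
      if (needed[top] == 0L) {          # both children seen: close that split
        pending <- pending[-top]
        needed <- needed[-top]
      }
    }
    if (!is.na(vars[i])) {              # a split opens and expects two children
      pending <- c(pending, i)
      needed <- c(needed, 2L)
    }
  }
  parent
}

# 2nd tree (num == 2, iter == 1)
vars <- c("x10", NA, "x7", NA, NA)
parent <- node_parents(vars)
keep <- !is.na(vars) & !is.na(parent)   # splits that sit below another split
paste(vars[parent[keep]], vars[keep], sep = ":")
```

For the 2nd tree this gives "x10:x7"; for the 1st tree (x10, NA, NA) the keep vector is all FALSE, so no pair is produced.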

My desired output would look something like this:

 allSplits count iter
1    x10:x7     1    1
2     x5:x2     1    2
3     x5:x9     1    2
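One possible base-R sketch of the whole task, assuming each (iter, num) combination is one binary tree whose rows are in depth-first order with NA for leaves. It walks each tree recursively to collect parent:child labels, then sums them per iter with aggregate(). split_pairs is a hypothetical helper, and this is only one approach, not a canonical one.

```r
# Example data from the question (stringsAsFactors = FALSE so var stays
# character on R versions older than 4.0).
df <- data.frame(
  var = c('x10', NA, NA,
          'x10', NA, 'x7', NA, NA,
          'x5', 'x2', NA, NA, 'x9', NA, NA,
          'x5', NA, NA),
  num = c(1,1,1, 2,2,2,2,2, 1,1,1,1,1,1,1, 2,2,2),
  iter = c(rep(1, 8), rep(2, 10)),
  stringsAsFactors = FALSE
)

# Hypothetical helper: "parent:child" labels for one tree listed in
# depth-first order. Assumes binary splits and NA = leaf.
split_pairs <- function(vars) {
  pos <- 0
  pairs <- character(0)
  visit <- function(parent) {
    pos <<- pos + 1
    v <- vars[pos]
    if (is.na(v)) return(invisible(NULL))   # leaf: nothing below it
    if (!is.na(parent)) pairs <<- c(pairs, paste0(parent, ":", v))
    visit(v)   # first child
    visit(v)   # second child
  }
  visit(NA_character_)
  pairs
}

# One sub-data-frame per tree; split() keeps the original row order.
trees <- split(df, list(df$iter, df$num), drop = TRUE)

# Pairs from every tree, each tagged with the iter of its tree.
pair_list <- lapply(trees, function(t) split_pairs(t$var))
iters <- vapply(trees, function(t) t$iter[1], numeric(1))
pairs <- data.frame(
  allSplits = unlist(pair_list, use.names = FALSE),
  iter = rep(unname(iters), lengths(pair_list)),
  count = 1
)

# Sum the occurrences of each pair within each iter, then order the
# columns like the desired output.
result <- aggregate(count ~ allSplits + iter, data = pairs, FUN = sum)
result <- result[, c("allSplits", "count", "iter")]
result
```

If the same pair occurs in several trees of one iter, aggregate() adds those occurrences into a single count for that iter.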

Any suggestions as to how I could do this?

Sources

This article follows the attribution requirements of Stack Overflow and is licensed under CC BY-SA 3.0.

Source: Stack Overflow
