Counting split rules in decision trees in R
I'm trying to count each unique split rule from a data frame of decision trees in R. For example, suppose I have a data frame containing 4 trees, like the one shown below:
df <- data.frame(
var = c('x10', NA, NA,
'x10', NA, 'x7', NA, NA,
'x5', 'x2', NA, NA, 'x9', NA, NA,
'x5', NA, NA),
num = c(1,1,1,
2,2,2,2,2,
1,1,1,1,1,1,1,
2,2,2),
iter = c(rep(1, 8), rep(2, 10))
)
> df
var num iter
1 x10 1 1
2 <NA> 1 1
3 <NA> 1 1
4 x10 2 1
5 <NA> 2 1
6 x7 2 1
7 <NA> 2 1
8 <NA> 2 1
9 x5 1 2
10 x2 1 2
11 <NA> 1 2
12 <NA> 1 2
13 x9 1 2
14 <NA> 1 2
15 <NA> 1 2
16 x5 2 2
17 <NA> 2 2
18 <NA> 2 2
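The var column contains the variable name used in each splitting rule (NA marks a terminal node) and is ordered depth-first within each tree. So, for example, the 4 trees created from that data would look like this: the first tree (num == 1, iter == 1) has a single split on x10; the second (num == 2, iter == 1) splits on x10 and then on x7 in one branch; the third (num == 1, iter == 2) splits on x5, then on x2 in one branch and on x9 in the other; the fourth (num == 2, iter == 2) has a single split on x5.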
I'm trying to find a way to return the count of each pair of variables used in a split rule, grouped by iter. For example, if we look at the 2nd tree (i.e., num == 2, iter == 1), we can see that the split on x7 sits directly below the split on x10, so the pair x10:x7 appears 1 time when iter == 1.
My desired output would look something like this:
allSplits count iter
1 x10:x7 1 1
2 x5:x2 1 2
3 x5:x9 1 2
Any suggestions as to how I could do this?
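One possible approach, as a minimal base R sketch rather than a definitive answer. It assumes that every split has exactly two children, that NA marks a terminal node, and that the rows of each tree (identified by iter and num) are in depth-first order, as described above. The helper name parent_var and the intermediate objects trees and pairs are hypothetical names introduced for illustration. The idea is to walk each tree with a stack of splits that are still waiting for children, record the parent variable of every node, and then count the parent:child pairs per iter.

```r
df <- data.frame(
  var = c('x10', NA, NA,
          'x10', NA, 'x7', NA, NA,
          'x5', 'x2', NA, NA, 'x9', NA, NA,
          'x5', NA, NA),
  num = c(1, 1, 1,
          2, 2, 2, 2, 2,
          1, 1, 1, 1, 1, 1, 1,
          2, 2, 2),
  iter = c(rep(1, 8), rep(2, 10)),
  stringsAsFactors = FALSE
)

# Hypothetical helper: parent split variable of every node in ONE tree.
# Assumes depth-first order, NA = terminal node, two children per split.
parent_var <- function(var) {
  n <- length(var)
  parent <- rep(NA_character_, n)
  open_idx <- integer(0)   # stack of split rows still waiting for children
  open_left <- integer(0)  # how many children each of those still needs
  for (i in seq_len(n)) {
    top <- length(open_idx)
    if (top > 0) {
      # The current node is a child of the split on top of the stack.
      parent[i] <- var[open_idx[top]]
      open_left[top] <- open_left[top] - 1L
      if (open_left[top] == 0L) {  # both children seen: pop the split
        open_idx <- open_idx[-top]
        open_left <- open_left[-top]
      }
    }
    if (!is.na(var[i])) {  # a split opens two new child slots
      open_idx <- c(open_idx, i)
      open_left <- c(open_left, 2L)
    }
  }
  parent
}

# One data frame per tree; split() keeps the row order within each tree.
trees <- split(df, list(df$iter, df$num), drop = TRUE)

# Keep only rows where both the node and its parent are splits.
pairs <- do.call(rbind, lapply(trees, function(tr) {
  p <- parent_var(tr$var)
  keep <- !is.na(tr$var) & !is.na(p)
  data.frame(allSplits = paste(p[keep], tr$var[keep], sep = ":"),
             iter = tr$iter[keep],
             stringsAsFactors = FALSE)
}))

# Count each pair within each iter.
pairs$count <- 1L
counts <- aggregate(count ~ allSplits + iter, data = pairs, FUN = sum)
counts <- counts[, c("allSplits", "count", "iter")]
counts
```

On the example data this should give the three rows in the desired output. If the trees can contain splits with a different number of children, the 2L pushed onto the stack would need to change accordingly.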
Sources
This article follows the attribution requirements of Stack Overflow and is licensed under CC BY-SA 3.0.
Source: Stack Overflow