Turn thousands of dummy variables into a single multinomial variable
I have a dataframe of the following sort:
a<-c('q','w')
b<-c(T,T)
d<-c(F,F)
.e<-c(T,F)
.f<-c(F,F)
.g<-c(F,T)
h<-c(F,F)
i<-c(F,T)
j<-c(T,T)
df<-data.frame(a,b,d,.e,.f,.g,h,i,j)
  a    b     d    .e    .f    .g     h     i    j
1 q TRUE FALSE  TRUE FALSE FALSE FALSE FALSE TRUE
2 w TRUE FALSE FALSE FALSE  TRUE FALSE  TRUE TRUE
I want to collapse all variables whose names start with a period into a single multinomial variable called Index, whose value in each row is the name (without the leading period) of the dummy variable that is T in that row:
df$Index<-c('e','g')
  a    b     d    .e    .f    .g     h     i    j Index
1 q TRUE FALSE  TRUE FALSE FALSE FALSE FALSE TRUE     e
2 w TRUE FALSE FALSE FALSE  TRUE FALSE  TRUE TRUE     g
Although many rows can be T for any given period-initial variable, each row can be T for only ONE period-initial variable.
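That constraint can be verified before collapsing the columns. The following is only a minimal sketch on the example data, assuming the dummy columns are exactly those whose names start with a period; `dummy_cols` and `n_true` are names introduced here for illustration.

```r
# Rebuild the example data frame so this block runs on its own
df <- data.frame(a = c('q', 'w'), b = c(TRUE, TRUE), d = c(FALSE, FALSE),
                 .e = c(TRUE, FALSE), .f = c(FALSE, FALSE), .g = c(FALSE, TRUE),
                 h = c(FALSE, FALSE), i = c(FALSE, TRUE), j = c(TRUE, TRUE))

# Names of the dummy columns (assumption: they all start with a period)
dummy_cols <- grep("^\\.", names(df), value = TRUE)

# Number of TRUE values per row across the dummy columns
n_true <- rowSums(df[dummy_cols])

# Stops with an error if any row has zero or several TRUE dummies
stopifnot(all(n_true == 1))
```

If the check fails, `which(n_true != 1)` gives the offending row numbers.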
If it were just a few items, I'd use an ifelse statement:
df$Index <- ifelse(df$_10000, '10000',...
But there are 12000 of these. In my real data the names of all dummy variables begin with underscores (periods in the example above), so I feel like there must be a better way. In pseudocode I would say something like:
for every row:
    for every column beginning with '_':
        if value == T:
            assign the name of the column without '_' to a column 'Index'
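One possible vectorized version of that pseudocode, sketched on the example data, is shown here. It assumes the dummy columns are those whose names start with a period (for underscore-prefixed names the pattern would be `"^_"` instead) and relies on base R's `max.col` to find the TRUE column in each row; `dummy_cols`, `m`, and `hit` are illustrative names, not part of the original data.

```r
# Rebuild the example data frame so this block runs on its own
df <- data.frame(a = c('q', 'w'), b = c(TRUE, TRUE), d = c(FALSE, FALSE),
                 .e = c(TRUE, FALSE), .f = c(FALSE, FALSE), .g = c(FALSE, TRUE),
                 h = c(FALSE, FALSE), i = c(FALSE, TRUE), j = c(TRUE, TRUE))

# Names of the dummy columns (assumption: they all start with a period)
dummy_cols <- grep("^\\.", names(df), value = TRUE)

# Numeric 0/1 matrix of the dummy columns
m <- as.matrix(df[dummy_cols]) * 1

# For each row, the position of the largest value, i.e. the TRUE column
hit <- max.col(m, ties.method = "first")

# Rows with no TRUE at all would otherwise point at the first column
hit[rowSums(m) == 0] <- NA

# Column name without the leading period becomes the Index value
df$Index <- sub("^\\.", "", dummy_cols[hit])
```

On the example this gives `Index` values of `e` and `g`. Working on one matrix avoids looping over rows and columns, which matters with 12000 dummy variables.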
Thanks in advance
Sources
This article follows the attribution requirements of Stack Overflow and is licensed under CC BY-SA 3.0.
Source: Stack Overflow
