Combining multiple items to create one dummy variable
I have 7 items/variables in Stata that address the same survey question. These 7 items are each different weight control behaviors (diet, exercise, pills, etc.). I am trying to combine these variables to create a single weight control behavior dummy variable that is coded as yes (did engage in weight control) and no (did not engage in weight control).
The response options for each variable look something like this for a given weight control behavior (here, dieted):

| Freq. | Value | Label |
|---|---|---|
| 11438 | 0 | not marked |
| 2771 | 1 | marked |
| 16 | 6 | refused |
| 6508 | 7 | legitimate skip |
| 13 | 8 | don’t know |
Here is my code. I re-coded 6,7,8 for all 7 vars as missing:
tab1 h1gh30a-h1gh30g, m
foreach X of varlist h1gh30a-h1gh30g {
    replace `X' = . if `X' > 1
}
egen wgt_control= rowmax(h1gh30a-h1gh30g)
ta wgt_control
gen wgt_control_new=wgt_control
replace wgt_control_new = 1 if wgt_control>0 & wgt_control!=.
replace wgt_control_new= 0 if wgt_control <1
ta wgt_control_new
I used rowmax() to combine all 7 items but my issue is that the response option 0 or No doesn't appear when I tabulate it. I only get those who responded yes=1.
Solution 1:[1]
Here is a suggestion with a reproducible example for what I think is the cleanest approach. I also included some unsolicited advice about survey data best practices.
* Example generated by -dataex-. For more info, type help dataex
clear
input double(h1gh30a h1gh30b h1gh30c)
1 1 1
1 0 1
6 1 8
0 0 0
7 6 8
end
* Explicit coding is better, so if possible (and it is with 7 vars),
* create a local in which the vars are explicitly listed
local wgt_controls h1gh30a h1gh30b h1gh30c
* Recode is a better command to use here. And do not destroy information:
* there is a survey data quality assurance difference between a respondent
* refusing to answer, not knowing, and a question being skipped. You can replace
* these survey codes with extended missing values, which behave like missing
* values but retain the differences between the survey codes
recode `wgt_controls' (6=.a) (7=.b) (8=.c)
* While rowmax() could be used, I think anymatch() is a better fit
* for what you are trying to do
egen wgt_control = anymatch(`wgt_controls'), values(1)
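One thing worth checking with this approach: anymatch() returns 0 for any row where no item equals 1, and that includes rows where every item is missing (like the last row of the example). Below is a minimal sketch, assuming you would rather have such rows coded as missing than as 0. The variable name n_answered is made up for illustration.

```stata
clear
input double(h1gh30a h1gh30b h1gh30c)
1 1 1
1 0 1
6 1 8
0 0 0
7 6 8
end

local wgt_controls h1gh30a h1gh30b h1gh30c
recode `wgt_controls' (6=.a) (7=.b) (8=.c)

* 1 if any item is marked, 0 otherwise (also 0 when all items are missing)
egen wgt_control = anymatch(`wgt_controls'), values(1)

* Number of items with a valid (non-missing) answer in each row
egen n_answered = rownonmiss(`wgt_controls')

* Assumption: a respondent with no valid answer at all should be missing
replace wgt_control = . if n_answered == 0

list
```

Whether all-missing respondents belong in the "no" group or should be missing depends on the survey design, so treat the last replace as an option rather than a rule.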
Solution 2:[2]
There is no minimal reproducible example here, so we can't reproduce the problem independently.
From your code, it seems that h1gh30a-h1gh30g are recoded so that all are 0, 1 or missing, so their maximum takes one of the same values.
gen wgt_control_new = wgt_control
replace wgt_control_new = 1 if wgt_control>0 & wgt_control!=.
replace wgt_control_new= 0 if wgt_control <1
seems to boil down to cloning the variable:
gen wgt_control_new = wgt_control
In short, I can't see a reason in your code why you should never see 0 as a possible result.
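The claim that those three lines only clone the variable can be checked on a toy variable holding just 0, 1 and missing. This is a sketch on made-up values, not the real data:

```stata
clear
input double wgt_control
0
1
.
end

gen wgt_control_new = wgt_control
replace wgt_control_new = 1 if wgt_control > 0 & wgt_control != .
replace wgt_control_new = 0 if wgt_control < 1

* In Stata, missing == missing evaluates to true, so this assert
* passes only if every row, including the missing one, is unchanged
assert wgt_control_new == wgt_control
```

Missing counts as larger than any number in Stata, so the second replace never touches missing rows and both replace statements make zero changes here.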
EDIT
A minimal check on whether there are zeros that aren't showing up as they should might be
egen max = rowmax(h1gh30a-h1gh30g)
list h1gh30a-h1gh30g if max == 0
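A slightly fuller check, sketched on made-up data with three items instead of seven: count how many items are 1 and how many are 0 in each row, then cross-tabulate the two counts. The names n_marked and n_unmarked are only for illustration.

```stata
clear
input double(h1gh30a h1gh30b h1gh30c)
1 1 1
1 0 1
. 1 .
0 0 0
. . .
end

* Per-row counts of items equal to 1 and items equal to 0
egen n_marked   = anycount(h1gh30a-h1gh30c), values(1)
egen n_unmarked = anycount(h1gh30a-h1gh30c), values(0)

* rowmax() ignores missing values, so it is 0 only when no item is 1
* and at least one item is 0; it is missing when all items are missing
egen max = rowmax(h1gh30a-h1gh30c)

tab n_marked n_unmarked, missing
list if max == 0
```

If the real data has no row with n_marked == 0 and n_unmarked > 0, then rowmax() cannot return 0 for any respondent, and the missing zeros in the tabulation would reflect the data rather than the code.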
Sources
This article follows the attribution requirements of Stack Overflow and is licensed under CC BY-SA 3.0.
Source: Stack Overflow
| Solution | Source |
|---|---|
| Solution 1 | TheIceBear |
| Solution 2 | |
