Creating a matched sample using some variables
My dependent variable is an indicator variable (1 or 0). My N is very large, but the share of observations where the dependent variable equals 1 is very small (only 3%), so when I run regressions my coefficients are tiny. I therefore want to create a matched sample across some dimensions. I know I should use psmatch2.
I tried
psmatch2 depvar v1 v2 v3, common
but I still end up with a very small number of "treated" and a large number of "untreated" observations, pretty much what I had in my original data. I want to keep the observations where y=1 and build a sample of observations that are similar across v1, v2, v3, with the two groups reasonably similar in size. Any ideas?
Solution 1:[1]
Instead of creating a matched sample, you could consider quasi-matching techniques such as entropy balancing and coarsened exact matching.
To implement entropy balancing in Stata, you can try something like below:
ssc install ebalance
ebalance treat_var v1 v2 v3, tar(2)
The above commands install the ebalance package and assign a weight to each observation such that, after weighting, the means and variances of v1, v2, v3 (the first two moments, as requested by tar(2)) are approximately the same in the treatment and control groups. Use help ebalance to find out more.
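One way to check that the weights did what was intended is to compare weighted moments across the two groups. The following is only a minimal sketch, assuming ebalance has already been run as above so that _webal exists; treat_var, v1, v2, and v3 are the placeholder names used in this answer.

```stata
* Weighted means and variances of the covariates, by treatment status.
* Assumes _webal was created by the ebalance call above.
tabstat v1 v2 v3 [aweight=_webal], by(treat_var) statistics(mean variance)
```

If the balancing converged, the two groups should show nearly identical means and variances for each variable.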
To implement coarsened exact matching in Stata, you can try something like the following:
ssc install cem
cem v1 v2 v3, tr(treat_var)
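Since the question asks for groups of reasonably similar size, it may be worth looking at cem's k2k option, which prunes each matched stratum down to the same number of treated and control units. The sketch below would be run in place of the plain call above, not after it; it assumes the same placeholder variable names, and it discards data, so check help cem before relying on it.

```stata
* k2k prunes each matched stratum to equal numbers of treated and control units.
cem v1 v2 v3, tr(treat_var) k2k

* cem_matched flags the observations retained in the matched sample.
keep if cem_matched == 1

* Confirm that the two groups are now the same size.
tabulate treat_var
```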
Both cem and ebalance generate a weight for each observation. The weight variables are stored in the dataset and named cem_weights and _webal respectively. To incorporate entropy balancing or coarsened exact matching into your regression analysis, simply estimate weighted regressions, for example with the entropy balancing weights:
regress y treat_var [aweight=_webal]
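The coarsened exact matching version could look like the sketch below. It assumes cem has already been run as above so that cem_weights exists, and it uses iweights, which is how the cem documentation passes these weights to regress as far as I recall; see help cem to confirm.

```stata
* Assumes cem has been run as above, so cem_weights exists.
* Unmatched observations carry a weight of zero and so do not contribute.
regress y treat_var [iweight=cem_weights]
```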
Sources
This article follows the attribution requirements of Stack Overflow and is licensed under CC BY-SA 3.0.
Source: Stack Overflow
| Solution | Source |
|---|---|
| Solution 1 | Nick Cox |
