How to build a panel data model in R?
We would really appreciate some help building regression model(s) using a panel data set. The dataset consists of data from retail stores over two years. Our research question is as follows: How do different payment methods affect retail stores' unregistered shrinkage?
As a result, we want to see which payment method gives the highest increase in shrinkage, and compare the payment methods.
We have tried the following models, but the results do not make sense. Each has the form y = x1 + x2 + x3 + u:
- Shrinkage = sale self-service checkout + revenue + region + u
- Shrinkage = sale ShopExpress + revenue + region + u
- Shrinkage = sale served checkout + revenue + region + u
We want to control for store size and region, which is why these variables are included. Also, all stores in the dataset have served checkouts, and some have self-service and ShopExpress (Scan and go). Are we therefore unable to use dummies for the payment methods?
- Is there a better way to create the regression models? Is it possible to gather them into one model?
- Do you have suggestions for other control variables that we should include?
- To be able to run the random-effects model, we had to transform the variables to natural logarithms. Does it make sense to use ln?
Solution 1:[1]
The first thing you want to think about is whether you are interested in within-store variation over time or in between-store variation. If you are currently running a random-effects model, your coefficients will be a mix of between- and within-store variation. So if Store A has self-service checkout only for the first half of the panel and Store B has self-service checkout only for the second half of the panel, the coefficient for self-service checkout is the change in shrinkage within and between those four halves of the panel for those cases. So if stores that have more self-service tend to have more shrinkage in general, the coefficient could be positive even if adopting self-service causes a decrease in shrinkage within a store. The random effect mainly adjusts the standard errors to account for repeated observations of the same store.
Alternatively, you could use a fixed-effects panel model. I recommend this if you are interested in seeing how adopting each payment method causes a change within a store. This model effectively includes a dummy variable for each store, which absorbs all between-store variation and leaves only within-store variation. The coefficient for self-service in the above scenario now shows the change between Store A's first half and second half of the panel, as well as the change between Store B's first half and second half of the panel. It includes no information about whether stores that tend to have self-checkout tend to have more shrinkage. If you want to look at within- and between-variation together in one model, you might consider the hybrid fixed-effects model. If you choose the fixed-effects model, you no longer need to include any time-invariant controls such as region (in fact, you cannot), because all time-invariant aspects of a store are already controlled for. This model only works if at least some stores adopt (or discontinue) the payment methods during the panel.
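A minimal sketch of the contrast described above, using simulated data and base R only. The variable names (`store`, `period`, `self_service`, `shrinkage`) and the effect sizes are assumptions for illustration, not the asker's data. Stores with high baseline shrinkage adopt self-service halfway through the panel, and adoption lowers shrinkage by 1 within a store. Adding `factor(store)` to `lm()` is the dummy-variable form of the fixed-effects estimator; `plm::plm(..., model = "within")` should give the same slope if you prefer that package.

```r
set.seed(1)
n_stores  <- 40
n_periods <- 24

# One row per store-period (hypothetical balanced panel)
panel <- expand.grid(period = 1:n_periods, store = 1:n_stores)

# Unobserved, time-invariant baseline shrinkage per store
baseline <- rnorm(n_stores, mean = 10, sd = 3)

# Assumption: high-shrinkage stores adopt self-service after period 12
adopter <- baseline > median(baseline)
panel$self_service <- as.integer(adopter[panel$store] & panel$period > 12)

# Assumed true within-store effect of adoption: -1
panel$shrinkage <- baseline[panel$store] - 1 * panel$self_service +
  rnorm(nrow(panel), sd = 0.5)

# Pooled model: mixes between- and within-store variation
pooled <- lm(shrinkage ~ self_service, data = panel)

# Fixed-effects model: store dummies absorb between-store variation
fe <- lm(shrinkage ~ self_service + factor(store), data = panel)

print(coef(pooled)["self_service"])  # picks up the between-store difference
print(coef(fe)["self_service"])      # estimates the within-store change
```

In this setup the pooled coefficient comes out positive because adopters have higher shrinkage to begin with, while the fixed-effects coefficient lands near the assumed within-store effect.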
Taking the natural log is not necessarily a bad idea. As long as the changes involved are not large, you can interpret each coefficient (multiplied by 100) as an approximate percent change in shrinkage; if the predictor is also logged, the coefficient is an elasticity. You should check whether the residuals look appropriate after this transformation.
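To see why the percent-change reading only holds for small changes, here is a short worked example. It assumes a log-level specification with a store effect $\alpha_i$ and a single predictor $x_{it}$; the symbols are illustrative and not taken from the question.

$$
\log(y_{it}) = \alpha_i + \beta x_{it} + u_{it}
\quad\Longrightarrow\quad
\%\Delta y = 100\,\bigl(e^{\beta} - 1\bigr) \approx 100\,\beta \ \text{ for small } \beta
$$

For $\beta = 0.05$ the exact change is about $5.13\%$, close to the approximate $5\%$. For $\beta = 0.5$ the exact change is about $64.9\%$, far from the approximate $50\%$.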
If your values for shrinkage are whole numbers (either items or whole dollar amounts), you might consider using a Poisson fixed-effects model or a negative binomial fixed-effects model. These model the count directly through a log link, so you do not need to log-transform the outcome yourself. An exponentiated coefficient is a multiplicative effect: exp(b) = 1.10 means shrinkage is about 10% higher for a one-unit increase in the predictor.
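A minimal sketch of a Poisson model with store dummies, again on simulated data with base R's `glm()`. The count variable `items_lost`, the adoption pattern, and the effect size are assumptions for illustration. For the Poisson case, adding `factor(store)` is a common way to get fixed-effects slope estimates; dedicated panel packages are an alternative when there are many stores.

```r
set.seed(2)
n_stores  <- 30
n_periods <- 24

panel <- expand.grid(period = 1:n_periods, store = 1:n_stores)

# Store-specific log baseline rate (time-invariant)
store_effect <- rnorm(n_stores, mean = 3, sd = 0.5)

# Assumption: stores 1-15 adopt self-service after period 12
panel$self_service <- as.integer(panel$store <= 15 & panel$period > 12)

# Assumed true effect on the log scale: 0.2 (about +22% shrinkage)
lambda <- exp(store_effect[panel$store] + 0.2 * panel$self_service)
panel$items_lost <- rpois(nrow(panel), lambda = lambda)

# Poisson regression with one dummy per store
fit <- glm(items_lost ~ self_service + factor(store),
           family = poisson, data = panel)

b <- coef(fit)[["self_service"]]
rate_ratio <- exp(b)               # multiplicative effect on expected count
pct_change <- 100 * (rate_ratio - 1)  # same effect as a percent change

print(round(c(coef = b, rate_ratio = rate_ratio, pct_change = pct_change), 3))
```

The percent change is computed as `100 * (exp(b) - 1)` rather than read directly from `b`, which matters once effects are not small.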
I think an important control for your model is time. If all stores tend to have more shrinkage at the start of the year (or at any particular time) and stores also tend to change payment methods at the start of the year, your model will mistakenly tell you that the new payment method causes more shrinkage. A simple approach is to add date dummies to the model, which results in the two-way fixed-effects model. With two years of daily data, this creates roughly 730 dummy variables, which account for daily deviations from the overall expected value of shrinkage that are experienced by all stores. You could also try dummies for days of the week and months in the sample instead. You can exclude these dummies from your table and note the time dummies you chose.
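A minimal sketch of the timing problem described above, using simulated data and base R. All names and numbers are assumptions for illustration, and the time index is a generic `period` rather than a calendar date. Every store gets a common jump in shrinkage of 2 after period 12, which is also when half the stores adopt self-service; the assumed true effect of adoption is 0.5.

```r
set.seed(3)
n_stores  <- 40
n_periods <- 24

panel <- expand.grid(period = 1:n_periods, store = 1:n_stores)

store_effect <- rnorm(n_stores, mean = 10, sd = 2)

# Shock shared by all stores, starting at the same time as adoption
time_effect <- ifelse(seq_len(n_periods) > 12, 2, 0)

# Assumption: stores 1-20 adopt after period 12, stores 21-40 never adopt
panel$self_service <- as.integer(panel$store <= 20 & panel$period > 12)

panel$shrinkage <- store_effect[panel$store] + time_effect[panel$period] +
  0.5 * panel$self_service + rnorm(nrow(panel), sd = 0.5)

# Store dummies only: the common shock is attributed to self-service
one_way <- lm(shrinkage ~ self_service + factor(store), data = panel)

# Store and period dummies: two-way fixed effects
two_way <- lm(shrinkage ~ self_service + factor(store) + factor(period),
              data = panel)

print(coef(one_way)["self_service"])  # inflated by the shared time shock
print(coef(two_way)["self_service"])  # close to the assumed effect of 0.5
```

The two-way model relies on the stores that never adopt to estimate the shared time pattern, so it needs some stores whose payment methods do not change at the same moment.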
I don't see any reason why you can't include all three payment methods in the same model. You might present a table with four models: one for each payment type on its own, and a fourth that includes all three.
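A minimal sketch of such a four-model table with base R, on simulated data. The sales variables (`served`, `self_service`, `shop_express`), their ranges, and the coefficients used to generate `shrinkage` are assumptions for illustration; `fit_fe` is a hypothetical helper, not a library function.

```r
set.seed(4)
n_stores  <- 30
n_periods <- 24

panel <- expand.grid(period = 1:n_periods, store = 1:n_stores)
store_effect <- rnorm(n_stores, mean = 10, sd = 2)

# Hypothetical sales per payment method, in arbitrary units
panel$served       <- runif(nrow(panel), 5, 10)
panel$self_service <- runif(nrow(panel), 0, 5)
panel$shop_express <- runif(nrow(panel), 0, 3)

panel$shrinkage <- store_effect[panel$store] +
  0.1 * panel$served + 0.4 * panel$self_service + 0.8 * panel$shop_express +
  rnorm(nrow(panel), sd = 0.5)

methods <- c("served", "self_service", "shop_express")

# Hypothetical helper: two-way fixed-effects model for a set of predictors
fit_fe <- function(vars) {
  rhs <- paste(c(vars, "factor(store)", "factor(period)"), collapse = " + ")
  lm(as.formula(paste("shrinkage ~", rhs)), data = panel)
}

# Three single-method models plus one combined model
models <- c(lapply(methods, fit_fe), list(fit_fe(methods)))
names(models) <- c(methods, "combined")

# Rows: payment methods; columns: models; NA where a term is not in the model
coef_table <- sapply(models, function(m) coef(m)[methods])
rownames(coef_table) <- methods

print(round(coef_table, 2))
```

Only the combined column lets you compare the payment methods with the others held constant, which is what the research question asks for.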
Sources
This article follows the attribution requirements of Stack Overflow and is licensed under CC BY-SA 3.0.
Source: Stack Overflow
| Solution | Source |
|---|---|
| Solution 1 | dcoy |
