Regression over time with a specific year as weight

I am doing a regression with panel data of EU countries over time with observations from 2007-16. I want to use the observation for 2007 for each specific country as the weight. Is there a simple way to do this?

This is essentially the regression I run, but I don't think the weighting is working as I intend it to.

lm(log(POP25) ~ log(EMPLOY25), weights = POP25, data = data)



structure(list(...1 = 1:6, TIME = 2007:2012, NUTS_ID = c("AT", 
"AT", "AT", "AT", "AT", "AT"), NUMBER = c(1L, 1L, 1L, 1L, 1L, 
1L), POP15 = c(5529.1, 5549.3, 5558.5, 5572.1, 5601.1, 5620.8
), POP20 = c(5047.1, 5063.2, 5072.6, 5090, 5127.1, 5151.9), POP25 = c(4544, 
4560.7, 4571.3, 4587.8, 4621.5, 4639), EMPLOY15 = c(3863.6, 3928.7, 
3909.3, 3943.9, 3982.3, 4013.4), EMPLOY20 = c(3676.2, 3737, 3723.8, 
3761.9, 3802.3, 3835), EMPLOY25 = c(3333.5, 3390.4, 3384.7, 3424.6, 
3454.4, 3486.4)), row.names = c(NA, 6L), class = "data.frame")


Solution 1:[1]

You are right, this is not doing what you expect. With weights = POP25, each row is weighted by its own POP25 value, so the weight changes from year to year. Nothing in the call tells lm() that you only want the POP25 value from 2007.
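To see this concretely, here is a minimal sketch using the POP25 and EMPLOY25 values from the sample rows in the question (the data frame name `d` is only for illustration). It fits the original model and inspects the weights that lm() actually stored:

```r
# Sample rows from the question (Austria, 2007-2012)
d <- data.frame(
  TIME     = 2007:2012,
  POP25    = c(4544, 4560.7, 4571.3, 4587.8, 4621.5, 4639),
  EMPLOY25 = c(3333.5, 3390.4, 3384.7, 3424.6, 3454.4, 3486.4)
)

# The original call: weights are taken row by row from POP25
fit <- lm(log(POP25) ~ log(EMPLOY25), weights = POP25, data = d)

# Extract the weights used in the fit
w <- weights(fit)

# They match POP25 for every year, not just the 2007 value
isTRUE(all.equal(as.numeric(w), d$POP25))
```

Each year gets a different weight here, which is why the 2007 value has to be copied onto every row of a country before it can serve as the weight.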

A weights vector needs to be the same length as the dependent and independent variables. The easiest way to do this is by creating a weights column in the table, where the value is the POP25 value for each NUTS_ID in the year 2007:

library(dplyr)

# Give every row its country's 2007 POP25 value as the weight
data <- data |>
    group_by(NUTS_ID) |>
    mutate(weights = POP25[TIME == 2007]) |>
    ungroup()

You can then supply this as the weights vector:

lm(log(POP25) ~ log(EMPLOY25), weights = weights, data = data)
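As a self-contained check of the idea, the sketch below builds a toy panel and does the same lookup in base R instead of dplyr. The country codes "AA" and "BB", the numbers, and the column name `w2007` are made up for illustration. It uses match(), which has the side effect of returning NA rather than failing when a country has no 2007 row:

```r
# Toy panel: two hypothetical countries, three years each
panel <- data.frame(
  NUTS_ID  = rep(c("AA", "BB"), each = 3),
  TIME     = rep(2007:2009, times = 2),
  POP25    = c(100, 102, 105, 400, 404, 410),
  EMPLOY25 = c(70, 72, 75, 300, 301, 309)
)

# Keep only the base-year rows, then look up each row's country in them
base_year <- panel[panel$TIME == 2007, ]
panel$w2007 <- base_year$POP25[match(panel$NUTS_ID, base_year$NUTS_ID)]

# The weight is now constant within each country
panel$w2007  # 100 100 100 400 400 400

# Weighted regression using the 2007 value as the weight
fit <- lm(log(POP25) ~ log(EMPLOY25), weights = w2007, data = panel)
coef(fit)
```

Printing the weight column before fitting is a cheap way to confirm that every row of a country carries the same 2007 value.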

Sources

This article follows the attribution requirements of Stack Overflow and is licensed under CC BY-SA 3.0.

Source: Stack Overflow

Solution Source
Solution 1 SamR