Importing Census Data from IPUMS -- adding weights
I'm trying to import census data from IPUMS into R but am not sure how to account for weights.
I extracted 41 variables covering 2000 to 2020. The extract is usa_00001, described by the DDI file usa_00001.xml (data dictionary attached).
I reviewed the codebook for the imported data set to narrow down the variables for my analysis and decided to focus on family structure, income, race/ethnicity, and education. I dropped every variable I judged not useful from the new data set (data_clean1).
Variables kept: YEAR (census year), STATEICP, HHINCOME, NMOTHERS, NFATHERS, NCHILD, HISPAN, RACE, EDUCD, INCTOT, EDUCD_MOM, EDUCD_POP, INCTOT_MOM, and INCTOT_POP.
```r
# ipumsr reads the extract, dplyr does the cleaning,
# dataMaid provides makeCodebook()
library(ipumsr)
library(dplyr)
library(dataMaid)

# Read the DDI (data dictionary); read_ipums_micro() uses it to locate and parse the data file
ddi  <- read_ipums_ddi("usa_00001.xml")
data <- read_ipums_micro(ddi)

# Write a PDF codebook of the raw extract
makeCodebook(data, replace = TRUE, output = "pdf")

data_clean1 <- data %>%
  # Keep only the analysis variables
  select(YEAR, STATEICP, HHINCOME, NMOTHERS, NFATHERS, NCHILD, HISPAN, RACE,
         EDUCD, INCTOT, EDUCD_MOM, EDUCD_POP, INCTOT_MOM, INCTOT_POP) %>%
  # Friendlier names (new_name = OLD_NAME); _M = mother, _F = father (IPUMS "_POP")
  rename(
    Year             = YEAR,
    State_ID         = STATEICP,
    Household_Income = HHINCOME,
    NMothers         = NMOTHERS,
    NFathers         = NFATHERS,
    NChild           = NCHILD,
    Hispanic         = HISPAN,
    Race             = RACE,
    Education        = EDUCD,
    Income_Total     = INCTOT,
    Education_M      = EDUCD_MOM,
    Education_F      = EDUCD_POP,
    Income_Total_M   = INCTOT_MOM,
    Income_Total_F   = INCTOT_POP
  ) %>%
  # RACE 1 = White, 2 = Black/African American
  filter(Race %in% c(1, 2)) %>%
  # Selected EDUCD attainment codes (labelled below)
  filter(Education %in% c(2, 62, 63, 64, 81, 101, 114, 116)) %>%
  # Keep incomes from 1 up to the cutoffs (drops zero, negative, and N/A-coded values)
  filter(Income_Total >= 1, Income_Total <= 1184000) %>%
  filter(Household_Income >= 1, Household_Income <= 2260000) %>%
  mutate(
    Hispanic = factor(Hispanic,
      levels = c(0, 1, 2, 3, 4, 9),
      labels = c("Not Hispanic", "Mexican", "Puerto Rican", "Cuban", "Other", "Not Reported")),
    Race = factor(Race,
      levels = c(1, 2),
      labels = c("White", "Black/African American")),
    Education = factor(Education,
      levels = c(2, 62, 63, 64, 81, 101, 114, 116),
      labels = c("No Schooling Completed", "High School Graduate or GED",
                 "Regular High School Diploma", "GED or Alternative Credential",
                 "Associate's Degree", "Bachelor's Degree",
                 "Master's Degree", "Doctoral Degree"))
  )
```
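A minimal sketch of why the dropped weight variables matter, assuming the extract contains PERWT (IPUMS USA adds it to extracts by default, but the select() above does not keep it). PERWT is the number of people in the population that each sampled person represents. The toy data frame is hypothetical and only mimics the shape of data_clean1 with PERWT kept; it uses base R's weighted.mean() and xtabs() so it runs without ipumsr.

```r
# Hypothetical toy rows standing in for data_clean1 with PERWT kept
toy <- data.frame(
  Race         = factor(c("White", "White",
                          "Black/African American", "Black/African American")),
  Income_Total = c(30000, 50000, 20000, 40000),
  PERWT        = c(100, 300, 200, 200)
)

# Unweighted mean: every sampled person counts equally
unweighted <- mean(toy$Income_Total)

# Weighted mean: each person is scaled by PERWT
weighted <- weighted.mean(toy$Income_Total, w = toy$PERWT)

# Weighted population counts by race: sum of PERWT within each group
pop_by_race <- xtabs(PERWT ~ Race, data = toy)

print(c(unweighted = unweighted, weighted = weighted))
print(pop_by_race)
```

Here the unweighted mean is 35000 and the weighted mean is 37500, because the higher-income White respondent stands for more people. This only covers point estimates; for standard errors, IPUMS USA also supplies CLUSTER and STRATA for use with a survey-design package such as survey.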
How do I account for weights? Do I need to keep some of the variables I dropped? Or should I use tidycensus instead?
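Household-level variables such as Household_Income repeat on every person record in a household, so summarizing them over person rows counts each household once per member. A sketch of the usual alternative, assuming SERIAL (household ID), PERNUM (person number within household), and HHWT (household weight) are also kept from the default extract; the toy values are hypothetical. It keeps one record per household (PERNUM == 1) and weights by HHWT.

```r
# Hypothetical toy extract: SERIAL identifies the household, PERNUM numbers
# people within it, and HHINCOME repeats on every member's row
toy <- data.frame(
  SERIAL   = c(1, 1, 1, 2),
  PERNUM   = c(1, 2, 3, 1),
  HHINCOME = c(90000, 90000, 90000, 30000),
  HHWT     = c(50, 50, 50, 150)
)

# Naive: every person row counts, so household 1 is counted three times
naive <- weighted.mean(toy$HHINCOME, w = toy$HHWT)

# Household level: keep only the first person in each household
households <- toy[toy$PERNUM == 1, ]
hh_mean <- weighted.mean(households$HHINCOME, w = households$HHWT)

print(c(naive = naive, household_level = hh_mean))
```

The naive figure (60000) overstates the household-level weighted mean (45000) because the three-person household is triple-counted.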
Sources
This article follows the attribution requirements of Stack Overflow and is licensed under CC BY-SA 3.0.
Source: Stack Overflow
