'how to add multiple columns from one data frame to another based on values in another column?
I have two data frames and I want to take the last four columns from df1 and add them to df2 based on the fips code in df2. The fips code is labeled x2 in df1.
x1 =c(8000:8003)
mi =c(10:13)
x2 =c(5000:5003)
x21_40 = c(0,0,1,0)
x41_60 = c(1,0,0,0)
x61_80 = c(0,1,0,0)
x81_100 = c(0,0,0,1)
df1 = data.frame(x1,mi,x2, x21_40,x41_60,x61_80, x81_100)
fips = c(5000:5003)
county_name=c("a","b","c","d")
df2 = data.frame(fips, county_name)
> df1
x1 mi x2 x21_40 x41_60 x61_80 x81_100
1 8000 10 5000 0 1 0 0
2 8001 11 5001 0 0 1 0
3 8002 12 5002 1 0 0 0
4 8003 13 5003 0 0 0 1
> df2
fips county_name
1 5000 a
2 5001 b
3 5002 c
4 5003 d
Desired
fips county_name x21_40 x41_60 x61_80 x81_100
1 5000 a 0 1 0 0
2 5001 b 0 0 1 0
3 5002 c 1 0 0 0
4 5003 d 0 0 0 1
I don't want to merge both data frames, I only want the last four columns, so I don't think it's as simple as using left_join(). I have used %in% before, but that was only for filtering values in one data frame, based on values in another column from a different data frame. This one has four columns, so I'm not sure what to do.
Solution 1:[1]
We can use left_join if we want to match the 'x2' from 'df1' and 'fips' from 'df2'
library(dplyr)
df2 <- left_join(df2, df1 %>%
select(x2:last_col()), by = c("fips" = "x2"))
-output
df2
fips county_name x21_40 x41_60 x61_80 x81_100
1 5000 a 0 1 0 0
2 5001 b 0 0 1 0
3 5002 c 1 0 0 0
4 5003 d 0 0 0 1
In case of duplicates in 'df1', get the max value for those columns grouped by 'fips/x2' and then do the join
df1 %>%
group_by(fips = x2) %>%
summarise(across(x21_40:x81_100, max, na.rm = TRUE),
.groups = "drop") %>%
left_join(df2, .)
Solution 2:[2]
It sounds like you want to augment df2 with information from df1. As long as the fips codes in df1 are unique ( df1$x2 ), this should work correctly.
Using data.table:
setDT(df2); setDT(df1)
xCols <- grep('_', names(df1), value = TRUE)
df2[df1, c(xCols):=mget(paste('i', xCols, sep='.')), on=.(fips=x2)]
df2
## fips county_name x21_40 x41_60 x61_80 x81_100
## 1: 5000 a 0 1 0 0
## 2: 5001 b 0 0 1 0
## 3: 5002 c 1 0 0 0
## 4: 5003 d 0 0 0 1
If they are not unique, have you considered going in the other direction, augmenting df1 with the information in df2? Each county name has a unique fips code (that is, the combination of state and county fips is unique). Something like:
df1[df2[, .N, by=.(fips, county_name)], county:=i.county_name, on=.(x2=fips)]
df1
## x1 mi x2 x21_40 x41_60 x61_80 x81_100 county
## 1: 8000 10 5000 0 1 0 0 a
## 2: 8001 11 5001 0 0 1 0 b
## 3: 8002 12 5002 1 0 0 0 c
## 4: 8003 13 5003 0 0 0 1 d
Sources
This article follows the attribution requirements of Stack Overflow and is licensed under CC BY-SA 3.0.
Source: Stack Overflow
| Solution | Source |
|---|---|
| Solution 1 | |
| Solution 2 | jlhoward |
