'how to add multiple columns from one data frame to another based on values in another column?

I have two data frames and I want to take the last four columns from df1 and add them to df2 based on the fips code in df2. The fips code is labeled x2 in df1.

x1 =c(8000:8003)
mi =c(10:13)
x2 =c(5000:5003)
x21_40 = c(0,0,1,0)
x41_60 = c(1,0,0,0)
x61_80 = c(0,1,0,0)
x81_100 = c(0,0,0,1)
df1 = data.frame(x1,mi,x2, x21_40,x41_60,x61_80, x81_100)

fips = c(5000:5003)
county_name=c("a","b","c","d")
df2 = data.frame(fips, county_name)

> df1
    x1 mi   x2 x21_40 x41_60 x61_80 x81_100
1 8000 10 5000      0      1      0       0
2 8001 11 5001      0      0      1       0
3 8002 12 5002      1      0      0       0
4 8003 13 5003      0      0      0       1

> df2
  fips county_name
1 5000           a
2 5001           b
3 5002           c
4 5003           d

Desired

  fips county_name x21_40 x41_60 x61_80 x81_100
1 5000           a      0      1      0       0
2 5001           b      0      0      1       0
3 5002           c      1      0      0       0
4 5003           d      0      0      0       1

I don't want to merge both data frames, I only want the last four columns, so I don't think it's as simple as using left_join(). I have used %in% before, but that was only for filtering values in one data frame, based on values in another column from a different data frame. This one has four columns, so I'm not sure what to do.



Solution 1:[1]

We can use left_join if we want to match the 'x2' from 'df1' and 'fips' from 'df2'

library(dplyr)
df2 <- left_join(df2, df1 %>%
                select(x2:last_col()), by = c("fips" = "x2"))

-output

df2
   fips county_name x21_40 x41_60 x61_80 x81_100
1 5000           a      0      1      0       0
2 5001           b      0      0      1       0
3 5002           c      1      0      0       0
4 5003           d      0      0      0       1

In case of duplicates in 'df1', get the max value for those columns grouped by 'fips/x2' and then do the join

df1 %>% 
  group_by(fips = x2) %>%
   summarise(across(x21_40:x81_100, max, na.rm = TRUE), 
      .groups = "drop") %>%
   left_join(df2, .)

Solution 2:[2]

It sounds like you want to augment df2 with information from df1. As long as the fips codes in df1 are unique ( df1$x2 ), this should work correctly.

Using data.table:

setDT(df2); setDT(df1)
xCols <- grep('_', names(df1), value = TRUE)
df2[df1, c(xCols):=mget(paste('i', xCols, sep='.')), on=.(fips=x2)]
df2
##    fips county_name x21_40 x41_60 x61_80 x81_100
## 1: 5000           a      0      1      0       0
## 2: 5001           b      0      0      1       0
## 3: 5002           c      1      0      0       0
## 4: 5003           d      0      0      0       1

If they are not unique, have you considered going in the other direction, augmenting df1 with the information in df2? Each county name has a unique fips code (that is, the combination of state and county fips is unique). Something like:

df1[df2[, .N, by=.(fips, county_name)], county:=i.county_name, on=.(x2=fips)]
df1
##      x1 mi   x2 x21_40 x41_60 x61_80 x81_100 county
## 1: 8000 10 5000      0      1      0       0      a
## 2: 8001 11 5001      0      0      1       0      b
## 3: 8002 12 5002      1      0      0       0      c
## 4: 8003 13 5003      0      0      0       1      d

Sources

This article follows the attribution requirements of Stack Overflow and is licensed under CC BY-SA 3.0.

Source: Stack Overflow

Solution Source
Solution 1
Solution 2 jlhoward