'How to merge two dataframes by columns and alternating the columns that don't match in R
I have to do a D in D analysis and for that I have two dataframes with the exact same colums but with different values (pre treatment and post treatment). In addition, from one period to another I lost some participants so some values won't be taken into account:
Survey1:
| ID | City | Children | Q1 | Q2 |
|---|---|---|---|---|
| 1 | Paris | Yes | 0.5 | Yes |
| 2 | NY | No | 1 | No |
| 3 | London | No | NA | Yes |
| 4 | Madrid | Yes | 2.1 | No |
| 5 | Paris | Yes | 1.8 | Yes |
| 6 | Paris | No | NA | Yes |
| 7 | NY | Yes | 3 | Yes |
| 8 | Madrid | Yes | 0.8 | No |
| 9 | Paris | No | 2.5 | No |
| 10 | Paris | No | 1 | Yes |
Survey 2:
| ID | City | Children | Q1 | Q2 |
|---|---|---|---|---|
| 1 | Paris | Yes | 1 | Yes |
| 3 | London | No | 2 | Yes |
| 4 | Madrid | Yes | 0.5 | Yes |
| 6 | Paris | No | 2 | Yes |
| 7 | NY | Yes | 1.8 | Yes |
| 9 | Paris | Yes | 2.5 | Yes |
| 10 | Paris | No | 1 | No |
As you can see in Survey2 I have lost subjects: 2, 5 and 8 + Subject 9 had a baby meanwhile.
I would like to merge both dataframes by ID, alternating columns and if possible changing the name to make clear the columps pre and post treatment:
Result:
| ID | City | Children | Q1 | Q1_t | Q2 | Q2_t |
|---|---|---|---|---|---|---|
| 1 | Paris | Yes | 0.5 | 1 | Yes | Yes |
| 3 | London | No | NA | 2 | Yes | Yes |
| 4 | Madrid | Yes | 2.1 | 0.5 | No | Yes |
| 6 | Paris | No | NA | 2 | Yes | Yes |
| 7 | NY | Yes | 3 | 1.8 | Yes | Yes |
| 9 | Paris | Yes | 2.5 | 2.5 | No | Yes |
| 10 | Paris | No | 1 | 1 | Yes | No |
When i use merge(Survey1, Survey2, by = "ID") it keeps the correct ID but since subject 9 had a baby it gives me an extra observation so my Result dataframe has 8 obs. instead of only 7 obs. (since I lost 3 subjects). I would like to only take into account the observation once subject 9 had a baby. It also tries to merge some of my questions which as a consequence eliminates my pre and post treatment observations and puts them in the order: Q1, Q2, Q1, Q2 instead of: Q1, Q1, Q2, Q2. Also, I don't know how to merge by adding the "_t" at the end ob the second dataframecolumn names.
Does somebody has an idea?
#For Survey 1
a <- c("1","2","3","4","5","6","7","8","9","10")
b <- c("Paris", "NY", "London", "Madrid", "Paris", "Paris", "NY", "Madrid", "Paris", "Paris")
c <- c("Yes", "No", "No", "Yes", "Yes", "No", "Yes", "Yes", "No", "No")
d <- c(0.5, 1, NA, 2.1, 1.8, NA, 3, 0.8, 2.5, 1)
e <- c("Yes", "No", "Yes", "No", "Yes", "Yes", "Yes", "No", "No", "Yes")
Survey1 <- data.frame(a,b,c,d,e)
names(Survey1) <- c("ID", "City", "Children", "Q1", "Q2")
Survey1
#For Survey2
a <- c("1","3","4","6","7","9","10")
b <- c("Paris", "London", "Madrid", "Paris", "NY", "Paris", "Paris")
c <- c("Yes", "No", "Yes", "No", "Yes", "Yes", "No")
d <- c(1, 2, 0.5, 2, 1.8, 2.5, 1)
e <- c("Yes", "Yes", "Yes", "Yes", "Yes", "Yes", "No")
Survey2 <- data.frame(a,b,c,d,e)
names(Survey2) <- c("ID", "City", "Children", "Q1", "Q2")
Survey2
Solution 1:[1]
Your data is not reproducible, so I haven't tried my solution, but by the looks of your data, it seems you want to do:
library(tidyverse)
results <- survey1 %>%
select(-Children) %>%
right_join(survey2 %>%
rename(Q1_t = Q1,
Q2_t = Q2),
by = c("id", "City")) %>%
relocate(Q1_T, .after = Q1)
Sources
This article follows the attribution requirements of Stack Overflow and is licensed under CC BY-SA 3.0.
Source: Stack Overflow
| Solution | Source |
|---|---|
| Solution 1 | Fred-LM |
