1 to 2 matching in two dataframes with different sizes in Python/R - second part [duplicate]
I've been struggling with this problem all day and I'm really stuck. A solution in either Python or R is fine.
I have two dataframes: df1 has 44 rows and df2 has 100 rows. Both have these columns: ID, status (0, 1), Age, Gender, Race, Ethnicity, Height, Weight.
For each row in df1, I need to find age matches in df2:
- A match can be an exact age match, but the criterion to use is df2[age] - 5 <= df1[age] <= df2[age] + 5.
- I need a list/dictionary that stores, for each df1 row, the df2 age matches and their IDs.
- Then I need to randomly select 2 IDs from df2 as the final matches for each df1 row.
- I also need to make sure the 2 df2 matches share the same gender and race as the df1 row.
- If a df2 row has already been used as a match, it must be excluded from later matches.
I have tried both R and Python, and in both I got stuck on the nested loops. I'm not sure how to loop through each record in df1 and df2, compare the df1 age with df2 age - 5 and df2 age + 5, and store the matches.
Here is the sample data format for df1 and df2:

| ID | sex | age | race |
|---|---|---|---|
| 284336 | female | 42.8 | 2 |
| 294123 | male | 48.5 | 1 |
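The candidate-collection step (age window plus matching gender and race) can be sketched without nested index loops. This is a minimal sketch under stated assumptions, not a definitive implementation: it works on plain lists of dicts (with pandas, `df.to_dict("records")` yields this shape), the column names follow the sample table, the control rows and the helper name `find_candidates` are hypothetical, and ethnicity is left out because the sample data has no such column.

```python
# Case rows use the two IDs from the sample table; control rows are made up.
cases = [
    {"ID": 284336, "sex": "female", "age": 42.8, "race": 2},
    {"ID": 294123, "sex": "male", "age": 48.5, "race": 1},
]
controls = [
    {"ID": 1, "sex": "female", "age": 40.0, "race": 2},
    {"ID": 2, "sex": "female", "age": 47.0, "race": 2},
    {"ID": 3, "sex": "female", "age": 49.0, "race": 2},  # more than 5 years older
    {"ID": 4, "sex": "male", "age": 50.0, "race": 1},
    {"ID": 5, "sex": "male", "age": 44.0, "race": 2},    # race differs from both cases
]


def find_candidates(cases, controls, window=5.0):
    """Map each case ID to the IDs of controls within `window` years of age
    that share the case's sex and race."""
    candidates = {}
    for case in cases:
        candidates[case["ID"]] = [
            ctrl["ID"]
            for ctrl in controls
            # |control age - case age| <= 5 is the same test as
            # control age - 5 <= case age <= control age + 5
            if abs(ctrl["age"] - case["age"]) <= window
            and ctrl["sex"] == case["sex"]
            and ctrl["race"] == case["race"]
        ]
    return candidates


candidates = find_candidates(cases, controls)
print(candidates)  # {284336: [1, 2], 294123: [4]}
```

The resulting dictionary keeps every eligible control per case, so the random selection can be done as a separate step.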
Here is what I've attempted in R:
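id_match <- NULL
for (i in seq_len(nrow(gwi_case))) {
  age <- gwi_case$age[i]
  gender <- gwi_case$gender[i]
  ethnicity <- gwi_case$hispanic_non[i]
  race <- gwi_case$race[i]
  # row indices of controls within 5 years of age with the same gender, ethnicity, and race
  x <- which(gwi_control$gender == gender & gwi_control$age >= age - 5 & gwi_control$age <= age + 5 & gwi_control$hispanic_non == ethnicity & gwi_control$race == race)
  # pick up to 2 candidates; index through sample.int() because sample(x, 1) draws from 1:x when x has length 1
  y <- x[sample.int(length(x), min(2, length(x)))]
  id_match <- c(id_match, y)
}
# drop control rows that were picked for more than one case
id_match <- id_match[!duplicated(id_match)]
length(id_match)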
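Removing duplicates after the loop does not stop a control from being drawn for two different cases. One way to enforce the "already used" rule is to track used IDs while sampling. The following is a minimal Python sketch under stated assumptions: the input is a dictionary from case ID to eligible control IDs, the IDs shown are hypothetical, and `pick_matches` is an illustrative helper name rather than a library function.

```python
import random


def pick_matches(candidates, k=2, seed=None):
    """Randomly assign up to k not-yet-used control IDs to each case ID."""
    rng = random.Random(seed)  # seeded generator so a run can be reproduced
    used = set()
    matches = {}
    for case_id, pool in candidates.items():
        # Exclude controls already assigned to an earlier case.
        available = [ctrl_id for ctrl_id in pool if ctrl_id not in used]
        chosen = rng.sample(available, min(k, len(available)))
        matches[case_id] = chosen
        used.update(chosen)
    return matches


# Hypothetical candidate lists: case ID -> eligible control IDs.
candidates = {
    101: [1, 2, 3],
    102: [2, 3, 4],
    103: [1, 4],
}
matches = pick_matches(candidates, k=2, seed=0)
print(matches)
```

This greedy pass depends on the order of the cases: a case processed late may end up with fewer than 2 matches because its candidates were already taken. Processing cases with the fewest candidates first is one possible heuristic to reduce that, though it does not guarantee every case gets 2 matches.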
Sources
This article follows the attribution requirements of Stack Overflow and is licensed under CC BY-SA 3.0.
Source: Stack Overflow
