1 to 2 matching in two dataframes with different sizes in Python/R - second part [duplicate]

Please help me with this problem; I've been struggling with it all day. A solution in either Python or R is fine!

I have two dataframes: df1 has 44 rows and df2 has 100 rows. Both have these columns: ID, status (0,1), Age, Gender, Race, Ethnicity, Height, Weight

For each row in df1, I need to find age matches in df2:

- The match can be an exact age match, but the criterion to use is df2[age] - 5 <= df1[age] <= df2[age] + 5
- I need a list/dictionary that stores the age matches for each df1 row, along with their IDs
- Then I need to randomly select 2 IDs from df2 as the final matches for each df1 row
- I also need to make sure the 2 df2 matches share the same gender and race as the df1 row
- Once a df2 row has been used as a match, it needs to be eliminated so it cannot be matched again

I have tried both R and Python, and in both I got stuck on the nested loops. I'm not sure how to loop through each record in both df1 and df2, compare the df1 age with the df2 age - 5 and df2 age + 5, and store the matches.
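The candidate lookup described above might be sketched as follows. This is only a minimal illustration that works on plain lists of dictionaries (the kind of records pandas returns from `DataFrame.to_dict("records")`); the function name `find_candidates`, the parameter `max_age_gap`, and the control rows are hypothetical, and the column names are taken from the sample table in the question.

```python
def find_candidates(cases, controls, max_age_gap=5):
    """Map each case ID to the IDs of all controls that are within
    max_age_gap years of the case and share its sex and race."""
    candidates = {}
    for case in cases:
        candidates[case["ID"]] = [
            ctrl["ID"]
            for ctrl in controls
            if abs(ctrl["age"] - case["age"]) <= max_age_gap
            and ctrl["sex"] == case["sex"]
            and ctrl["race"] == case["race"]
        ]
    return candidates


# The two case rows come from the sample table in the question;
# the control rows are made up purely for illustration.
cases = [
    {"ID": 284336, "sex": "female", "age": 42.8, "race": 2},
    {"ID": 294123, "sex": "male", "age": 48.5, "race": 1},
]
controls = [
    {"ID": 101, "sex": "female", "age": 45.0, "race": 2},
    {"ID": 102, "sex": "female", "age": 49.0, "race": 2},
    {"ID": 103, "sex": "male", "age": 44.0, "race": 1},
    {"ID": 104, "sex": "male", "age": 60.0, "race": 1},
]
candidates = find_candidates(cases, controls)
```

The resulting dictionary only lists who is eligible for whom; it does not yet pick 2 matches per row or prevent a df2 row from being reused.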

Here is the sample data format for df1 and df2:

| ID | sex | age | race |
| -------- | -------------- |--------|-------|
| 284336 | female | 42.8 | 2 |
| 294123 | male | 48.5 | 1 |

Here is what I've attempted in R:

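# Row indices of controls already assigned to a case, so none is reused
used <- integer(0)
# One entry per case, holding the IDs of its matched controls
# (assumes the ID column is named ID, as in the sample data)
id_match <- vector("list", nrow(gwi_case))

for (i in seq_len(nrow(gwi_case))) {
  age <- gwi_case$age[i]
  gender <- gwi_case$gender[i]
  ethnicity <- gwi_case$hispanic_non[i]
  race <- gwi_case$race[i]

  # Controls within 5 years of age with the same gender, ethnicity, and race
  x <- which(gwi_control$gender==gender & gwi_control$age>=age-5 & gwi_control$age<=age+5 & gwi_control$hispanic_non==ethnicity & gwi_control$race==race)
  # Drop controls that an earlier case has already taken
  x <- setdiff(x, used)

  # Sample positions within x; sample(x, ...) would draw from 1:x when x has length 1
  y <- x[sample.int(length(x), min(2, length(x)))]

  used <- c(used, y)
  id_match[[i]] <- gwi_control$ID[y]
}

# Total number of controls matched across all cases
length(used)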
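For comparison, here is a rough Python sketch of the whole procedure, including the "already used" rule. It assumes the rows are available as lists of dictionaries (for example via `DataFrame.to_dict("records")`) with the column names from the sample table; `match_controls`, `n_matches`, `max_age_gap`, and `seed` are hypothetical names, and the data is made up. It is a simple greedy pass, not an optimal matching.

```python
import random


def match_controls(cases, controls, n_matches=2, max_age_gap=5, seed=None):
    """Greedily assign up to n_matches unused controls to each case.

    A control is eligible if it is within max_age_gap years of the case
    and has the same sex and race. Each control is used at most once.
    """
    rng = random.Random(seed)  # seeded generator for reproducible draws
    used = set()               # control IDs already assigned to some case
    matches = {}
    for case in cases:
        pool = [
            ctrl["ID"]
            for ctrl in controls
            if ctrl["ID"] not in used
            and abs(ctrl["age"] - case["age"]) <= max_age_gap
            and ctrl["sex"] == case["sex"]
            and ctrl["race"] == case["race"]
        ]
        # Take fewer than n_matches if the pool is too small
        chosen = rng.sample(pool, min(n_matches, len(pool)))
        used.update(chosen)
        matches[case["ID"]] = chosen
    return matches


# Made-up rows for illustration only
cases = [
    {"ID": 1, "sex": "female", "age": 42.8, "race": 2},
    {"ID": 2, "sex": "female", "age": 44.0, "race": 2},
]
controls = [
    {"ID": 101, "sex": "female", "age": 40.0, "race": 2},
    {"ID": 102, "sex": "female", "age": 45.0, "race": 2},
    {"ID": 103, "sex": "female", "age": 47.0, "race": 2},
    {"ID": 104, "sex": "male", "age": 43.0, "race": 2},
]
matches = match_controls(cases, controls, seed=0)
```

Because the pass is greedy, the result depends on the order of the cases: here the first case takes two of the three eligible controls, so the second case is left with only one. An ethnicity check could be added as one more equality condition in the same filter.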

python r dataframe matching

Sources

This article follows the attribution requirements of Stack Overflow and is licensed under CC BY-SA 3.0.

Source: Stack Overflow
