'Merging sheets (2) from multiple excel workbooks in one while using information from sheet (1)

I have 200 workbooks (1 per subject) with two sheets each. I was able to merge all sheets 1 and all sheets 2 in two separate data frames using the following script:

#create a list with the data from all the subjects (the folder needs to contain all your data)
my_files <- list.files(pattern="*.lsx")
my_files

#combine the second sheets in one dataset
students_data = lapply(my_files, function(i){
  x = read_excel(i, sheet=2)
  x
})
students_data[[1]]
students_data = do.call("rbind.data.frame", students_data)

However, when merging the second sheet from every workbook I would also like to keep the subjID variable which is only listed in the first sheet. So, my original dataset is like this: Sheet 1 of my dataset Sheet 2 of my dataset What I would like to have is all sheets 2 combined (and I did this with the above code) but with the id of the subject repeated for every 3 rows: Final desired dataset

Do you have any suggestions on how to do this? Thanks a lot for your help!



Solution 1:[1]

You can extract the student ids in Sheet 1 while reading the data from Sheet 2, as follows, this assumes there's a single id for each file.

students_data = lapply(my_files, function(i){
  x = read_excel(i, sheet=2)
  
  # Get sheet 1 for each file
  student_ids = read_excel(2, sheet=1)
  # Get the id 
  SubjID = unique(student_ids["participant_num"])
  # Create a df with the id in a single column, with the same rows
  # as the data frame from sheet 2
  SubjID_df = data.frame("SubjID" = rep(SubjID, nrow(x)))
  
  # Bind the data frame with ids and with the data in
  # sheet 2
  std_data = cbind.data.frame(SubjID_df, x)
  
  std_data
})

Then you can proceed as usual, binding all resulting data frames.

Sources

This article follows the attribution requirements of Stack Overflow and is licensed under CC BY-SA 3.0.

Source: Stack Overflow

Solution Source
Solution 1 Juan Bosco