'Merging sheets (2) from multiple excel workbooks in one while using information from sheet (1)
I have 200 workbooks (1 per subject) with two sheets each. I was able to merge all sheets 1 and all sheets 2 in two separate data frames using the following script:
#create a list with the data from all the subjects (the folder needs to contain all your data)
my_files <- list.files(pattern="*.lsx")
my_files
#combine the second sheets in one dataset
students_data = lapply(my_files, function(i){
x = read_excel(i, sheet=2)
x
})
students_data[[1]]
students_data = do.call("rbind.data.frame", students_data)
However, when merging the second sheet from every workbook I would also like to keep the subjID variable which is only listed in the first sheet. So, my original dataset is like this: Sheet 1 of my dataset Sheet 2 of my dataset What I would like to have is all sheets 2 combined (and I did this with the above code) but with the id of the subject repeated for every 3 rows: Final desired dataset
Do you have any suggestions on how to do this? Thanks a lot for your help!
Solution 1:[1]
You can extract the student ids in Sheet 1 while reading the data from Sheet 2, as follows, this assumes there's a single id for each file.
students_data = lapply(my_files, function(i){
x = read_excel(i, sheet=2)
# Get sheet 1 for each file
student_ids = read_excel(2, sheet=1)
# Get the id
SubjID = unique(student_ids["participant_num"])
# Create a df with the id in a single column, with the same rows
# as the data frame from sheet 2
SubjID_df = data.frame("SubjID" = rep(SubjID, nrow(x)))
# Bind the data frame with ids and with the data in
# sheet 2
std_data = cbind.data.frame(SubjID_df, x)
std_data
})
Then you can proceed as usual, binding all resulting data frames.
Sources
This article follows the attribution requirements of Stack Overflow and is licensed under CC BY-SA 3.0.
Source: Stack Overflow
Solution | Source |
---|---|
Solution 1 | Juan Bosco |