Filtering strings based on replicated numerical substrings at specific places in the string (in R)
I have a list of file names/paths and I want to filter out the ones where the file name begins with the same six digits that are found after the first "/" in the path. For example, in the list below, entries [1], [2], and [6] would be retained, whereas entries [3], [4], and [5] would be removed from the new list. I imagine it should be possible to split each string at the "/"s and compare the first six characters of the second piece with the first six characters of the last piece (the file name), but I'm not sure how to implement this. Any suggestions would be appreciated.
```
tail(processed_ARL_list)
[1] "220204/220204 2022-02-04 09-32-30/ARL2200660.D/ARL2200660.pdf"
[2] "220204/220204 2022-02-04 09-32-30/ARL2200661.D/ARL2200661.pdf"
[3] "220204/220204 2022-02-04 09-32-30/REFTTO220204_.D/220204 2022-02-04 09-32-30_REFTTO220204_.pdf"
[4] "220207/220204 2022-02-07 12-51-02/REFTTO220207_.D/220204 2022-02-07 12-51-02_REFTTO220207_.pdf"
[5] "220207/220204 2022-02-07 12-51-02/SREF0186 METHYL EUGENOL.D/220204 2022-02-07 12-51-02_SREF0186 METHYL EUGENOL.pdf"
[6] "220207/220204 2022-02-07 12-51-02/SREF0186 METHYL EUGENOL.D/SREF0186 METHYL EUGENOL.pdf"
```
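The split-and-compare idea described above can also be done without an explicit loop. Below is a minimal base-R sketch, assuming every path has at least two "/"-separated components; the names `paths`, `parts`, `folder_prefix`, `file_prefix`, `keep` and `filtered` are illustrative and not from the question.

```r
# Illustrative input: the six example paths from the question
paths <- c(
  "220204/220204 2022-02-04 09-32-30/ARL2200660.D/ARL2200660.pdf",
  "220204/220204 2022-02-04 09-32-30/ARL2200661.D/ARL2200661.pdf",
  "220204/220204 2022-02-04 09-32-30/REFTTO220204_.D/220204 2022-02-04 09-32-30_REFTTO220204_.pdf",
  "220207/220204 2022-02-07 12-51-02/REFTTO220207_.D/220204 2022-02-07 12-51-02_REFTTO220207_.pdf",
  "220207/220204 2022-02-07 12-51-02/SREF0186 METHYL EUGENOL.D/220204 2022-02-07 12-51-02_SREF0186 METHYL EUGENOL.pdf",
  "220207/220204 2022-02-07 12-51-02/SREF0186 METHYL EUGENOL.D/SREF0186 METHYL EUGENOL.pdf"
)

# Split every path at "/" in one call; the result is a list of character vectors
parts <- strsplit(paths, "/", fixed = TRUE)

# First six characters of the 2nd component (the folder after the first "/")
folder_prefix <- vapply(parts, function(p) substr(p[2], 1, 6), character(1))

# First six characters of the last component (the file name)
file_prefix <- vapply(parts, function(p) substr(p[length(p)], 1, 6), character(1))

# Keep only the paths whose file name does NOT start with the folder's six digits
keep <- folder_prefix != file_prefix
filtered <- paths[keep]
```

With these six paths, `keep` is `TRUE TRUE FALSE FALSE FALSE TRUE`, so only entries [1], [2] and [6] remain in `filtered`.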
Solution 1:[1]
So I got the result I was after with this loop. I feel like there might be a better way, but this will do for now.
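```r
library(stringr)

# TRUE = keep the path, FALSE = drop it
processed_results <- logical(length(ARL_list))
for (i in seq_along(ARL_list)) {
  # Split the path into its "/"-separated components
  filepath_split <- unlist(str_split(ARL_list[i], pattern = "/"))
  # Keep the path only if the first six characters of the 2nd component
  # differ from the first six characters of the file name (last component)
  processed_results[i] <- substr(filepath_split[2], 1, 6) !=
    substr(filepath_split[length(filepath_split)], 1, 6)
}
processed_ARL_list <- ARL_list[processed_results]
```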
Output
```
tail(processed_ARL_list)
[1] "220128/220128 2022-01-28 07-53-13/ARL2200536.D/ARL2200536.pdf"
[2] "220128/220128 2022-01-28 07-53-13/ARL2200537.D/ARL2200537.pdf"
[3] "220128/220131 2022-01-31 16-10-36/REFTTO220131_.D/REFTTO220131_.pdf"
[4] "220204/220204 2022-02-04 09-32-30/ARL2200660.D/ARL2200660.pdf"
[5] "220204/220204 2022-02-04 09-32-30/ARL2200661.D/ARL2200661.pdf"
[6] "220207/220204 2022-02-07 12-51-02/SREF0186 METHYL EUGENOL.D/SREF0186 METHYL EUGENOL.pdf"
```
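As for a possibly "better way": one vectorized alternative to the loop, sketched here with base R's `sub()` and `basename()` in place of `stringr` (a suggestion, not the answerer's method), uses a regular expression to pull out the second path component and `basename()` to get the file name. `ARL_list` is filled with the six example paths from the question purely for illustration; `folder` and `file_name` are hypothetical helper names.

```r
# Illustrative input: the six example paths from the question
ARL_list <- c(
  "220204/220204 2022-02-04 09-32-30/ARL2200660.D/ARL2200660.pdf",
  "220204/220204 2022-02-04 09-32-30/ARL2200661.D/ARL2200661.pdf",
  "220204/220204 2022-02-04 09-32-30/REFTTO220204_.D/220204 2022-02-04 09-32-30_REFTTO220204_.pdf",
  "220207/220204 2022-02-07 12-51-02/REFTTO220207_.D/220204 2022-02-07 12-51-02_REFTTO220207_.pdf",
  "220207/220204 2022-02-07 12-51-02/SREF0186 METHYL EUGENOL.D/220204 2022-02-07 12-51-02_SREF0186 METHYL EUGENOL.pdf",
  "220207/220204 2022-02-07 12-51-02/SREF0186 METHYL EUGENOL.D/SREF0186 METHYL EUGENOL.pdf"
)

# Capture the 2nd "/"-separated component: the run of non-"/" characters after the first "/"
folder <- sub("^[^/]*/([^/]*).*$", "\\1", ARL_list)

# basename() returns the last path component, i.e. the file name
file_name <- basename(ARL_list)

# Keep paths whose file name does not begin with the folder's first six characters
processed_ARL_list <- ARL_list[substr(folder, 1, 6) != substr(file_name, 1, 6)]
```

Every step operates on the whole vector at once, so no index bookkeeping is needed; for these inputs it keeps the same three paths ([1], [2] and [6]) as the loop logic would.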
Sources
This article follows the attribution requirements of Stack Overflow and is licensed under CC BY-SA 3.0.
Source: Stack Overflow
| Solution | Source |
|---|---|
| Solution 1 | benson23 |
