Find the unique substring pattern in a list of strings with Python
I have a list of strings as below:
['/home/xin/Downloads/BrainImaging_UNC/out04_adni_roi_signals2/roi_signals_power264_sub-130_S_4817-ses-2018-05-04_14_33_33.0.txt',
'/home/xin/Downloads/BrainImaging_UNC/out04_adni_roi_signals2/roi_signals_power264_sub-141_S_0767-ses-2019-04-08_12_52_36.0.txt',
'/home/xin/Downloads/BrainImaging_UNC/out04_adni_roi_signals2/roi_signals_power264_sub-041_S_5097-ses-2019-05-07_09_56_14.0.txt',
'/home/xin/Downloads/BrainImaging_UNC/out04_adni_roi_signals2/roi_signals_power264_sub-068_S_4061-ses-2017-09-26_14_07_37.0.txt',
'/home/xin/Downloads/BrainImaging_UNC/out04_adni_roi_signals2/roi_signals_power264_sub-002_S_1280-ses-2017-03-13_13_38_31.0.txt',
'/home/xin/Downloads/BrainImaging_UNC/out04_adni_roi_signals2/roi_signals_power264_sub-082_S_5282-ses-2019-06-17_10_11_15.0.txt',
'/home/xin/Downloads/BrainImaging_UNC/out04_adni_roi_signals2/roi_signals_power264_sub-018_S_4399-ses-2019-08-06_13_03_58.0.txt',
'/home/xin/Downloads/BrainImaging_UNC/out04_adni_roi_signals2/roi_signals_power264_sub-123_S_0106-ses-2018-10-11_12_54_59.0.txt',
'/home/xin/Downloads/BrainImaging_UNC/out04_adni_roi_signals2/roi_signals_power264_sub-141_S_2333-ses-2018-12-26_15_31_55.0.txt',
'/home/xin/Downloads/BrainImaging_UNC/out04_adni_roi_signals2/roi_signals_power264_sub-031_S_2018-ses-2019-01-24_11_26_13.0.txt',
'/home/xin/Downloads/BrainImaging_UNC/out04_adni_roi_signals2/roi_signals_power264_sub-041_S_0679-ses-2017-07-05_09_46_36.0.txt',
'/home/xin/Downloads/BrainImaging_UNC/out04_adni_roi_signals2/roi_signals_power264_sub-037_S_0303-ses-2017-05-11_13_39_46.0.txt',
'/home/xin/Downloads/BrainImaging_UNC/out04_adni_roi_signals2/roi_signals_power264_sub-037_S_0454-ses-2017-09-06_09_41_25.0.txt',
'/home/xin/Downloads/BrainImaging_UNC/out04_adni_roi_signals2/roi_signals_power264_sub-068_S_2187-ses-2019-10-09_13_19_17.0.txt',
'/home/xin/Downloads/BrainImaging_UNC/out04_adni_roi_signals2/roi_signals_power264_sub-116_S_4043-ses-2018-03-02_10_03_10.0.txt']
I want to extract the unique subject IDs, which follow the pattern 'sub-???_S_????', from the list.
So far I can do it for one string with:
import re
unique_subject = re.search(r'(.*)_sub-(.*)-ses(.*)\.txt', all_files[0]).group(2)
But that only works for a single string, so for the whole list I need a loop:
unique_subject = set()
for f in all_files:
    unique_subject.add(re.search(r'(.*)_sub-(.*)-ses(.*)\.txt', f).group(2))
Is there a better way to do this? Ultimately, I would also like to get the first session for each subject. Is there a fast way to do that?
Solution 1:[1]
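Try using this (note that re.findall returns every match, including repeats, so wrap the result in set() if you only want unique IDs):
import re
l = re.findall(r'\d{3}_S_\d{4}', ''.join(all_files))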
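Because re.findall on the joined string returns one entry per file, a subject with several sessions shows up several times. A minimal sketch, assuming the paths are in a list named all_files as in the question, that applies the same pattern to each path and deduplicates either as a set or, when the original order matters, with dict.fromkeys. SUBJECT_RE and the shortened example paths are hypothetical.

```python
import re

# Same subject-ID pattern as above, compiled once. The raw string keeps "\d"
# from being read as an (invalid) string escape.
SUBJECT_RE = re.compile(r'\d{3}_S_\d{4}')

# Hypothetical, shortened paths; subject 130_S_4817 appears twice on purpose.
all_files = [
    '/data/roi_signals_power264_sub-130_S_4817-ses-2018-05-04_14_33_33.0.txt',
    '/data/roi_signals_power264_sub-141_S_0767-ses-2019-04-08_12_52_36.0.txt',
    '/data/roi_signals_power264_sub-130_S_4817-ses-2017-01-02_09_00_00.0.txt',
]

# Unordered unique IDs: search each path once and skip paths with no match.
unique_ids = {m.group() for m in map(SUBJECT_RE.search, all_files) if m}

# Unique IDs in first-seen order: dict keys keep insertion order (Python 3.7+).
ordered_ids = list(dict.fromkeys(SUBJECT_RE.findall(''.join(all_files))))

print(sorted(unique_ids))  # ['130_S_4817', '141_S_0767']
print(ordered_ids)         # ['130_S_4817', '141_S_0767']
```

Searching path by path also avoids relying on the joined string never producing a match that straddles two file names.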
Solution 2:[2]
You can use the same regular expression (I added a hyphen after "ses" in the session portion) and change the set to a dict that maps each subject to its first session, meaning the first one encountered in list order. Since you want to treat the first row for each subject differently, I think your current approach of looping over the list elements is fine.
all_files = [
'/home/xin/Downloads/BrainImaging_UNC/out04_adni_roi_signals2/roi_signals_power264_sub-130_S_4817-ses-2018-05-04_14_33_33.0.txt',
'/home/xin/Downloads/BrainImaging_UNC/out04_adni_roi_signals2/roi_signals_power264_sub-141_S_0767-ses-2019-04-08_12_52_36.0.txt',
'/home/xin/Downloads/BrainImaging_UNC/out04_adni_roi_signals2/roi_signals_power264_sub-041_S_5097-ses-2019-05-07_09_56_14.0.txt',
'/home/xin/Downloads/BrainImaging_UNC/out04_adni_roi_signals2/roi_signals_power264_sub-068_S_4061-ses-2017-09-26_14_07_37.0.txt',
'/home/xin/Downloads/BrainImaging_UNC/out04_adni_roi_signals2/roi_signals_power264_sub-002_S_1280-ses-2017-03-13_13_38_31.0.txt',
'/home/xin/Downloads/BrainImaging_UNC/out04_adni_roi_signals2/roi_signals_power264_sub-082_S_5282-ses-2019-06-17_10_11_15.0.txt',
'/home/xin/Downloads/BrainImaging_UNC/out04_adni_roi_signals2/roi_signals_power264_sub-018_S_4399-ses-2019-08-06_13_03_58.0.txt',
'/home/xin/Downloads/BrainImaging_UNC/out04_adni_roi_signals2/roi_signals_power264_sub-123_S_0106-ses-2018-10-11_12_54_59.0.txt',
'/home/xin/Downloads/BrainImaging_UNC/out04_adni_roi_signals2/roi_signals_power264_sub-141_S_2333-ses-2018-12-26_15_31_55.0.txt',
'/home/xin/Downloads/BrainImaging_UNC/out04_adni_roi_signals2/roi_signals_power264_sub-031_S_2018-ses-2019-01-24_11_26_13.0.txt',
'/home/xin/Downloads/BrainImaging_UNC/out04_adni_roi_signals2/roi_signals_power264_sub-041_S_0679-ses-2017-07-05_09_46_36.0.txt',
'/home/xin/Downloads/BrainImaging_UNC/out04_adni_roi_signals2/roi_signals_power264_sub-037_S_0303-ses-2017-05-11_13_39_46.0.txt',
'/home/xin/Downloads/BrainImaging_UNC/out04_adni_roi_signals2/roi_signals_power264_sub-037_S_0454-ses-2017-09-06_09_41_25.0.txt',
'/home/xin/Downloads/BrainImaging_UNC/out04_adni_roi_signals2/roi_signals_power264_sub-068_S_2187-ses-2019-10-09_13_19_17.0.txt',
'/home/xin/Downloads/BrainImaging_UNC/out04_adni_roi_signals2/roi_signals_power264_sub-116_S_4043-ses-2018-03-02_10_03_10.0.txt'
]
import re
unique_subject = {}
for f in all_files:
    groups = re.search(r'(.*)_sub-(.*)-ses-(.*)\.txt', f)
    subject = groups.group(2)
    # keep only the first session seen for each subject
    if subject not in unique_subject:
        session = groups.group(3)
        unique_subject[subject] = session
for k, v in unique_subject.items():
    print(f"{k} : {v}")
Output:
130_S_4817 : 2018-05-04_14_33_33.0
141_S_0767 : 2019-04-08_12_52_36.0
041_S_5097 : 2019-05-07_09_56_14.0
068_S_4061 : 2017-09-26_14_07_37.0
002_S_1280 : 2017-03-13_13_38_31.0
082_S_5282 : 2019-06-17_10_11_15.0
018_S_4399 : 2019-08-06_13_03_58.0
123_S_0106 : 2018-10-11_12_54_59.0
141_S_2333 : 2018-12-26_15_31_55.0
031_S_2018 : 2019-01-24_11_26_13.0
041_S_0679 : 2017-07-05_09_46_36.0
037_S_0303 : 2017-05-11_13_39_46.0
037_S_0454 : 2017-09-06_09_41_25.0
068_S_2187 : 2019-10-09_13_19_17.0
116_S_4043 : 2018-03-02_10_03_10.0
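This loop keeps the session that appears first in the list, which is not necessarily the earliest date, because the files are not sorted by time. If "first session" should mean chronologically earliest, here is a minimal sketch. It assumes every session string has the zero-padded YYYY-MM-DD_HH_MM_SS.f layout seen in these file names, so ordinary string comparison orders sessions by time. PATTERN, earliest_sessions and the example paths are hypothetical.

```python
import re

# Subject ID and session timestamp as named groups.
# Assumption: names follow "...sub-XXX_S_XXXX-ses-<timestamp>.txt".
PATTERN = re.compile(r'sub-(?P<subject>\d{3}_S_\d{4})-ses-(?P<session>[^/]+)\.txt$')

def earliest_sessions(paths):
    """Map each subject ID to its chronologically earliest session string."""
    earliest = {}
    for path in paths:
        m = PATTERN.search(path)
        if m is None:  # skip files that do not follow the naming scheme
            continue
        subject, session = m.group('subject'), m.group('session')
        # Zero-padded timestamps compare as strings in time order.
        if subject not in earliest or session < earliest[subject]:
            earliest[subject] = session
    return earliest

# Hypothetical input: the later session of 130_S_4817 is listed first.
files = [
    '/data/roi_signals_power264_sub-130_S_4817-ses-2018-05-04_14_33_33.0.txt',
    '/data/roi_signals_power264_sub-141_S_0767-ses-2019-04-08_12_52_36.0.txt',
    '/data/roi_signals_power264_sub-130_S_4817-ses-2017-01-02_09_00_00.0.txt',
]
result = earliest_sessions(files)
for subject, session in result.items():
    print(f"{subject} : {session}")
# 130_S_4817 : 2017-01-02_09_00_00.0
# 141_S_0767 : 2019-04-08_12_52_36.0
```

If the timestamp layout could vary, parsing the session with datetime.strptime before comparing would be safer than relying on string order.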
Sources
This article follows the attribution requirements of Stack Overflow and is licensed under CC BY-SA 3.0.
Source: Stack Overflow
| Solution | Source |
|---|---|
| Solution 1 | lemon |
| Solution 2 | constantstranger |
