From a string with repeated underscores (e.g. 1_2_3_4_5_6), split and select 3_4
The header of my data frame looks like this:
header = list(data_no_control.columns.values)
header
['MLID_D_08_NGS_34_H08.fsa',
'MLID_D_25_NGS_38_A11.fsa',
'MLID_D_36_NGS_41_D12.fsa',
'MLID_D_37_NGS_42_E12.fsa']
I want to change my header to look like this:
['NGS_34',
'NGS_38',
'NGS_41',
'NGS_42']
How can I do this?
Solution 1:[1]
header = ['MLID_D_08_NGS_34_H08.fsa',
'MLID_D_25_NGS_38_A11.fsa',
'MLID_D_36_NGS_41_D12.fsa',
'MLID_D_37_NGS_42_E12.fsa']
new_header = []
for item in header:
    # split on underscores and keep the 4th and 5th fields (indices 3 and 4)
    parts = item.split('_')
    new_header.append(parts[3] + '_' + parts[4])
# output: ['NGS_34', 'NGS_38', 'NGS_41', 'NGS_42']
print(new_header)
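The same split-and-rejoin idea can be written more compactly with a slice and a list comprehension. This is only a sketch: the helper name short_name and its start/stop parameters are made up for illustration, and it assumes every name has at least five underscore-separated fields.

```python
header = [
    'MLID_D_08_NGS_34_H08.fsa',
    'MLID_D_25_NGS_38_A11.fsa',
    'MLID_D_36_NGS_41_D12.fsa',
    'MLID_D_37_NGS_42_E12.fsa',
]

def short_name(name, start=3, stop=5):
    """Rejoin the underscore-separated fields name[start:stop] (hypothetical helper)."""
    return '_'.join(name.split('_')[start:stop])

new_header = [short_name(name) for name in header]
print(new_header)  # ['NGS_34', 'NGS_38', 'NGS_41', 'NGS_42']
```

To apply the result to the data frame from the question, assigning the list back with data_no_control.columns = new_header should work, provided the list has one entry per column.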
Solution 2:[2]
Using str.extract:
df["col"] = df["col"].str.extract(r'_([^_]+_[^_]+)_[^_]+\.\w+$')
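The pattern is anchored at the end of the string: it skips the final field and the file extension (e.g. _H08.fsa) and captures the two underscore-separated fields just before them (e.g. NGS_34).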
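Since the question is about header names rather than the values in a column, here is a rough sketch of the same regex applied to the list of names with the standard re module. The helper extract_label is a hypothetical name, and falling back to the original name when nothing matches is an assumption, not part of the answer above.

```python
import re

# Same pattern as in the answer: capture the two fields before the last
# underscore-separated field and the file extension.
PATTERN = re.compile(r'_([^_]+_[^_]+)_[^_]+\.\w+$')

def extract_label(name):
    """Return the captured fields, or the name unchanged if it does not match."""
    match = PATTERN.search(name)
    return match.group(1) if match else name

header = [
    'MLID_D_08_NGS_34_H08.fsa',
    'MLID_D_25_NGS_38_A11.fsa',
    'MLID_D_36_NGS_41_D12.fsa',
    'MLID_D_37_NGS_42_E12.fsa',
]

new_header = [extract_label(name) for name in header]
print(new_header)  # ['NGS_34', 'NGS_38', 'NGS_41', 'NGS_42']
```

If the names are already a pandas column index, df.columns.str.extract(pattern, expand=False) should give an equivalent result in one step; names that do not match would come back as NaN there rather than being left unchanged.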
Sources
This article follows the attribution requirements of Stack Overflow and is licensed under CC BY-SA 3.0.
Source: Stack Overflow
| Solution | Source |
|---|---|
| Solution 1 | Captain Caveman |
| Solution 2 | Tim Biegeleisen |
