Access a pattern inside a complex name
For every entry of a list (L), I need to extract a special acronym that is delimited by two underscores, and build a new list (S) from the results. An excerpt is below.
L = ['13058_8pcw_Ocx_M', '13058_8pcw_M1C-S1C_M' , '13058_8pcw_AMY_F']
S = ['Ocx','M1C-S1C','AMY']
I have tried many approaches, including regular expressions, but could not get what I want. Ideally, a loop would build the list S, using a regular expression to pull the acronym out of each entry of L.
Can somebody provide some support or a solution? Thanks.
Solution 1:[1]
You can use re.findall.
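NB Every entry contains more than one field delimited by underscores (for example 8pcw and M1C-S1C in the second element), so the pattern is anchored to the end of the string and takes only the last one, as per your expected output.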
import re
L = ['13058_8pcw_Ocx_M', '13058_8pcw_M1C-S1C_M', '13058_8pcw_AMY_F']
S = []
for name in L:
    # Capture the last field that sits between two underscores
    S.append(re.findall(r'_([^_]+)_[^_]*$', name)[0])
print(S)
Output:
['Ocx', 'M1C-S1C', 'AMY']
The regex pattern looks for one or more characters other than an underscore, sitting between two underscores. The square brackets define a character class, the leading ^ negates it, and the + means the class is repeated one or more times.
The part within the parentheses (...) is the capturing group. findall returns the captured groups as a list, so we use [0] to get the first (and here only) element.
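Because findall returns an empty list when nothing matches, indexing with [0] raises an IndexError on a name that does not follow the expected layout. A minimal sketch of a more defensive variant is below, assuming that returning None for such names is acceptable. The helper name extract_acronym and the extra entry 'no-underscores' are hypothetical and only there for illustration.

```python
import re

# Same pattern as in the answer, compiled once for reuse.
PATTERN = re.compile(r'_([^_]+)_[^_]*$')

def extract_acronym(name):
    """Return the last field enclosed by two underscores, or None if absent.

    Hypothetical helper: re.search returns None when there is no match,
    which avoids the IndexError that findall(...)[0] would raise.
    """
    match = PATTERN.search(name)
    return match.group(1) if match else None

# The last entry is a made-up name that does not follow the expected layout.
L = ['13058_8pcw_Ocx_M', '13058_8pcw_M1C-S1C_M', '13058_8pcw_AMY_F', 'no-underscores']
S = [extract_acronym(name) for name in L]
print(S)  # ['Ocx', 'M1C-S1C', 'AMY', None]
```

Entries that come back as None can then be filtered out or reported, depending on what the rest of the script needs.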
After _([^_]+)_ comes [^_]*$, which matches any number (*) of non-underscore characters up to the end of the string ($). This forces the match onto the last field that is enclosed by underscores.
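To see what the trailing [^_]*$ contributes, the short sketch below runs the pattern with and without it on the second element of L. The variable names are illustrative only.

```python
import re

name = '13058_8pcw_M1C-S1C_M'

# Without the anchor, findall scans left to right and returns
# non-overlapping matches, so only the first enclosed field is captured.
unanchored = re.findall(r'_([^_]+)_', name)
print(unanchored)  # ['8pcw']

# With [^_]*$, no further underscore may follow the match,
# so the captured group is the last enclosed field.
anchored = re.findall(r'_([^_]+)_[^_]*$', name)
print(anchored)  # ['M1C-S1C']
```

The unanchored version misses M1C-S1C because the underscore in front of it was already consumed as the closing underscore of _8pcw_.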
Sources
This article follows the attribution requirements of Stack Overflow and is licensed under CC BY-SA 3.0.
Source: Stack Overflow
| Solution | Source |
|---|---|
| Solution 1 |
