Access a pattern inside a complex name

From each entry of a list (L), I need to extract a special acronym that is delimited by two underscores, and build a list (S) out of these acronyms. An excerpt is below.

L = ['13058_8pcw_Ocx_M', '13058_8pcw_M1C-S1C_M' , '13058_8pcw_AMY_F']
S = ['Ocx','M1C-S1C','AMY']

I have tried many approaches, including regular expressions, but failed to get what I want. Ideally, one would loop over L, use a regular expression to extract the acronym from each entry, and build the list S from the results.

Can somebody provide some support or a solution? Thanks.



Solution 1:[1]

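You can use re.findall.
NB Each element contains more than one underscore-delimited segment (for example 8pcw and M1C-S1C in the second element), so the pattern is anchored to the end of the string and only the last segment is taken, as per your expected output.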

import re

L = ['13058_8pcw_Ocx_M', '13058_8pcw_M1C-S1C_M', '13058_8pcw_AMY_M']
S = []
for name in L:
    # findall returns a list of captured groups; the anchored pattern yields one
    S.append(re.findall(r'_([^_]+)_[^_]*$', name)[0])

print(S)

output

['Ocx', 'M1C-S1C', 'AMY']
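
As a possible variation on the loop above, the same pattern can be used with re.search inside a list comprehension. This is only a sketch: the helper name extract_acronym and the choice to return None when nothing matches are assumptions for illustration, not part of the original answer.

```python
import re

# Same pattern as in the loop above, compiled once so it can be reused.
PATTERN = re.compile(r'_([^_]+)_[^_]*$')

def extract_acronym(name):
    """Return the second-to-last underscore-delimited field, or None if there is none.

    Hypothetical helper, introduced here for illustration only.
    """
    match = PATTERN.search(name)
    return match.group(1) if match else None

L = ['13058_8pcw_Ocx_M', '13058_8pcw_M1C-S1C_M', '13058_8pcw_AMY_F']
S = [extract_acronym(name) for name in L]
print(S)  # ['Ocx', 'M1C-S1C', 'AMY']
```

Unlike indexing the findall result with [0], this version does not raise an IndexError on a name that lacks the expected underscores; it puts None in the list instead.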

The regex pattern _([^_]+)_ matches one or more characters other than an underscore, enclosed between two underscores. The square brackets define a character class, the leading ^ negates it, and the + means that it is repeated 1 or more times.
The part within the parentheses (...) is the capturing group; findall returns the captured text as a list, so we use [0] to get the first element.
After _([^_]+)_ we have [^_]*$, which matches any number (*) of characters other than an underscore up to the end of the string ($). This is what ties the match to the last underscore-delimited segment.
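
To make the effect of the trailing [^_]*$ concrete, the short sketch below runs the pattern with and without it on one of the names from the question. It also shows str.rsplit as a regex-free alternative; that alternative is not part of the original answer and assumes the acronym is always the second-to-last underscore-separated field.

```python
import re

name = '13058_8pcw_M1C-S1C_M'

# Without the end-of-string part, findall scans left to right and
# returns the first underscore-delimited segment instead.
unanchored = re.findall(r'_([^_]+)_', name)
print(unanchored)  # ['8pcw']

# With [^_]*$ appended, only the last underscore-delimited segment can match.
anchored = re.findall(r'_([^_]+)_[^_]*$', name)
print(anchored)  # ['M1C-S1C']

# Regex-free alternative (assumption: the acronym is always the
# second-to-last field): split at most twice from the right.
via_rsplit = name.rsplit('_', 2)[1]
print(via_rsplit)  # M1C-S1C
```

The rsplit version is shorter, but it raises an IndexError if a name contains no underscore at all, so it fits best when the naming scheme is known to be consistent.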

Sources

This article follows the attribution requirements of Stack Overflow and is licensed under CC BY-SA 3.0.

Source: Stack Overflow

Solution Source
Solution 1