Pandas: extract all regex matches from column, join with delimiter

I need to extract all matches from a string in a column and populate a second column. The matches will be delimited by a comma.

import re
import pandas as pd

df2 = pd.DataFrame([[1000, 'Jerry', 'string of text BR1001_BR1003_BR9009 more string', 'BR1003', ''],
                    [1001, '', 'BR1010_BR1011 random text', 'BR1010', ''],
                    ['', '', 'test to discardBR3009', 'BR2002', ''],
                    [1003, 'Perry', 'BR4009 pure gibberish', 'BR1001', ''],
                    [1004, 'Perry2', '', 'BR1001', '']],
                   columns=['ID', 'Name', 'REGEX string', 'Member of', 'Status'])

Pattern representing the codes to be extracted.

BR_pat = re.compile(r'(BR[0-9]{4})', re.IGNORECASE)

Hoped-for output in the new column:

BR1001, BR1003, BR9009
BR1010,BR1011
BR3009
BR4009

My attempt:

df2['REGEX string'].str.extractall(BR_pat).unstack().fillna('').apply(lambda x: ", ".join(x))

Output:

 match
0  0        BR1001, BR1010, BR3009, BR4009
   1                    BR1003, BR1011, , 
   2                          BR9009, , ,

There are extra commas and rows missing. What did I do wrong?

Solution 1:^[1]

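Two changes fix your attempt:

pass axis=1 to apply, so the lambda runs once per row and joins across the match columns. With the default axis=0 it runs once per match column, which is why your output has three rows that mix codes from different source rows.
add filter(None, x) to drop the empty strings that fillna('') inserts for rows with fewer matches. Those empty strings are what produce the extra commas.

The result is

df2['REGEX string'].str.extractall(BR_pat).unstack().fillna('').apply(lambda x: ",".join(filter(None, x)), axis=1)

Rows with no match at all, such as the last one, are absent from the result because extractall only returns rows that matched.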
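As a possible alternative (not part of the answer above), the same column can be built with str.findall followed by str.join, which keeps one entry per original row and avoids the unstack step. This is only a sketch: the target column name 'BR codes' is a hypothetical choice, and the pattern is passed as a plain string with flags=re.IGNORECASE instead of the compiled BR_pat.

```python
import re
import pandas as pd

df2 = pd.DataFrame(
    [[1000, 'Jerry', 'string of text BR1001_BR1003_BR9009 more string', 'BR1003', ''],
     [1001, '', 'BR1010_BR1011 random text', 'BR1010', ''],
     ['', '', 'test to discardBR3009', 'BR2002', ''],
     [1003, 'Perry', 'BR4009 pure gibberish', 'BR1001', ''],
     [1004, 'Perry2', '', 'BR1001', '']],
    columns=['ID', 'Name', 'REGEX string', 'Member of', 'Status'])

# findall returns a list of every match per row (an empty list when nothing matches).
codes = df2['REGEX string'].str.findall(r'BR[0-9]{4}', flags=re.IGNORECASE)

# str.join concatenates each row's list; an empty list becomes an empty string.
# 'BR codes' is a hypothetical column name chosen for this example.
df2['BR codes'] = codes.str.join(', ')

print(df2['BR codes'].tolist())
```

Because findall produces a value for every row, the row with an empty 'REGEX string' stays in the frame with an empty string rather than disappearing, so the result can be assigned straight back to df2 without reindexing.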

Sources

This article follows the attribution requirements of Stack Overflow and is licensed under CC BY-SA 3.0.

Source: Stack Overflow

Solution	Source
Solution 1	Klaus78
