Find a pattern in middle of multiple sentences

I have a DataFrame like the one below:

import pandas as pd

data = [
    [1, 'AR-123456'],
    [1, '123456'],
    [2, '345678'],
    [3, 'Application-12345678901'],
    [3, '12345678901'],
]
df = pd.DataFrame(data, columns=['Case', 'ID'])

Case	ID
1	AR-123456
1	123456
2	345678
3	Application-12345678901
3	12345678901

Within each Case, I want to remove the rows whose ID is just the digits of another ID in that Case that starts with AR- or Application-. The expected output is:

Case	ID
1	AR-123456
2	345678
3	Application-12345678901

Solution 1:^[1]

Extract the digits from each ID, then drop_duplicates on Case and the extracted digits:

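# raw string avoids the invalid-escape warning for \d; expand=False returns a Series
df["digits"] = df["ID"].str.extract(r"(\d+)", expand=False)
output = df.drop_duplicates(["Case","digits"]).drop("digits",axis=1)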

>>> output
   Case                       ID
0     1                AR-123456
2     2                   345678
3     3  Application-12345678901
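
Note that drop_duplicates keeps the first row of each (Case, digits) pair, so the result above relies on the prefixed row coming before the bare-digit row. If that order is not guaranteed, one possible variation is to sort the prefixed rows first before dropping duplicates. This is only a sketch and not part of the original answer; the is_bare helper column is a name made up here, and it assumes every ID is either bare digits or a prefix followed by digits:

```python
import pandas as pd

data = [
    [1, "123456"],                   # bare digits listed BEFORE the prefixed row
    [1, "AR-123456"],
    [2, "345678"],
    [3, "Application-12345678901"],
    [3, "12345678901"],
]
df = pd.DataFrame(data, columns=["Case", "ID"])

# First run of digits in each ID
digits = df["ID"].str.extract(r"(\d+)", expand=False)
# True where the whole ID consists of digits only (hypothetical helper column)
is_bare = df["ID"].str.fullmatch(r"\d+")

output = (
    df.assign(digits=digits, is_bare=is_bare)
      .sort_values("is_bare", kind="stable")   # prefixed rows (False) sort first
      .drop_duplicates(["Case", "digits"])     # keeps the prefixed row when both exist
      .sort_index()                            # restore the original row order
      .drop(columns=["digits", "is_bare"])
)

print(output["ID"].tolist())
# ['AR-123456', '345678', 'Application-12345678901']
```

The stable sort only changes which duplicate is seen first; rows without a prefixed counterpart (Case 2 here) are kept as they are.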

Solution 2:^[2]

You can group by the Case column and then, within each group, drop the rows whose ID equals the digits captured from an AR-/Application- ID in the same group:

out = (df.groupby('Case')
       # column 1 of the extract result holds the digits after AR-/Application-
       .apply(lambda g: g[~g['ID'].isin(g['ID'].str.extract(r'(AR|Application)-(\d+)')[1])])
       .reset_index(drop=True))

print(out)

   Case                       ID
0     1                AR-123456
1     2                   345678
2     3  Application-12345678901
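
If groupby().apply() turns out to be slow on a large frame, the same idea can probably be expressed with a merge instead. The following is only a sketch under the assumption that IDs look exactly like the sample data (AR-<digits>, Application-<digits>, or bare digits); the names prefixed and merged are illustrative and not from the original answer:

```python
import pandas as pd

data = [
    [1, "AR-123456"],
    [1, "123456"],
    [2, "345678"],
    [3, "Application-12345678901"],
    [3, "12345678901"],
]
df = pd.DataFrame(data, columns=["Case", "ID"])

# (Case, digits) pairs taken from the AR-/Application- rows only;
# rows without such a prefix yield NaN and are dropped
prefixed = (
    df.assign(ID=df["ID"].str.extract(r"^(?:AR|Application)-(\d+)$", expand=False))
      .dropna(subset=["ID"])
      .drop_duplicates()
)

# Left merge with indicator=True marks the rows whose (Case, ID) matches
# one of those pairs as "both"; everything else is "left_only"
merged = df.merge(prefixed, on=["Case", "ID"], how="left", indicator=True)
out = merged.loc[merged["_merge"] == "left_only", ["Case", "ID"]].reset_index(drop=True)

print(out["ID"].tolist())
# ['AR-123456', '345678', 'Application-12345678901']
```

Like Solution 2, this only removes a bare ID when a matching AR-/Application- ID exists in the same Case, so two identical bare IDs would both be kept.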

Sources

This article follows the attribution requirements of Stack Overflow and is licensed under CC BY-SA 3.0.

Source: Stack Overflow

Solution	Source
Solution 1	not_speshal
Solution 2	Ynjxsjmh
