Replace combination of space, hyphen and text or a "by" using regex and pandas
I want to remove either a space, a hyphen, a space and the text that follows, or the combination "By [Author]". This is my data frame:
my_titles = ['Peter Rabbit - Volume II', 'Who stole my cookie By Cole Pattesh', 'The Stormy Night - Nia Costas']
adf = pd.DataFrame({'my_titles':my_titles})
adf
my_titles
0 Peter Rabbit - Volume II
1 Who stole my cookie By Cole Pattesh
2 The Stormy Night - Nia Costas
My expected df is:
my_titles
0 Peter Rabbit
1 Who stole my cookie
2 The Stormy Night
I have tried this, expecting regex to recognize the '\s' space and the '|' (or):
adf['my_titles'].replace('\s-\s*|\sBy\s*$','',regex=True)
adf
I also tried this, attempting to chain the space and the words:
adf['my_titles'].replace('[ - \w]|[ By \w]','',regex=True)
adf
Please, do you know what I am doing wrong?
Solution 1:[1]
You can use
import pandas as pd
my_titles = ['Peter Rabbit - Volume II', 'Who stole my cookie By Cole Pattesh', 'The Stormy Night - Nia Costas']
adf = pd.DataFrame({'my_titles':my_titles})
adf['my_titles'] = adf['my_titles'].str.replace(r'\s+(?:-\s+|By\s+[A-Z]).*', '', regex=True)
Output of print(adf['my_titles']):
0 Peter Rabbit
1 Who stole my cookie
2 The Stormy Night
See the regex demo. Details:
\s+ - one or more whitespaces
(?:-\s+|By\s+[A-Z]) - a hyphen (-) and one or more whitespaces, or By, one or more whitespaces, and an uppercase letter
.* - the rest of the line.
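To see why the pattern above works where the first attempt from the question does not, here is a minimal sketch that applies both to the same titles with Python's built-in re module. It assumes that pandas' str.replace with regex=True treats the pattern the way re.sub does; the variable names (titles, answer_pattern, first_attempt) are only illustrative and do not come from the original post.

```python
import re

titles = [
    "Peter Rabbit - Volume II",
    "Who stole my cookie By Cole Pattesh",
    "The Stormy Night - Nia Costas",
]

# Pattern from the answer: the separator plus everything after it.
answer_pattern = re.compile(r"\s+(?:-\s+|By\s+[A-Z]).*")

# First attempt from the question: it matches only the separator itself,
# and the "By" branch is anchored to the end of the string with $.
first_attempt = re.compile(r"\s-\s*|\sBy\s*$")

cleaned = [answer_pattern.sub("", t) for t in titles]
partial = [first_attempt.sub("", t) for t in titles]

print(cleaned)
# ['Peter Rabbit', 'Who stole my cookie', 'The Stormy Night']
print(partial)
# ['Peter RabbitVolume II', 'Who stole my cookie By Cole Pattesh', 'The Stormy NightNia Costas']
```

The first attempt deletes " - " but leaves the trailing text, and its "By" branch never matches because the author name sits between "By" and the end of the string. Note also that the calls in the question do not assign the result back to adf['my_titles'], so printing adf would show the original values even with a correct pattern.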
Sources
This article follows the attribution requirements of Stack Overflow and is licensed under CC BY-SA 3.0.
Source: Stack Overflow
| Solution | Source |
|---|---|
| Solution 1 | Wiktor Stribiżew |
