Remove leading strings in a dataframe
I am studying someone else's code and have run into a problem similar to this one, where the words in each value are joined together with no separator whatsoever:
Names
--------
NurseJohn
SoldierJohn
TeacherJohn
DriverJohn
CEOJohn
How can I remove the words before John?
The following line removes them, but I don't understand how it works:
df['Names'] = df['Names'].str.replace(".*(?=John)", "", regex=True)
Can someone explain what (".*(?=John)", "", regex=True) does? Also, is there a more straightforward way to do this?
Solution 1:[1]
Actually, the regex pattern you should have used is:
.*(?=John$)
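This pattern matches all content, greedily, up to the point where John appears at the very end of each value in the Names column. Note that the lookahead (?=John$) does not consume John; it only asserts that John follows, so John is not part of the match. The empty replacement string then deletes the matched prefix and leaves John in place.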
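To make the difference between the two patterns concrete, here is a small sketch using Python's re.sub directly, which is roughly what pandas applies to each value of an ordinary string column. The value NurseJohnny is a hypothetical input (not from the question's data) chosen to show where the $ anchor matters.

```python
import re

# Pattern from the question (unanchored) and from this answer (anchored with $).
unanchored = r".*(?=John)"
anchored = r".*(?=John$)"

for name in ["NurseJohn", "CEOJohn", "NurseJohnny"]:
    # re.sub deletes each match by replacing it with "". The lookahead only
    # checks what follows, so "John" itself is never part of the match.
    print(name, "->", re.sub(unanchored, "", name), "|", re.sub(anchored, "", name))
```

For the question's values both patterns give John. For NurseJohnny, the unanchored pattern strips "Nurse" and leaves "Johnny", while the anchored pattern finds no John at the end and leaves the value unchanged.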
Your updated code:
df["Names"] = df["Names"].str.replace(r'.*(?=John$)', '', regex=True)
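Passing regex=True matters: since pandas 2.0, Series.str.replace defaults to regex=False, which treats the pattern as a literal substring. The sketch below mimics the two modes in plain Python (str.replace for literal, re.sub for regex) on one sample value; it is an illustration of the behavior, not pandas' own implementation.

```python
import re

pattern = r".*(?=John$)"
name = "SoldierJohn"

# Literal replacement (like regex=False): the text ".*(?=John$)"
# never occurs in the value, so nothing changes.
literal = name.replace(pattern, "")

# Regex replacement (like regex=True): the prefix before "John" is deleted.
as_regex = re.sub(pattern, "", name)

print(literal)   # SoldierJohn
print(as_regex)  # John
```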
Solution 2:[2]
So, you're using regex (regular expression), a pattern language that every programming language I've worked with supports for searching strings (text). Here the regex matches everything before "John", and that match is replaced with "", which is an empty string.
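To see exactly what "everything before John" refers to, this sketch prints the text the question's pattern matches in one sample value (SoldierJohn, from the question's data) and then the result of replacing that match with "".

```python
import re

# re.match starts at the beginning of the string; .* grabs "Soldier" and the
# lookahead (?=John) requires "John" to come next without consuming it.
match = re.match(r".*(?=John)", "SoldierJohn")
print(match.group())  # Soldier

# Replacing the matched text with "" leaves only "John".
print(re.sub(r".*(?=John)", "", "SoldierJohn"))  # John
```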
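So, to read it from left to right:
- take the DataFrame column 'Names' (df['Names']); the .str accessor applies string methods to each value in it
- for each string in the column, match everything before "John" (.* means any character, repeated zero or more times) and replace it with an empty string ("")
- regex=True tells pandas to treat the first argument as a regular expression rather than literal text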
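The left-to-right reading above can be spelled out as an explicit loop. This is a plain-Python sketch, assuming the column holds only strings (pandas also skips missing values, which this loop does not handle); the list `names` is a hypothetical stand-in for df['Names'].

```python
import re

names = ["NurseJohn", "SoldierJohn", "TeacherJohn", "DriverJohn", "CEOJohn"]

# Roughly the effect of df['Names'].str.replace(".*(?=John)", "", regex=True):
# apply the same regex substitution to every string in the column.
cleaned = [re.sub(r".*(?=John)", "", name) for name in names]

print(cleaned)  # ['John', 'John', 'John', 'John', 'John']
```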
Sources
This article follows the attribution requirements of Stack Overflow and is licensed under CC BY-SA 3.0.
Source: Stack Overflow
| Solution | Source |
|---|---|
| Solution 1 | Tim Biegeleisen |
| Solution 2 | |
