Delete the last word, but keep it if it is the only one
I have a series of texts, each containing either one word or a combination of words. I need to delete the last word if the text has more than one word; otherwise the single word should be kept.
I have tried the following regex:
df["first_middle_name"] = df["full_name"].replace("\s+\S+$", "")
taken from this solution: Removing last words in each row in pandas dataframe
It deletes certain words but keeps others.
Some examples of strings in my df['Municipio']:
Zacapa
San Luis, **Jalapa**
Antigua Guatemala **Sacatepéquez**
Guatemala
Mixco
Sacapulas, **Jutiapa**
Puerto Barrios, **Izabal**
Petén **Petén**
San Martin Jil, **Chimaltenango**
What I need, for example: if a string has only one word, keep that word; if it is a combination of two or more words separated by a comma or a space, delete the last word. See the bold words.
Thank you!
Solution 1:[1]
You can apply a function that first checks whether the string contains a comma (,), and then whether it contains a space.
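# With a comma: drop everything after the last comma.
# Without a comma: drop the last space-separated word. Single words are left as-is.
df['Municipio'] = df['Municipio'].apply(lambda x: ', '.join(x.split(',')[:-1]) if ',' in x
                                        else (' '.join(x.split(' ')[:-1]) if ' ' in x else x))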
print(df)
Municipio
0 Zacapa
1 San Luis
2 Antigua Guatemala
3 Guatemala
4 Mixco
5 Sacapulas
6 Puerto Barrios
7 Petén
8 San Martin Jil
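As an alternative to the lambda above (not part of the original answer), the same rule can be sketched with a single regular expression: an optional comma, then whitespace, then a final run of non-whitespace characters at the end of the string. The names LAST_WORD and drop_last_word are made up for this illustration, and the sample list is a subset of the strings from the question.

```python
import re

# Optional comma, whitespace, then the final word at the end of the string.
# A single word contains no whitespace, so it never matches and is kept.
LAST_WORD = re.compile(r",?\s+\S+$")

def drop_last_word(text: str) -> str:
    """Remove the last word (and a comma right before it) if there is more than one word."""
    # strip() guards against stray leading/trailing whitespace (an assumption about the data).
    return LAST_WORD.sub("", text.strip())

names = [
    "Zacapa",
    "San Luis, Jalapa",
    "Antigua Guatemala Sacatepéquez",
    "Petén Petén",
    "San Martin Jil, Chimaltenango",
]
cleaned = [drop_last_word(n) for n in names]
print(cleaned)
# ['Zacapa', 'San Luis', 'Antigua Guatemala', 'Petén', 'San Martin Jil']
```

On a DataFrame, the same pattern should work with df['Municipio'].str.replace(r',?\s+\S+$', '', regex=True). Note one difference from the lambda: when a string contains a comma, the lambda removes everything after the last comma, while this pattern removes only the final word.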
If you want to keep the trailing comma and space:
# Appending '' before joining leaves the separator (', ' or ' ') at the end of the result.
df['Municipio'] = df['Municipio'].apply(lambda x: ', '.join(x.split(',')[:-1]+['']) if ',' in x
                                        else (' '.join(x.split(' ')[:-1]+['']) if ' ' in x else x))
print(df)
Municipio
0 Zacapa
1 San Luis,
2 Antigua Guatemala
3 Guatemala
4 Mixco
5 Sacapulas,
6 Puerto Barrios,
7 Petén
8 San Martin Jil,
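As for why the attempt in the question seemed inconsistent, here is a guess rather than a confirmed diagnosis. In pandas, Series.replace without regex=True compares whole cell values against the given string instead of applying it as a pattern. Even when applied as a regex, the pattern \s+\S+$ does not remove the comma and does not match at all if a value has trailing whitespace. The sketch below shows both effects with Python's re module; the second value, with a trailing space, is invented for illustration.

```python
import re

pattern = r"\s+\S+$"  # the pattern from the question

# The comma is not part of the match, so it is left behind.
with_comma = re.sub(pattern, "", "San Luis, Jalapa")
print(repr(with_comma))      # 'San Luis,'

# Hypothetical value with a trailing space: no word touches the end of the
# string, so the pattern does not match and nothing is removed.
with_trailing = re.sub(pattern, "", "Petén Petén ")
print(repr(with_trailing))   # 'Petén Petén '
```

If trailing whitespace is the cause, calling .str.strip() on the column before removing the last word should make the results consistent.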
Sources
This article follows the attribution requirements of Stack Overflow and is licensed under CC BY-SA 3.0.
Source: Stack Overflow
Solution | Source |
---|---|
Solution 1 | Ynjxsjmh |