'Splitting a column into multiple using regular expression
I have the following table
df = pd.DataFrame({'favs':{0:'chicken_panfry1_t360_ketchup',
1:'chicken_bake2_t450_out_bbq',
2:'chicken_boiled2_season_gravy'}})
That looks like this
favs
0 chicken_panfry1_t360_ketchup
1 chicken_bake2_t450_out_bbq
2 chicken_boiled2_season_gravy
I would like to split the column at the last underscore to create 2 new columns that look like this.
favs recipe sauce
0 chicken_panfry1_t360_ketchup chicken_panfry1_t360 ketchup
1 chicken_bake2_t450_out_bbq chicken_bake2_t450_out bbq
2 chicken_boiled2_season_gravy chicken_boiled2_season gravy
This is what I've tried
df[['recipe','sauce']]=df['favs'].str.split(r'.*_', expand=True)
This creates the sauce column correctly but the recipe column is blank. It looks like this. Unsure of how to correct it.
favs recipe sauce
0 chicken_panfry1_t360_ketchup ketchup
1 chicken_bake2_t450_out_bbq bbq
2 chicken_boiled2_season_gravy gravy
Solution 1:[1]
You need Series.str.extract with the (.*)_(.*) regex pattern:
df[['recipe','sauce']]=df['favs'].str.extract(r'(.*)_(.*)', expand=True)
See the regex demo.
The (.*)_(.*) regex matches and captures the part before the last _ into Group 1 (with the first (.*)) and the part after last _ into the second column (with the second (.*)).
Solution 2:[2]
No need for a regex, you can simply rsplit limiting to 1 split:
df[['recipe','sauce']] = df['favs'].str.rsplit('_', n=1, expand=True)
output:
favs recipe sauce
0 chicken_panfry1_t360_ketchup chicken_panfry1_t360 ketchup
1 chicken_bake2_t450_out_bbq chicken_bake2_t450_out bbq
2 chicken_boiled2_season_gravy chicken_boiled2_season gravy
Sources
This article follows the attribution requirements of Stack Overflow and is licensed under CC BY-SA 3.0.
Source: Stack Overflow
| Solution | Source |
|---|---|
| Solution 1 | Wiktor Stribiżew |
| Solution 2 | mozway |
