Pandas: Split and/or update columns, based on inconsistent data?
So I have a column that contains baseball team names, and I want to split it into two new columns: one containing the city name and one containing the team name.
| Team |
|---|
| New York Giants |
| Atlanta Braves |
| Chicago Cubs |
| Chicago White Sox |
I would like to get something like this:
| Team | City | Franchise |
|---|---|---|
| New York Giants | New York | Giants |
| Atlanta Braves | Atlanta | Braves |
| Chicago Cubs | Chicago | Cubs |
| Chicago White Sox | Chicago | White Sox |
What I have tried so far?
- Using `split` and `rsplit`: it gets the job done, but I can't unify it.
- Did the count `df['cnt'] = df.asc.apply(lambda x: len(str(x).split(' ')))` to get the number of strings, so I know what kind of cases I have.
There are 3 different cases:
- Standard one (e.g. Atlanta Braves)
- City with 2 strings (e.g. New York Giants)
- Franchise name with 2 strings (e.g. Chicago White Sox)
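The word count behind these cases can also be computed without `apply`. The following is a minimal sketch, assuming the names live in a column called `Team` as in the tables above (the sample frame is rebuilt here only so the snippet runs on its own):

```python
import pandas as pd

df = pd.DataFrame(
    {"Team": ["New York Giants", "Atlanta Braves", "Chicago Cubs", "Chicago White Sox"]}
)

# Split each name on whitespace and count the resulting pieces.
df["cnt"] = df["Team"].str.split().str.len()

# Rows with the same count can still belong to different cases.
three_words = df.loc[df["cnt"] == 3, "Team"].tolist()
print(three_words)
```

Note that the count alone does not separate the last two cases: "New York Giants" and "Chicago White Sox" both have three words, so some extra information about the names is needed to decide where to split.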
What I would like to do?
- Split based on conditions (`if cnt=2 then split on 1st occurrence`). I can't find the syntax for this; how would this go?
- Update based on names (e.g. `if ['Col_name'].str.contains("York" or "Angeles") then split on 2nd occurrence`). I also can't find working syntax; is there an example for this?
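One way the conditional split might be written is with a boolean mask and `numpy.where`. This is only a sketch under the assumption that "York" and "Angeles" are enough to identify the two-word cities; the variable names are made up for illustration. In Python, `"York" or "Angeles"` evaluates to just `"York"`, so the alternatives go into a single regex pattern, `"York|Angeles"`, instead.

```python
import numpy as np
import pandas as pd

df = pd.DataFrame(
    {"Team": ["New York Giants", "Atlanta Braves", "Chicago Cubs", "Chicago White Sox"]}
)

# True for rows whose city has two words; "|" is regex alternation.
two_word_city = df["Team"].str.contains("York|Angeles")

# Split once from the left (one-word city) and once from the right (one-word franchise).
first = df["Team"].str.split(" ", n=1, expand=True)
last = df["Team"].str.rsplit(" ", n=1, expand=True)

# Pick the right-hand split for two-word cities, the left-hand split otherwise.
df["City"] = np.where(two_word_city, last[0], first[0])
df["Franchise"] = np.where(two_word_city, last[1], first[1])
```

Splitting on the first space handles both "Atlanta Braves" and "Chicago White Sox", so only the two-word cities need the special case.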
What would be a good approach to solve this?
Thanks!
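A possible approach, sketched here rather than offered as a definitive answer, is to keep an explicit list of multi-word cities and let one regular expression try those first, falling back to the first word as the city. The list `multi_word_cities` below is a hypothetical placeholder and would need to cover every multi-word city that actually appears in the data.

```python
import re

import pandas as pd

df = pd.DataFrame(
    {"Team": ["New York Giants", "Atlanta Braves", "Chicago Cubs", "Chicago White Sox"]}
)

# Hypothetical lookup list; extend it to match the real data.
multi_word_cities = ["New York", "Los Angeles", "San Francisco", "St. Louis", "Kansas City"]

# Known multi-word cities are tried first; otherwise the first word is the city.
city_alternatives = "|".join(re.escape(city) for city in multi_word_cities)
pattern = rf"^(?P<City>{city_alternatives}|\S+)\s+(?P<Franchise>.+)$"

# Named groups become the column names of the extracted frame.
df = df.join(df["Team"].str.extract(pattern))
```

This avoids counting words altogether: everything after the city is treated as the franchise, so two-word franchises such as "White Sox" need no special handling.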
Sources
This article follows the attribution requirements of Stack Overflow and is licensed under CC BY-SA 3.0.
Source: Stack Overflow
