Extract seasons and years from a string column in pandas
Is there another way to extract the year from a column and create two new columns from it, one for the season and one for the year?
I tried this method and it seems to work, but it only extracts the year, and only for some rows:
year = df['premiered'].str.findall(r'(\d{4})').str.get(0)
df1 = df.assign(year = year.values)
Output:
| premiered | year |
|---|---|
| Spring 1998 | 1998 |
| Spring 2001 | 2001 |
| Fall 2016 | NaN |
| Fall 2016 | NaN |
Solution 1:[1]
Use Series.str.split with the expand option:
expand: Expand the split strings into separate columns.
df[['season', 'year']] = df['premiered'].str.split(expand=True)
# premiered season year
# 0 Spring 1998 Spring 1998
# 1 Spring 2001 Spring 2001
# 2 Fall 2016 Fall 2016
# 3 Fall 2016 Fall 2016
Or use Series.str.extract with a regex:
(\w+) -- capture 1+ word characters
\s* -- 0+ whitespace characters
(\d+) -- capture 1+ digits
df[['season', 'year']] = df['premiered'].str.extract(r'(\w+)\s*(\d+)')
# premiered season year
# 0 Spring 1998 Spring 1998
# 1 Spring 2001 Spring 2001
# 2 Fall 2016 Fall 2016
# 3 Fall 2016 Fall 2016
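As a variation on the regex approach, named groups let Series.str.extract label the resulting columns itself. This is only a sketch: the "Unknown" row is a made-up value added to show what happens when a string does not match, and the stricter pattern (at least one whitespace character, exactly four digits) is an assumption about the data rather than part of the original answer.

```python
import pandas as pd

# Sample data from the question, plus one hypothetical non-matching row
df = pd.DataFrame(
    {"premiered": ["Spring 1998", "Spring 2001", "Fall 2016", "Unknown"]}
)

# Named groups (?P<name>...) become the column names of the extracted frame
parts = df["premiered"].str.extract(r"(?P<season>\w+)\s+(?P<year>\d{4})")

# Rows that do not match the pattern get NaN in every extracted column
df = df.join(parts)
print(df)
```

Rows that fail to match are left as NaN instead of raising, so they can be inspected afterwards with something like df[df['year'].isna()].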
Also it would be a good idea to convert the new year column to numeric:
df['year'] = df['year'].astype(int)
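Note that astype(int) raises an error if any year is missing, which can happen when a row did not match. A possible alternative, assuming missing values should be kept rather than dropped, is pd.to_numeric followed by the nullable Int64 dtype. The sample series below is invented for illustration.

```python
import pandas as pd

# Hypothetical extracted years, with one missing value
years = pd.Series(["1998", "2001", None, "2016"], name="year")

# errors="coerce" turns anything unparseable into NaN;
# the nullable "Int64" dtype keeps integers alongside missing values
year_num = pd.to_numeric(years, errors="coerce").astype("Int64")
print(year_num)
```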
Solution 2:[2]
You could use Python's str.split on each value with apply:
import pandas as pd
data = { 'premiered' : ['Spring 1998', 'Spring 2001', 'Fall 2016', 'Fall 2016']}
df = pd.DataFrame(data)
df['year'] = df['premiered'].apply(lambda x : x.split(' ')[1])
df
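The same idea can be written without apply and extended to fill the season column as well. This is a rough sketch using the pandas string accessor, not the answerer's own code; the intermediate name tokens is just an illustrative choice.

```python
import pandas as pd

data = {"premiered": ["Spring 1998", "Spring 2001", "Fall 2016", "Fall 2016"]}
df = pd.DataFrame(data)

# Split each string on a single space into a list of tokens
tokens = df["premiered"].str.split(" ")

# .str[i] picks the i-th element of each list
df["season"] = tokens.str[0]
df["year"] = tokens.str[1]
print(df)
```

Unlike the lambda version, missing values in the column pass through as NaN instead of raising an AttributeError.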
Sources
This article follows the attribution requirements of Stack Overflow and is licensed under CC BY-SA 3.0.
Source: Stack Overflow
| Solution | Source |
|---|---|
| Solution 1 | |
| Solution 2 | ArchAngelPwn |
