Extract seasons and years from a string column in pandas

I'm wondering whether there is another way to extract the year from a column and create two new columns from it: one for the season and one for the year.

I tried this method, and it seems to work, but it only extracts the year, and only for some rows:

year = df['premiered'].str.findall(r'(\d{4})').str.get(0)
df1 = df.assign(year = year.values)

Output:

|premiered|year|
|---|---|
|Spring 1998|1998|
|Spring 2001|2001|
|Fall 2016|NaN|
|Fall 2016|NaN|


Solution 1:[1]

Use Series.str.split with the expand option:

expand: Expand the split strings into separate columns.

df[['season', 'year']] = df['premiered'].str.split(expand=True)

#      premiered  season  year
# 0  Spring 1998  Spring  1998
# 1  Spring 2001  Spring  2001
# 2    Fall 2016    Fall  2016
# 3    Fall 2016    Fall  2016
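As a self-contained sketch of the split approach, the snippet below rebuilds the sample frame from the question and passes `n=1` so that only the first run of whitespace is split. The `n=1` argument is an addition not present in the answer above; it assumes every value starts with a single-word season.

```python
import pandas as pd

# Sample data taken from the question
df = pd.DataFrame({'premiered': ['Spring 1998', 'Spring 2001', 'Fall 2016', 'Fall 2016']})

# n=1 splits on the first run of whitespace only, so the result has
# at most two columns even if a value contains extra words.
df[['season', 'year']] = df['premiered'].str.split(n=1, expand=True)
print(df)
```

Note that `year` still holds strings at this point, not integers.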

Or use Series.str.extract with a regex:

  • (\w+) -- capture 1+ word characters
  • \s* -- 0+ whitespace characters
  • (\d+) -- capture 1+ digits

df[['season', 'year']] = df['premiered'].str.extract(r'(\w+)\s*(\d+)')

#      premiered  season  year
# 0  Spring 1998  Spring  1998
# 1  Spring 2001  Spring  2001
# 2    Fall 2016    Fall  2016
# 3    Fall 2016    Fall  2016
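A slightly stricter variant of the regex approach might look like the sketch below. It is not from the original answer: it uses named groups so that `str.extract` names the columns itself, and it assumes seasons are purely alphabetic and years have exactly four digits. Because `\w` also matches digits, a value with no space such as `Spring1998` would otherwise put part of the year into the season group.

```python
import pandas as pd

# Sample data taken from the question
df = pd.DataFrame({'premiered': ['Spring 1998', 'Spring 2001', 'Fall 2016', 'Fall 2016']})

# Named groups become the column names of the extracted DataFrame.
# [A-Za-z]+ keeps digits out of the season group; \d{4} requires a 4-digit year.
parts = df['premiered'].str.extract(r'(?P<season>[A-Za-z]+)\s*(?P<year>\d{4})')

# Attach the new columns; rows that do not match the pattern get NaN in both.
df = df.join(parts)
print(df)
```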

It is also a good idea to convert the new year column to a numeric type:

df['year'] = df['year'].astype(int)
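`astype(int)` raises an error if any year is missing or not a number. If that can happen in your data, a more forgiving conversion could look like this sketch, which uses `pd.to_numeric` with `errors='coerce'` and pandas' nullable `Int64` dtype. The `years` series is hypothetical sample data, not taken from the question.

```python
import pandas as pd

# Hypothetical year strings, including a missing and an unparseable value
years = pd.Series(['1998', '2001', None, 'unknown'])

# errors='coerce' turns anything unparseable into NaN instead of raising;
# the nullable 'Int64' dtype then keeps the valid entries as integers.
numeric = pd.to_numeric(years, errors='coerce').astype('Int64')
print(numeric)
```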

Solution 2:[2]

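You could apply Python's str.split to each value:

import pandas as pd

data = {'premiered': ['Spring 1998', 'Spring 2001', 'Fall 2016', 'Fall 2016']}
df = pd.DataFrame(data)
# Split on the space: the first piece is the season, the second is the year
df['season'] = df['premiered'].apply(lambda x: x.split(' ')[0])
df['year'] = df['premiered'].apply(lambda x: x.split(' ')[1])
df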

Sources

This article follows the attribution requirements of Stack Overflow and is licensed under CC BY-SA 3.0.

Source: Stack Overflow

|Solution|Source|
|---|---|
|Solution 1||
|Solution 2|ArchAngelPwn|