Get All Row Values After Split and Put Them In List

UPDATED: I have the following DataFrame:

import pandas as pd

df = pd.DataFrame({'sports': ["['soccer', 'men tennis']", "['soccer']", "['baseball', 'women tennis']"]})

print(df)
                         sports
0      ['soccer', 'men tennis']
1                    ['soccer']
2  ['baseball', 'women tennis']

I need to extract all the unique sport names and put them into a list. I'm trying the following code:

out = pd.DataFrame(df['sports'].str.split(',').tolist()).stack()
out.value_counts().index

However, it returns NaN values.
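It helps to look at what str.split(',') actually produces on these values. Each cell is a string that only looks like a list, so splitting on commas keeps the brackets, quotes and leading spaces, and because the rows split into different numbers of pieces, the shorter row is padded with a missing value when the pieces are turned into a DataFrame. A minimal sketch reproducing this on the question's data:

```python
import pandas as pd

df = pd.DataFrame({'sports': ["['soccer', 'men tennis']", "['soccer']", "['baseball', 'women tennis']"]})

# Each cell is a plain string, so splitting on ',' keeps '[', ']' and the quotes
pieces = df['sports'].str.split(',').tolist()
print(pieces)

# Rows have different lengths, so the shorter row is padded with a missing value
wide = pd.DataFrame(pieces)
print(wide)
```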

Desired output:

['soccer', 'men tennis', 'baseball', 'women tennis']

What would be the smartest way of doing it? Any suggestions would be appreciated. Thanks!



Solution 1:[1]

If the values are actual Python lists, you can use explode followed by unique:

out = df['sports'].explode().unique().tolist()
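A minimal, self-contained sketch of the explode + unique route, assuming the column already holds real Python lists (the DataFrame below is a hypothetical list-valued version of the question's data):

```python
import pandas as pd

# Hypothetical version of the question's data where each cell is a real list
df = pd.DataFrame({'sports': [['soccer', 'men tennis'], ['soccer'], ['baseball', 'women tennis']]})

# explode() gives one row per list element; unique() keeps first-seen order
out = df['sports'].explode().unique().tolist()
print(out)  # ['soccer', 'men tennis', 'baseball', 'women tennis']
```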

If they are strings (as in the question), parse them first with ast.literal_eval:

import ast
out = df['sports'].apply(ast.literal_eval).explode().unique().tolist()

or use ast.literal_eval inside a set comprehension and unpack the result into a list:

out = [*{x for lst in df['sports'].tolist() for x in ast.literal_eval(lst)}]
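A set does not guarantee any particular order, so the comprehension above may not return the sports in the order they first appear. If order matters, one option (a swapped-in technique, not part of the original answer) is to deduplicate with dict.fromkeys, whose keys keep insertion order in Python 3.7+. A sketch on the question's string column:

```python
import ast

import pandas as pd

df = pd.DataFrame({'sports': ["['soccer', 'men tennis']", "['soccer']", "['baseball', 'women tennis']"]})

# dict keys are unique and keep insertion order, so this dedupes in first-seen order
out = list(dict.fromkeys(x for s in df['sports'] for x in ast.literal_eval(s)))
print(out)  # ['soccer', 'men tennis', 'baseball', 'women tennis']
```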

Output:

['soccer', 'men tennis', 'baseball', 'women tennis']

Solution 2:[2]

Assuming the values stored in the sports column are lists, we can flatten the column using np.hstack and then use set to get the unique values:

import numpy as np

set(np.hstack(df['sports']))

{'baseball', 'men tennis', 'soccer', 'women tennis'}
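For the string column in the question, np.hstack on its own would just return the original strings, so the values need to be parsed first. A minimal sketch combining ast.literal_eval with np.hstack, assuming the same df as in the question:

```python
import ast

import numpy as np
import pandas as pd

df = pd.DataFrame({'sports': ["['soccer', 'men tennis']", "['soccer']", "['baseball', 'women tennis']"]})

# Parse each string into a real list, then flatten all lists into one 1-D array
flat = np.hstack(df['sports'].apply(ast.literal_eval).tolist())

# Elements of the array are numpy strings; convert to plain str before building the set
unique_sports = {str(x) for x in flat}
print(unique_sports)
```

The set's printed order may differ from run to run; sort it if a stable order is needed.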

Solution 3:[3]

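# Assumes each value in df['sports'] is already a list (not a string)
lst = []
df['sports'].apply(lambda x: [lst.append(element) for element in x])
lst = list(set(lst))  # deduplicate (order is not preserved)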

Not sure how efficient this is, but it works.
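Calling apply only for its side effects works, but it also builds a throwaway Series of lists of None. The same flattening can be written without side effects using itertools.chain.from_iterable (a swapped-in technique, not the original answer's). A sketch, assuming the values are already lists as Solution 3 does:

```python
from itertools import chain

import pandas as pd

# Assumes each cell already holds a real list, as Solution 3 does
df = pd.DataFrame({'sports': [['soccer', 'men tennis'], ['soccer'], ['baseball', 'women tennis']]})

# chain.from_iterable flattens the per-row lists; set() then removes duplicates
lst = list(set(chain.from_iterable(df['sports'])))
print(sorted(lst))  # ['baseball', 'men tennis', 'soccer', 'women tennis']
```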

Sources

This article follows the attribution requirements of Stack Overflow and is licensed under CC BY-SA 3.0.

Source: Stack Overflow

Solution Source
Solution 1
Solution 2 Shubham Sharma
Solution 3 r.uzunok