Get All Row Values After Split and Put Them In List
UPDATED: I have the following DataFrame:
import pandas as pd
df = pd.DataFrame({'sports': ["['soccer', 'men tennis']", "['soccer']", "['baseball', 'women tennis']"]})
print(df)
                         sports
0      ['soccer', 'men tennis']
1                    ['soccer']
2  ['baseball', 'women tennis']
I need to extract all the unique sport names and put them into a list. I'm trying the following code:
out = pd.DataFrame(df['sports'].str.split(',').tolist()).stack()
out.value_counts().index
However, it's returning NaN values.
Desired output:
['soccer', 'men tennis', 'baseball', 'women tennis']
What would be the smartest way of doing it? Any suggestions would be appreciated. Thanks!
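A possible explanation for the attempt above, offered as a sketch rather than a diagnosis: .str.split(',') treats each cell as plain text, so on the string form the brackets and quotes stay attached to the pieces, and on a column of real lists pandas returns NaN because the cells are not strings. The as_lists Series below is a hypothetical stand-in for the pre-update data, which the question does not show.

```python
import pandas as pd

df = pd.DataFrame({'sports': ["['soccer', 'men tennis']", "['soccer']", "['baseball', 'women tennis']"]})

# String cells: split(',') cuts the text but keeps brackets, quotes and spaces.
pieces = df['sports'].str.split(',').tolist()
print(pieces[0])

# Hypothetical column of real lists: .str.split has no string to work on,
# so each cell comes back as a missing value.
as_lists = pd.Series([['soccer', 'men tennis'], ['soccer']])
all_missing = bool(as_lists.str.split(',').isna().all())
print(all_missing)
```

Either way, the split result is not a clean list of sport names, which is why the answers parse or flatten the cells instead.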
Solution 1:[1]
If these are lists, then you could explode + unique:
out = df['sports'].explode().unique().tolist()
If these are strings, then you could use ast.literal_eval first to parse them into lists:
import ast
out = df['sports'].apply(ast.literal_eval).explode().unique().tolist()
or use ast.literal_eval in a set comprehension and unpack (a set does not preserve order, so the items may come out in a different order):
out = [*{x for lst in df['sports'].tolist() for x in ast.literal_eval(lst)}]
Output:
['soccer', 'men tennis', 'baseball', 'women tennis']
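If it is unclear whether the column holds lists or their string form, both cases can be handled in one pass. The following is a minimal sketch, not part of the original answer; unique_in_order is a hypothetical helper name, and it assumes every cell is either a list or a string that ast.literal_eval can parse into one.

```python
import ast
import pandas as pd

def unique_in_order(series):
    """Return unique items in first-seen order; cells may be lists or their string form."""
    seen = {}  # dicts keep insertion order (Python 3.7+)
    for cell in series:
        # Parse string cells into lists; leave real lists untouched.
        items = ast.literal_eval(cell) if isinstance(cell, str) else cell
        for item in items:
            seen.setdefault(item, None)
    return list(seen)

df = pd.DataFrame({'sports': ["['soccer', 'men tennis']", "['soccer']", "['baseball', 'women tennis']"]})
out = unique_in_order(df['sports'])
print(out)  # ['soccer', 'men tennis', 'baseball', 'women tennis']
```

Using a dict instead of a set keeps the order in which each sport first appears, matching the desired output in the question.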
Solution 2:[2]
Assuming the values stored in the sports column are lists, we can flatten the column using np.hstack, then use set to get the unique values:
import numpy as np
set(np.hstack(df['sports']))
Output:
{'baseball', 'men tennis', 'soccer', 'women tennis'}
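Since the question asks for a list rather than a set, a variation on this idea is to let NumPy deduplicate as well. This is a sketch under the same assumption (real lists in the column), swapping in np.concatenate and np.unique for hstack and set; np.unique returns the values sorted, not in first-seen order.

```python
import numpy as np
import pandas as pd

# Assumed input: a column of real lists, not their string form.
df = pd.DataFrame({'sports': [['soccer', 'men tennis'], ['soccer'], ['baseball', 'women tennis']]})

# Flatten all rows into one array, then deduplicate (np.unique also sorts).
flat = np.concatenate(df['sports'].tolist())
out = np.unique(flat).tolist()
print(out)  # ['baseball', 'men tennis', 'soccer', 'women tennis']
```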
Solution 3:[3]
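# Assumes each row already holds a real list (not its string form).
lst = []
for row in df['sports']:
    lst.extend(row)  # collect every element from every row
lst = list(set(lst))  # set() drops duplicates but does not keep order
Not sure how efficient this is, but it works.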
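One way to check the efficiency question is to time the approach against a plain flattening with itertools.chain. The sketch below is only illustrative: the input is a hypothetical enlargement of the example data, the function names are made up for the comparison, and timings will vary by machine and pandas version, so no numbers are claimed here.

```python
import timeit
from itertools import chain

import pandas as pd

# Hypothetical larger input: the three example rows repeated 1,000 times.
rows = [['soccer', 'men tennis'], ['soccer'], ['baseball', 'women tennis']] * 1000
s = pd.Series(rows)

def with_apply():
    # The apply-and-append approach from this answer.
    lst = []
    s.apply(lambda x: [lst.append(element) for element in x])
    return set(lst)

def with_chain():
    # Flatten lazily with itertools, then deduplicate.
    return set(chain.from_iterable(s))

same = with_apply() == with_chain()
print(same)
print('apply:', timeit.timeit(with_apply, number=20))
print('chain:', timeit.timeit(with_chain, number=20))
```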
Sources
This article follows the attribution requirements of Stack Overflow and is licensed under CC BY-SA 3.0.
Source: Stack Overflow
| Solution | Source |
|---|---|
| Solution 1 | |
| Solution 2 | Shubham Sharma |
| Solution 3 | r.uzunok |
