'Combining a pandas DataFrame of lists, into a "one-hot" with values from a column of lists
How do I get something similar to a one-hot encoding, except for 1s and 0s, returning values from a list of the same size as the labels?
As an example DataFrame, with Sklearn's MultiLabelBinarizer I can turn the following DataFrame of lists:
df = pd.DataFrame({'labels': [['A', 'B'], ['B', 'D', 'F'],
['A', 'C', 'D'], ['D', 'B']],
'values': [[13.2, 1.3], [7.9, 12, 3.2],
[9.0, 1, 14.8], [3.1, 6.7]]})
labels values
0 [A, B] [13.2, 1.3]
1 [B, D, F] [7.9, 12, 3.2]
2 [A, C, D] [9.0, 1, 14.8]
3 [D, B] [3.1, 6.7]
mlb = MultiLabelBinarizer(sparse_output=True)
df = df.join(
pd.DataFrame.sparse.from_spmatrix(
mlb.fit_transform(df.pop('labels')),
index=df.index,
columns=mlb.classes_))
To:
values A B C D F
0 [13.2, 1.3] 1 1 0 0 0
1 [7.9, 12, 3.2] 0 1 0 1 1
2 [9.0, 1, 14.8] 1 0 1 1 0
3 [3.1, 6.7] 0 1 0 1 0
While I am looking for a way to get this:
A B C D F
0 13.2 1.3 0 0 0
1 0 7.9 0 12.0 3.2
2 9.0 0 1. 14.8 0
3 0 6.7 0 3.1 0
Solution 1:[1]
Create sample data
df = pd.DataFrame()
df['labels'] = [['A', 'B'], ['B', 'D', 'F'], ['A', 'C', 'D'], ['D', 'B']]
df['values'] = [[13.2, 1.3], [7.9, 12, 3.2], [9.0, 1, 14.8], [3.1, 6.7]]
List comprehension to loop over each row value and combine them to create a dictionary
transform = [dict(zip(label, value)) for label, value in zip(df['labels'], df['values'])]
Put new data into a dataframe
new_df = pd.DataFrame(transform)
Replace nan with 0
new_df.replace(np.nan, 0)
Sources
This article follows the attribution requirements of Stack Overflow and is licensed under CC BY-SA 3.0.
Source: Stack Overflow
| Solution | Source |
|---|---|
| Solution 1 | Simon |
