Adding rows based on column value
I have a DataFrame that contains only column headers: ['234','apple','banana','orange']
Now I have a list like:
l=['apple', 'banana']
The list is extracted from a column of another DataFrame: I take the unique values of the column fruits with fruits.unique(), which returns an array, and then loop over its values to store them in a list.
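As a side note on the extraction step, a minimal sketch: the array returned by unique() can be turned into a list without a loop. The frame name other and its values are made up for illustration; only the column name fruits comes from the question.

```python
import pandas as pd

# Hypothetical source frame; only the column name "fruits" is from the question.
other = pd.DataFrame({"fruits": ["apple", "banana", "apple", "banana"]})

# unique() returns a NumPy array in order of first appearance;
# tolist() converts it to a plain Python list, so no explicit loop is needed.
l = other["fruits"].unique().tolist()
print(l)  # ['apple', 'banana']
```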
I want to loop over the list and check whether each value is present among the DataFrame's columns. Where a value matches a column header, put 1 under that column; put 0 under the columns that have no match. In the above case the DataFrame after matching should look like:
234 apple banana orange
0 1 1 0
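For reference, the loop described above could be written out roughly as follows. This is only a sketch of the stated requirement in plain Python; the names columns, wanted, row and result are illustrative and not from the original post.

```python
import pandas as pd

columns = ['234', 'apple', 'banana', 'orange']
l = ['apple', 'banana']

# A set makes each membership check cheap.
wanted = set(l)

# 1 where the header appears in the list, 0 otherwise.
row = {col: int(col in wanted) for col in columns}

# Build a one-row DataFrame, keeping the original column order.
result = pd.DataFrame([row], columns=columns)
print(result)
```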
Solution 1:[1]
If you need a one-row DataFrame, convert the column names to a DataFrame with Index.to_frame and compare them against the list with DataFrame.isin. Then map True/False to 1/0 by casting to integers, and transpose:
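import pandas as pd

df = pd.DataFrame(columns=['234','apple','banana','orange'])
l = ['apple', 'banana']
# Compare the headers with the list, cast True/False to 1/0, transpose to one row
df = df.columns.to_frame().isin(l).astype(int).T
print(df)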
234 apple banana orange
0 0 1 1 0
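To make the chain easier to follow, here is the same one-liner split into its intermediate steps. This is just an unrolled sketch of the code above; the variable names headers, mask, flags and one_row are illustrative.

```python
import pandas as pd

df = pd.DataFrame(columns=['234', 'apple', 'banana', 'orange'])
l = ['apple', 'banana']

headers = df.columns.to_frame()  # one row per column name, in a single column
mask = headers.isin(l)           # True where the name is in the list
flags = mask.astype(int)         # True/False -> 1/0
one_row = flags.T                # transpose: the names become columns again

print(one_row)
```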
If the input is a nested list, use MultiLabelBinarizer from scikit-learn:
import pandas as pd
from sklearn.preprocessing import MultiLabelBinarizer

df = pd.DataFrame(columns=['234','apple','banana','orange'])
L = [['apple', 'banana'], ['apple', 'orange', 'apple']]
mlb = MultiLabelBinarizer()
# One indicator row per inner list; reindex adds the missing '234' column as 0
df = (pd.DataFrame(mlb.fit_transform(L), columns=mlb.classes_)
        .reindex(df.columns, fill_value=0, axis=1))
print(df)
234 apple banana orange
0 0 1 1 0
1 0 1 0 1
EDIT: If the data come from another DataFrame column, the solution is very similar to the second one:
df = pd.DataFrame(columns=['234','apple','banana','orange'])
df1 = pd.DataFrame({"col":[['apple', 'banana'],['apple', 'orange', 'apple']]})
print(df1)
col
0 [apple, banana]
1 [apple, orange, apple]
from sklearn.preprocessing import MultiLabelBinarizer
mlb = MultiLabelBinarizer()
df = (pd.DataFrame(mlb.fit_transform(df1['col']), columns=mlb.classes_)
        .reindex(df.columns, fill_value=0, axis=1))
print(df)
234 apple banana orange
0 0 1 1 0
1 0 1 0 1
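If scikit-learn is not available, a pandas-only variant is possible. Note that this is a different technique from the MultiLabelBinarizer approach above, not part of the original answer: a sketch that assumes a pandas version providing Series.explode (0.25 or later). The names cols and out are illustrative.

```python
import pandas as pd

cols = ['234', 'apple', 'banana', 'orange']
df1 = pd.DataFrame({"col": [['apple', 'banana'], ['apple', 'orange', 'apple']]})

out = (
    pd.get_dummies(df1['col'].explode())  # one indicator row per list item
      .groupby(level=0).max()             # collapse back to one row per original row
      .reindex(columns=cols, fill_value=0)  # add missing headers such as '234' as 0
      .astype(int)                        # make sure the flags are 1/0 integers
)
print(out)
```

Taking max() per original row means a value repeated inside one list (such as the second 'apple') still yields 1 rather than a count.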
Sources
This article follows the attribution requirements of Stack Overflow and is licensed under CC BY-SA 3.0.
Source: Stack Overflow
| Solution | Source |
|---|---|
| Solution 1 |
