Search for column names in a DataFrame's pos_tag values in Python and set the column to 1 on a match, else 0

Dear Python community,

I am writing a Python script to perform prediction using Naive Bayes, SVM, and Decision Tree supervised learning. I have already completed all the data preprocessing and can get predictions from the data that I have.

However, I need to add a few new columns (name, value) to the data frame.

My issue is that I need to check whether each new column's name (e.g. excellent, strip, male) appears in that row's pos_tag_noun value. If it does, the value in that column should be set to 1; otherwise it should be 0.

I have been working on this for two days but still have not found a solution.

I would really appreciate any ideas or solutions for this issue.

Thanks & Regards



Solution 1:[1]

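This should work okay. The idea is to pull the word out of each tuple, explode the lists so that each word gets its own row, build indicator columns with get_dummies, collapse back to one row per original index with groupby(...).max(), and then concat the result onto your dataframe.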

pd.concat([df, pd.get_dummies(df['pos_tag_noun'].apply(lambda x: [item[0] for item in x]).explode()).groupby(level=-1).max()], axis=1)

                            pos_tag_noun  bath  excellent  hang  male  strip
0                      [(excellent, NN)]     0          1     0     0      0
1  [(strip, NN), (bath, NN), (hang, NN)]     1          0     1     0      1
2                           [(male, NN)]     0          0     0     1      0
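
The one-liner above can be unpacked into separate steps. This is only a minimal sketch, assuming pos_tag_noun holds lists of (word, tag) tuples as in the output shown. The sample data is reconstructed from that output, and the intermediate names (words, exploded, dummies, indicators) are illustrative. dtype=int is passed because recent pandas versions return boolean dummies by default.

```python
import pandas as pd

# Sample data reconstructed from the output above (an assumption about the real frame).
df = pd.DataFrame({
    "pos_tag_noun": [
        [("excellent", "NN")],
        [("strip", "NN"), ("bath", "NN"), ("hang", "NN")],
        [("male", "NN")],
    ]
})

# 1. Keep only the word from each (word, tag) tuple.
words = df["pos_tag_noun"].apply(lambda pairs: [word for word, _tag in pairs])

# 2. One row per word; the original row label is repeated in the index.
exploded = words.explode()

# 3. One 0/1 indicator column per distinct word.
dummies = pd.get_dummies(exploded, dtype=int)

# 4. Collapse back to one row per original row label.
indicators = dummies.groupby(level=0).max()

# 5. Attach the indicator columns to the original frame.
result = pd.concat([df, indicators], axis=1)
print(result)
```

If only some words should become columns, selecting them from indicators before the concat (for example indicators[["excellent", "strip", "male"]]) would be one way to do it.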

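If you don't want to use a lambda function + explode, you can replace that part with something like pd.DataFrame(df['pos_tag_noun'].to_list()).stack(). Note that this yields the full (word, tag) tuples under a two-level (row, position) index, so you still need to take the first element of each tuple and group on the first index level (level=0) rather than level=-1.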
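
A possible shape for the stack-based variant is sketched below, under the same assumption that pos_tag_noun holds lists of (word, tag) tuples. The sample data and the names pairs, words, and indicators are illustrative. The explicit dropna() is there because, depending on the pandas version, stack() may or may not drop the padding values created for shorter lists.

```python
import pandas as pd

# Sample data reconstructed from the output above (an assumption about the real frame).
df = pd.DataFrame({
    "pos_tag_noun": [
        [("excellent", "NN")],
        [("strip", "NN"), ("bath", "NN"), ("hang", "NN")],
        [("male", "NN")],
    ]
})

# Spread each list across columns, then stack into one (word, tag) tuple per entry.
# The result has a two-level index: (original row label, position in the list).
pairs = pd.DataFrame(df["pos_tag_noun"].to_list(), index=df.index).stack().dropna()

# Keep only the word from each tuple.
words = pairs.map(lambda pair: pair[0])

# Build 0/1 indicators and collapse on the first index level (the original row).
indicators = pd.get_dummies(words, dtype=int).groupby(level=0).max()

result = pd.concat([df, indicators], axis=1)
print(result)
```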

Sources

This article follows the attribution requirements of Stack Overflow and is licensed under CC BY-SA 3.0.

Source: Stack Overflow

Solution Source
Solution 1