Use text features as a multilabel problem

Consider the following data frame as the data structure:

df <- data.frame(
  text = c(
    "The standard chunk of Lorem Ipsum used since the 1500s is reproduced below for those interested.",
    "Contrary to popular belief, Lorem Ipsum is not simply random text. It has roots in a piece of classical Latin literature from 45 BC, making it over 2000 years old.",
    "Lorem Ipsum is simply dummy text of the printing and typesetting industry. Lorem Ipsum has been the industry's standard dummy text ever since the 1500s, when an unknown printer took a galley of type and scrambled it to make a type specimen book. It has survived not only five centuries."
  ),
  label1 = c(1, 1, 1),
  label2 = c(0, 1, 0),
  label3 = c(1, 0, 0)
)

The columns label1, label2 and label3 are categories; a 1 means that the text in the text column belongs to that category.

Using text features extracted from the text column together with the label columns, how can I take a new data frame that contains only the text column and use a multilabel algorithm to assign its rows to the label columns?

Example of a new data frame (test set):

dfnew <- data.frame(
  text = c(
    "Contrary to popular belief, Lorem Ipsum is not simply random text. It has roots in a piece of classical Latin literature from 45 BC, making it over 2000 years old.",
    "There are many variations of passages of Lorem Ipsum available, but the majority have suffered alteration in some form, by injected humour, or randomised words which don't look even slightly believable."
  )
)
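
To make the question more concrete, here is a minimal base-R sketch of one possible approach, not a definitive solution. It builds bag-of-words counts from the training text and copies the label row of the most similar training document (a 1-nearest-neighbour rule with cosine similarity). This stands in for a trained multilabel classifier, which would need far more than three rows. The helper names (tokenize, term_matrix, cosine_sim, predict_labels) and the shortened toy texts are hypothetical and only for illustration.

```r
# Split each text into lowercase word tokens (letters and digits only).
tokenize <- function(x) {
  words <- strsplit(tolower(as.character(x)), "[^a-z0-9]+")
  lapply(words, function(w) w[nzchar(w)])
}

# Document-term count matrix over a fixed vocabulary (rows = documents).
term_matrix <- function(texts, vocab) {
  rows <- lapply(tokenize(texts), function(w) {
    as.numeric(table(factor(w, levels = vocab)))
  })
  m <- do.call(rbind, rows)
  colnames(m) <- vocab
  m
}

# Cosine similarity between every row of a and every row of b.
cosine_sim <- function(a, b) {
  sim <- (a %*% t(b)) / outer(sqrt(rowSums(a^2)), sqrt(rowSums(b^2)))
  sim[is.na(sim)] <- 0  # documents that share no vocabulary words
  sim
}

# Assumption: each new text gets the labels of its most similar training text.
predict_labels <- function(train, new, label_cols) {
  vocab   <- sort(unique(unlist(tokenize(train$text))))
  x_train <- term_matrix(train$text, vocab)
  x_new   <- term_matrix(new$text, vocab)  # words unseen in training are dropped
  nearest <- apply(cosine_sim(x_new, x_train), 1, which.max)
  pred <- train[nearest, label_cols, drop = FALSE]
  rownames(pred) <- NULL
  cbind(new, pred)
}

# Hypothetical shortened data with the same column layout as df and dfnew.
train <- data.frame(
  text = c("lorem ipsum standard chunk",
           "classical latin literature roots",
           "printing and typesetting industry"),
  label1 = c(1, 1, 1), label2 = c(0, 1, 0), label3 = c(1, 0, 0)
)
new <- data.frame(text = c("roots in classical latin literature",
                           "many variations of lorem ipsum passages"))

result <- predict_labels(train, new, c("label1", "label2", "label3"))
print(result)
```

The same call should work on the data above as predict_labels(df, dfnew, c("label1", "label2", "label3")). With more training data, the nearest-neighbour step could be replaced by one binary classifier per label column fitted on the same term matrix (binary relevance).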


Sources

This article follows the attribution requirements of Stack Overflow and is licensed under CC BY-SA 3.0.

Source: Stack Overflow

Solution Source