Encoding string labels to integers for hierarchical multi-class classification

I'm currently working on a hierarchical classification task: I give my model some text input, and it tells me which three categories the text belongs to (main category, sub category, leaf category).

My labels are currently a list of strings, but my model needs them as integers from 0 to N-1 (N being the number of classes at that category level).

I managed to find a solution by using a dictionary that maps each string to a number (encoding) and a second dictionary that does the opposite (decoding), so I can check what the predictions are.
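
A minimal sketch of this dictionary approach for a single category level, assuming the labels are a plain list of strings. The helper name `build_mappings` and the sample labels are hypothetical and only serve as illustration.

```python
def build_mappings(labels):
    """Build encode/decode dictionaries from a list of string labels."""
    # Sort the unique labels so the mapping is reproducible across runs.
    classes = sorted(set(labels))
    encode = {name: idx for idx, name in enumerate(classes)}
    decode = {idx: name for name, idx in encode.items()}
    return encode, decode


# Hypothetical main-category labels, for illustration only.
main_labels = ["electronics", "clothing", "electronics", "books"]

encode, decode = build_mappings(main_labels)

# Strings -> integers in the range 0..N-1 (here N = 3).
encoded = [encode[name] for name in main_labels]

# Integers -> strings, e.g. to inspect model predictions.
decoded = [decode[idx] for idx in encoded]
```

Because the mappings are derived from the data, the same helper could be called once per level (main, sub, leaf) on any new dataset instead of writing the dictionaries by hand.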

It works, but I figured that once I use another dataset, I would have to generate the two dictionaries again. I looked around and found these two classes in the sklearn preprocessing library: MultiLabelBinarizer() and LabelEncoder(). They look like what I need, but I have no idea how to use them in combination with the hierarchy.
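
One possible way to combine LabelEncoder() with the hierarchy is to fit a separate encoder per category level. The sketch below assumes each sample carries a (main, sub, leaf) tuple of strings; the sample labels and the variable names `level_names`, `encoders`, and `encoded` are hypothetical.

```python
from sklearn.preprocessing import LabelEncoder

# Hypothetical samples: one (main, sub, leaf) tuple per text input.
labels = [
    ("electronics", "phones", "smartphones"),
    ("electronics", "computers", "laptops"),
    ("clothing", "shoes", "sneakers"),
    ("electronics", "phones", "smartphones"),
]
level_names = ["main", "sub", "leaf"]

encoders = {}  # one fitted LabelEncoder per hierarchy level
encoded = {}   # integer labels (0..N-1) per hierarchy level

for i, level in enumerate(level_names):
    # Collect the string labels belonging to this level only.
    column = [row[i] for row in labels]
    encoders[level] = LabelEncoder()
    encoded[level] = encoders[level].fit_transform(column)

# Decode integer predictions for one level back to strings.
predicted_main = encoders["main"].inverse_transform([1, 0])
```

Each fitted encoder keeps its own `classes_` array, so it covers both the encoding and the decoding direction for its level. Note that the levels are encoded independently here, so the encoders themselves do not record which sub category belongs to which main category. MultiLabelBinarizer() is aimed at a different case: it produces a binary indicator matrix for samples that carry several labels from the same label set, rather than one integer per level.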

I would appreciate any hints on how to use them, or any other solution!



Sources

This article follows the attribution requirements of Stack Overflow and is licensed under CC BY-SA 3.0.

Source: Stack Overflow
