Efficient way to get the set of items from the same index in multiple lists

I am implementing a decision tree that works on both categorical and numerical attributes. For categorical data, I want each split to be a binary test of the form "is it [category] or is it not?", and then choose the category that gives the best split. For example, if an attribute "Flavor" has the categories "vanilla", "chocolate", and "strawberry", then a split decision node would be "is it strawberry?"

My dataset is huge, so I would like to avoid a lot of looping when finding the set of categories for each attribute. Right now each sample is a dictionary of {attribute: category} pairs. With thousands of samples, I would need to go through each one to build the set of all unique categories for each attribute.

Is there a neat, quick, Pythonic way to do this?
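For reference, here is a minimal sketch of the task described above, assuming the samples are held in a plain list of dictionaries. The names `samples`, `rows`, and `categories_by_attribute` are hypothetical and used only for illustration. It makes a single pass over the data and collects the categories for every attribute at once; the last lines show the equivalent for the list-based layout named in the title, where `zip(*rows)` groups the items that share an index.

```python
from collections import defaultdict


def categories_by_attribute(samples):
    """Map each attribute to the set of categories seen across all samples."""
    categories = defaultdict(set)
    for sample in samples:
        for attribute, category in sample.items():
            categories[attribute].add(category)
    return dict(categories)


# Hypothetical example data: one dict of {attribute: category} per sample.
samples = [
    {"Flavor": "vanilla", "Topping": "nuts"},
    {"Flavor": "chocolate", "Topping": "none"},
    {"Flavor": "strawberry", "Topping": "nuts"},
]

unique = categories_by_attribute(samples)

# For a single attribute, a set comprehension is enough.
flavors = {sample["Flavor"] for sample in samples}

# If each sample were a list with attributes in a fixed order instead,
# zip(*rows) groups the items that share an index across all lists.
rows = [["vanilla", "nuts"], ["chocolate", "none"], ["strawberry", "nuts"]]
unique_by_index = [set(column) for column in zip(*rows)]
```

Every value still has to be visited once, so this does not avoid the pass over the data; it only keeps it to one pass, with the per-item work done by the built-in set type.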

Sources

This article follows the attribution requirements of Stack Overflow and is licensed under CC BY-SA 3.0.

Source: Stack Overflow
