Creating subsets on multiple features in Python for segmentation
I want to segment a dataset containing items (labeled with IDs) and multiple categorical features that take different values (for instance, color takes 'blue', 'orange', 'green'; size takes 'S', 'M', 'L'; brand takes 'Brand A', 'Brand B', etc.):
ID | Brand | Color | Size | Price |
---|---|---|---|---|
1 | Brand 1 | Orange | S | 23 |
2 | Brand 2 | Blue | XXL | 3 |
3 | Brand 1 | Green | XXXL | 45 |
4 | Brand 2 | Blue | M | 200 |
I can easily do it by hand for 1 or 2 features (with a small number of values). For example, if I segment by brand I get:
ID | Brand | Color | Size | Price |
---|---|---|---|---|
1 | Brand 1 | Orange | S | 23 |
3 | Brand 1 | Green | XXXL | 45 |
and
ID | Brand | Color | Size | Price |
---|---|---|---|---|
2 | Brand 2 | Blue | XXL | 3 |
4 | Brand 2 | Blue | M | 200 |
Unfortunately, some features take 10+ values. Moreover, the number of subsets explodes if I segment by more than one feature. I am trying to test different levels of segmentation (e.g. color + brand, color + brand + size), which is why I don't want to do it by hand.
I am trying to figure out a function that takes the dataframe and a list of features as input and outputs all the different subsets, but for now, my code is worthless.
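To get a feel for how quickly the number of subsets grows, one option is to count the distinct value combinations that actually occur at each segmentation level. The sketch below assumes the data is in a pandas DataFrame shaped like the example table; the names `df`, `levels`, and `segment_counts` are illustrative, not from the original post.

```python
import pandas as pd

# Toy data mirroring the example table above
df = pd.DataFrame({
    "ID": [1, 2, 3, 4],
    "Brand": ["Brand 1", "Brand 2", "Brand 1", "Brand 2"],
    "Color": ["Orange", "Blue", "Green", "Blue"],
    "Size": ["S", "XXL", "XXXL", "M"],
    "Price": [23, 3, 45, 200],
})

# Segmentation levels to compare (hypothetical choice of feature lists)
levels = [["Brand"], ["Color", "Brand"], ["Color", "Brand", "Size"]]

# ngroups is the number of value combinations observed in the data,
# not the full Cartesian product of all possible values
segment_counts = {tuple(level): df.groupby(level).ngroups for level in levels}

for level, count in segment_counts.items():
    print(level, count)
```

Only combinations present in the data are counted, so the result is usually much smaller than the product of the number of values per feature.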
Thank you in advance if you think you can help me!
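One possible approach, sketched under the assumption that the data lives in a pandas DataFrame, is to let `groupby` do the splitting and collect each group into a dictionary keyed by the combination of feature values. The function name `segment` and the variable names are hypothetical; this is a minimal illustration rather than a definitive implementation.

```python
import pandas as pd


def segment(df, features):
    """Return a dict mapping each observed combination of feature values
    to the sub-DataFrame of rows that have that combination."""
    subsets = {}
    for key, group in df.groupby(list(features)):
        # Depending on the pandas version, grouping by a one-element list
        # yields either a scalar key or a 1-tuple; normalize to a tuple.
        if not isinstance(key, tuple):
            key = (key,)
        subsets[key] = group
    return subsets


# Toy data mirroring the example table in the question
df = pd.DataFrame({
    "ID": [1, 2, 3, 4],
    "Brand": ["Brand 1", "Brand 2", "Brand 1", "Brand 2"],
    "Color": ["Orange", "Blue", "Green", "Blue"],
    "Size": ["S", "XXL", "XXXL", "M"],
    "Price": [23, 3, 45, 200],
})

by_brand = segment(df, ["Brand"])
by_brand_color = segment(df, ["Brand", "Color"])

for key, subset in by_brand_color.items():
    print(key)
    print(subset)
```

Because only combinations that actually occur in the data produce a group, the function works unchanged for any number of features or values; testing another segmentation level just means passing a different list of column names.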
Sources
This article follows the attribution requirements of Stack Overflow and is licensed under CC BY-SA 3.0.
Source: Stack Overflow