Convert a Python dictionary with set values to a binary DataFrame

I have a dictionary where the values are sets:

my_dict = {1: {'a', 'b'}, 2: {'a', 'c'}, 3: {'b', 'c', 'd'}, 4: {'a'}}

I would like to convert it to a binary dataframe where the rows are the dictionary keys and the columns are the members of the sets, with 1 marking that a member belongs to that key's set. For the above example, the output is as follows:

   a b c d
1  1 1 0 0
2  1 0 1 0
3  0 1 1 1
4  1 0 0 0

How can I do it in an efficient and scalable manner?



Solution 1:[1]

You can join each set into a '|'-separated string and then use pandas' Series.str.get_dummies, which splits on '|' by default, like this:

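import pandas as pd

my_dict = {1: {'a', 'b'}, 2: {'a', 'c'}, 3: {'b', 'c', 'd'}, 4: {'a'}}
# Join each set into a '|'-separated string, then one-hot encode on that separator.
df = pd.Series({k: list(v) for k, v in my_dict.items()}).str.join('|').str.get_dummies()
print(df)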
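
Note that the approach above relies on string joining, so it assumes every set member is a string that does not itself contain '|'. If that might not hold, here is a minimal sketch of an alternative that tests membership directly instead of round-tripping through strings. The names columns and df are only illustrative, and it assumes the set members can be sorted against each other:

```python
import pandas as pd

my_dict = {1: {'a', 'b'}, 2: {'a', 'c'}, 3: {'b', 'c', 'd'}, 4: {'a'}}

# Collect every distinct set member once; sorting gives a stable column order.
columns = sorted(set().union(*my_dict.values()))

# One row per key: 1 if the member is in that key's set, else 0.
df = pd.DataFrame(
    [[int(col in members) for col in columns] for members in my_dict.values()],
    index=list(my_dict.keys()),
    columns=columns,
)
print(df)
```

This builds a dense keys-by-members table in plain Python, which is fine when the result fits in memory. For very large or very sparse inputs a sparse representation would probably be a better fit.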

Sources

This article follows the attribution requirements of Stack Overflow and is licensed under CC BY-SA 3.0.

Source: Stack Overflow

Solution Source
Solution 1 Adam.Er8