Create a pandas DataFrame where each cell is a set of strings

I am trying to create a DataFrame like so:

col_a col_b
{'soln_a'} {'soln_b'}

In case it helps, here are some of my failed attempts:

import pandas as pd

my_dict_a = {"col_a": set(["soln_a"]), "col_b": set("soln_b")}
df_0 = pd.DataFrame.from_dict(my_dict_a) # ValueError: All arrays must be of the same length

df_1 = pd.DataFrame.from_dict(my_dict_a, orient="index").T # splits 'soln_b' into individual letters

my_dict_b = {"col_a": ["soln_a"], "col_b": ["soln_b"]}

df_2 = pd.DataFrame(my_dict_b).apply(set) # TypeError: 'set' type is unordered

df_3 = pd.DataFrame.from_dict(my_dict_b, orient="index").T # creates DataFrame of lists

df_3.apply(set, axis=1) # combines into single set of {soln_a, soln_b}

What's the best way to do this?



Solution 1:[1]

You could build the DataFrame from plain strings, then apply a list comprehension to each column that wraps every value in its own set:

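import pandas as pd

my_dict_b = {"col_a": ["soln_a"], "col_b": ["soln_b"]}
df_2 = pd.DataFrame(my_dict_b)
# Wrap each cell in a one-element list first so set() does not split the string into letters
df_2 = df_2.apply(lambda col: [set([x]) for x in col])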

Output:

      col_a     col_b
0  {soln_a}  {soln_b}
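
A variation on the same idea, in case it reads more clearly: `Series.map` calls a function once per cell, so it can stand in for the list comprehension. This is only a sketch under the same assumptions as above; the name `df_sets` is made up for illustration.

```python
import pandas as pd

df = pd.DataFrame({"col_a": ["soln_a"], "col_b": ["soln_b"]})

# Series.map applies the lambda to each cell, so every string is wrapped
# whole in a one-element set rather than iterated character by character.
df_sets = df.apply(lambda col: col.map(lambda x: {x}))

print(df_sets)
print(type(df_sets.loc[0, "col_a"]))
```

The set literal `{x}` is equivalent to `set([x])` here. It avoids the trap from the question, where `set("soln_b")` iterates over the string and yields a set of letters.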

Solution 2:[2]

Why not construct it directly? Wrap each set in a list so that every dict value is a one-row column:

df = pd.DataFrame({
    'col_a': [set(['soln_a'])],
    'col_b': [set(['soln_b'])],
})

Output:

>>> df
      col_a     col_b
0  {soln_a}  {soln_b}
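
The same construction should extend to several rows, since each list element becomes one cell. The sketch below uses made-up values (`soln_c`, `soln_d`, and an empty set) purely for illustration and checks that the cells remain real Python sets.

```python
import pandas as pd

# Hypothetical data: each list is a column, each inner set is one cell.
rows = {
    "col_a": [{"soln_a"}, {"soln_c", "soln_d"}],
    "col_b": [{"soln_b"}, set()],
}
df_multi = pd.DataFrame(rows)

# The cells keep their set type, so set operations work on them directly.
cell = df_multi.loc[1, "col_a"]
has_soln_c = "soln_c" in cell

print(df_multi.shape)
print(type(cell), has_soln_c)
```

The outer list is what tells pandas that each set is a single cell value. Passing a bare set as the dict value, as in the question, makes pandas treat the set itself as the column data.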

Sources

This article follows the attribution requirements of Stack Overflow and is licensed under CC BY-SA 3.0.

Source: Stack Overflow

Solution Source
Solution 1
Solution 2