Split list in a column to multiple columns

I have two DataFrames as below:

df1.shape = (4,2)

Text Topic
Where is the party tonight? Party
Let's dance Party
Hello world Other
It is rainy today Weather

df2.shape = (4,2)

0 1
Where is the party tonight? [-0.011570500209927559, -0.010117080062627792,….,0.062448356]
Let's dance [-0.08268199861049652, -0.0016140303341671824,….,0.02094201]
Hello world [-0.0637684240937233, -0.01590338535606861,….,0.02094201]
It is rainy today [0.06379614025354385, -0.02878064103424549,….,0.056790903]

Basically, df2 holds the embedding of each sentence in df1, and each sentence has a topic associated with it. The embedding is in column 1 of df2, which holds a string of a list of 512 positive or negative floating-point numbers.

My desired output DataFrame is:

df_output.shape = (4,514)

Text Topic Feature_0 Feature_1 …. Feature_511
Where is the party tonight? Party -0.0115705 -0.01011708 …. 0.0624484
Let's dance Party -0.082681999 -0.00161403 …. 0.020942
Hello world Other -0.063768424 -0.015903385 …. 0.020942
It is rainy today Weather 0.06379614 -0.028780641 …. 0.056790903

How can I get this done? I was trying to split the embeddings in df2 into columns, but it doesn't work for me. This is what I have done so far:

df2.join(pd.DataFrame(df2["1"].values.tolist()).add_prefix('feature_'))

It just created a duplicate of column 1 as feature_0. I haven't even reached the stage where I can work on joining df1 and df2.



Solution 1:[1]

How about this?

import pandas as pd

# sample data: the vectors are shortened to three values each
data = {
    0: ['Where is the party tonight?', "Let's dance", 'Hello world', 'It is rainy today'],
    1: [
        [-0.011570500209927559, -0.010117080062627792, 0.062448356],
        [-0.08268199861049652, -0.0016140303341671824, 0.02094201],
        [-0.0637684240937233, -0.01590338535606861, 0.02094201],
        [0.06379614025354385, -0.02878064103424549, 0.056790903],
    ],
}
df = pd.DataFrame(data)

# note: no .values here; each list becomes one row, each element one column
exploded_df = pd.DataFrame(df[1].to_list(), index=df.index)

print(exploded_df.head())

Output:

          0         1         2
0 -0.011571 -0.010117  0.062448
1 -0.082682 -0.001614  0.020942
2 -0.063768 -0.015903  0.020942
3  0.063796 -0.028781  0.056791
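
One caveat: the question says column 1 holds a string of a list rather than an actual list. If that is the case, expanding it directly would give a single column of strings, which may be what happened in the attempt above. A minimal sketch of one way around it, assuming each string is a valid Python list literal, is to parse it with ast.literal_eval first. The name str_df and the shortened three-value vectors are only for illustration:

```python
import ast

import pandas as pd

# Assumption: column 1 holds the *string* form of each list, not a real list.
str_df = pd.DataFrame({
    0: ["Where is the party tonight?", "Let's dance"],
    1: [
        "[-0.0115705, -0.01011708, 0.062448356]",
        "[-0.082681999, -0.00161403, 0.02094201]",
    ],
})

# Turn each string into a real Python list of floats.
parsed = str_df[1].apply(ast.literal_eval)

# Expand the lists into one column per element, keeping the original index.
features = pd.DataFrame(parsed.to_list(), index=str_df.index).add_prefix("Feature_")

print(features.shape)
print(list(features.columns))
```

If the column already contains real lists, the parsing step is unnecessary and can be skipped.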

You can then join both DataFrames.
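
The join could be sketched roughly as follows, assuming df1 and df2 contain the same sentences in the same order so that their indexes line up. df1 is rebuilt here from the question, the vectors are shortened to three values, and the Feature_ prefix follows the desired output:

```python
import pandas as pd

df1 = pd.DataFrame({
    "Text": ["Where is the party tonight?", "Let's dance", "Hello world", "It is rainy today"],
    "Topic": ["Party", "Party", "Other", "Weather"],
})
df2 = pd.DataFrame({
    0: ["Where is the party tonight?", "Let's dance", "Hello world", "It is rainy today"],
    1: [
        [-0.011570500209927559, -0.010117080062627792, 0.062448356],
        [-0.08268199861049652, -0.0016140303341671824, 0.02094201],
        [-0.0637684240937233, -0.01590338535606861, 0.02094201],
        [0.06379614025354385, -0.02878064103424549, 0.056790903],
    ],
})

# One column per embedding element, named Feature_0, Feature_1, ...
features = pd.DataFrame(df2[1].to_list(), index=df2.index).add_prefix("Feature_")

# Index-aligned join: row i of df1 gets the features from row i of df2.
df_output = df1.join(features)

print(df_output.shape)
```

With the full 512-element vectors this should give the (4, 514) shape from the question. If the row order of the two frames is not guaranteed to match, merging on the sentence text (for example with df1.merge and its left_on / right_on parameters) would probably be the safer choice.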

Sources

This article follows the attribution requirements of Stack Overflow and is licensed under CC BY-SA 3.0.

Source: Stack Overflow

Solution Source
Solution 1 KarelZe