Group answers & index based on questions in Python

I have a dataframe and, for analysis purposes, I need to create a list of dictionaries like this:

TARGET OUTPUT

[
{ 'is my anti hiv test conclusive or--Bla bla': [0, 1, 2] }, 
{'I have some hip pain 9 weeks--bla bla': [3, 4, 5, 6]} 
]

Here each list holds the indices of the answers, not the actual answers.

The obvious method is to use groupby, but I am running into some errors.

I tried printing the result before converting it to a list, and it actually seems fine.

Can you please help me figure out the correct syntax so I can get to my target output?

Dataset link. If somebody needs the shared notebook link, let me know in the comments.



Solution 1:[1]

You need to actually select the column ("index") whose values you want to appear in the list:

df_ans = data.groupby(["question_text"])["index"].apply(list).to_dict()

instead of

df_ans = data.groupby(["question_text"]).apply(list).to_dict()

Otherwise you get a list of the column names for each group, as in your example. That is what happens when you convert a DataFrame to a list: list(data) gives you the same result as list(data.columns).
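To make the difference concrete, here is a minimal sketch. The small DataFrame is a hypothetical stand-in for your data (the "answer_text" column name and its values are assumptions; only "index" and "question_text" come from the question). It also shows one possible way to turn the resulting dictionary into the list of single-key dictionaries from your target output.

```python
import pandas as pd

# Hypothetical stand-in for the real dataset: "answer_text" and its values
# are made up for illustration; "index" and "question_text" follow the question.
data = pd.DataFrame({
    "index": [0, 1, 2, 3, 4, 5, 6],
    "question_text": (
        ["is my anti hiv test conclusive or--Bla bla"] * 3
        + ["I have some hip pain 9 weeks--bla bla"] * 4
    ),
    "answer_text": ["a0", "a1", "a2", "a3", "a4", "a5", "a6"],
})

# Select the "index" column first, then collect each group's values in a list.
# sort=False keeps the questions in order of first appearance.
grouped = data.groupby("question_text", sort=False)["index"].apply(list)

# One dictionary mapping every question to its answer indices.
as_dict = grouped.to_dict()

# The target output: a list with one single-key dictionary per question.
as_list_of_dicts = [{question: indices} for question, indices in grouped.items()]

# Iterating over a DataFrame yields its column names, which is why
# apply(list) without a column selection returns column names per group.
column_names = list(data)

print(as_list_of_dicts)
print(column_names)
```

If a single dictionary is enough for the analysis, as_dict can be used directly and the list comprehension can be skipped.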

Sources

This article follows the attribution requirements of Stack Overflow and is licensed under CC BY-SA 3.0.

Source: Stack Overflow

Solution Source
Solution 1