I am attempting to use pandas to turn a long dataset into a wide dataset, but I want to keep repeated index values on separate rows.

I am starting with a dataset that looks like this (updated: added 'Qstn Resp TS' to help match each 'Resp Value' to its 'Qstn Title'):

import numpy as np
import pandas as pd

longDF = pd.DataFrame({'id':[1,1,1,1,2,2,2,2,2,3,3,3,3,3,3,3,3,3,3,3,3,3],
                        'Qstn Title':['Date Services were P','Date of Request','Services Requested','Type of Oral Service','Date Services were P','Date of Request','Describe how to resolve','Services Requested','Type of Oral Service','Date Services were P','Date Services were P','Date Services were P','Date Services were P','Date of Request','Date of Request','Describe how to resolve','Services Requested','Services Requested','Services Requested','Type of Oral Service','Type of Oral Service','Type of Oral Service'],
                        'Resp Value':['05/01/2020','05/01/2020','Chinese (Cantonese)','Telephone Interpreter','07/31/2020','07/31/2020','services were provided','Chinese (Cantonese)','Telephone Interpreter','09/24/2020','09/24/2020','11/19/2020','09/24/2020','09/24/2020','11/19/2020','interpreter lm on vm','Vietnamese','Vietnamese','Vietnamese','Telephone Interpreter','Telephone Interpreter','Telephone Interpreter'],
                        'Qstn Resp TS':['5/1/2020','5/1/2020','5/1/2020','5/1/2020','7/31/2020','7/31/2020','7/31/2020','7/31/2020','7/31/2020','9/24/2020','9/24/2020','11/19/2020','9/24/2020','9/24/2020','11/19/2020','11/19/2020','9/24/2020','9/24/2020','11/19/2020','9/24/2020','9/24/2020','11/19/2020']})

To create the wide dataset I do:

wideDF = pd.pivot_table(longDF, values='Resp Value',  index=['id'], columns='Qstn Title', aggfunc=np.sum)

My goal is to produce one row for each 'id' and set of 'Qstn Title' responses, with the 'Qstn Title' values as column names and 'Resp Value' as the cell values, so wideDF should have 6 rows. With the pivot_table call above I get only 3 rows, one per 'id'. For 'id' 3, each 'Qstn Title' column ends up holding several 'Resp Value' entries in a single cell, because aggfunc=np.sum concatenates the strings.
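To see why the call returns one row per 'id', here is a minimal sketch on a small made-up frame (the name demo and its values are illustrative, not taken from the data above). It swaps np.sum for a lambda that joins with ' | ' so the merged cell is easy to read; that substitution is an assumption made for display purposes only.

```python
import pandas as pd

# Hypothetical miniature of the long data: id 3 answered 'Date of Request' twice
demo = pd.DataFrame({
    'id': [1, 1, 3, 3, 3],
    'Qstn Title': ['Date of Request', 'Services Requested',
                   'Date of Request', 'Date of Request', 'Services Requested'],
    'Resp Value': ['05/01/2020', 'Chinese (Cantonese)',
                   '09/24/2020', '11/19/2020', 'Vietnamese'],
})

# index=['id'] means the result has exactly one row per distinct id;
# every repeated (id, Qstn Title) pair is passed to aggfunc and merged
wide = pd.pivot_table(demo, values='Resp Value', index=['id'],
                      columns='Qstn Title',
                      aggfunc=lambda s: ' | '.join(s))

print(len(wide))                       # one row per id
print(wide.loc[3, 'Date of Request'])  # both answers merged into one cell
```

Keeping the repeats on separate rows therefore needs some extra key in the index that tells the repeated answers apart.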

Expected output:

wideDFout = pd.DataFrame({'id':[1,2,3,3,3,3],
                          'Date Services were P':['05/01/2020','07/31/2020','09/24/2020','09/24/2020','09/24/2020','11/19/2020'],
                          'Date of Request':['05/01/2020','07/31/2020','09/24/2020','','','11/19/2020'],
                          'Services Requested':['Chinese (Cantonese)','Chinese (Cantonese)','Vietnamese','Vietnamese','','Vietnamese'],
                          'Type of Oral Service':['Telephone Interpreter','Telephone Interpreter','Telephone Interpreter','Telephone Interpreter','','Telephone Interpreter'],
                          'Describe how to resolve':['','services were provided','','interpreter lm on vm','','']})

Is there a way to go from long to wide while preserving the index/rows for a set of column values?

Below is the corrected input dataframe and a script that produces the desired output:

longDF2 = pd.DataFrame({'id':[1,1,1,1,2,2,2,2,2,3,3,3,3,3,3,3,3,3,3,3,3,3],
                        'Qstn Title':['Date Services were P','Date of Request','Services Requested','Type of Oral Service','Date Services were P','Date of Request','Describe how to resolve','Services Requested','Type of Oral Service','Date Services were P','Date Services were P','Date Services were P','Date of Request','Date of Request','Date of Request','Describe how to resolve','Services Requested','Services Requested','Services Requested','Type of Oral Service','Type of Oral Service','Type of Oral Service'],
                        'Resp Value':['05/01/2020','05/01/2020','Chinese (Cantonese)','Telephone Interpreter','07/31/2020','07/31/2020','services were provided','Chinese (Cantonese)','Telephone Interpreter','09/24/2020','09/24/2020','11/19/2020','09/24/2020','09/24/2020','11/19/2020','interpreter lm on vm','Vietnamese','Vietnamese','Vietnamese','Telephone Interpreter','Telephone Interpreter','Telephone Interpreter'],
                        'Qstn Resp TS':['5/1/2020','5/1/2020','5/1/2020','5/1/2020','7/31/2020','7/31/2020','7/31/2020','7/31/2020','7/31/2020','9/24/2020','9/24/2020','11/19/2020','9/24/2020','9/24/2020','11/19/2020','11/19/2020','9/24/2020','9/24/2020','11/19/2020','9/24/2020','9/24/2020','11/19/2020']})

# add temp column 'q' to hold sub-id for each repeated id
longDF2['q'] = longDF2.groupby(['id','Qstn Title','Qstn Resp TS'], group_keys = False).cumcount()

# create multiIndex dataframe based on id, sub-id and question title
# and then unstack it
longDFOut2 = longDF2.set_index(['id','q','Qstn Title','Qstn Resp TS']).unstack(level=2, fill_value='')


#### re-pivot ##########
longDFOutNew2 = pd.DataFrame(longDFOut2.to_records())

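# rename columns: to_records() flattens each MultiIndex column name to a string
# such as "('Resp Value', 'Date of Request')", so keep only the question title
for c in range(len(longDFOutNew2.columns)):
    # the first three columns (id, q, Qstn Resp TS) already have plain names
    if c > 2:
        longDFOutNew2.rename(columns={longDFOutNew2.columns[c]:longDFOutNew2.columns[c].split(',')[1].strip()[1:-2]}, inplace=True)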

# pad ID with zeros
longDFOutNew2['id'] = longDFOutNew2['id'].astype(str).str.zfill(10)

# output cleaned dataframe
longDFOutFinal2 = longDFOutNew2[['id','Date Services were P','Date of Request','Describe how to resolve','Services Requested','Type of Oral Service']]
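
For comparison, the same cumcount-then-unstack idea can probably be written more compactly by selecting 'Resp Value' as a Series before unstacking, which avoids the MultiIndex columns and the string-based renaming. This is only a sketch on a small made-up frame (demo and its rows are hypothetical), not a drop-in replacement for the script above.

```python
import pandas as pd

# Hypothetical miniature of longDF2; id 3 has one exact repeat
# (same id, question and timestamp) for 'Services Requested'
demo = pd.DataFrame({
    'id': [1, 1, 3, 3, 3, 3, 3],
    'Qstn Title': ['Date of Request', 'Services Requested',
                   'Date of Request', 'Date of Request',
                   'Services Requested', 'Services Requested',
                   'Services Requested'],
    'Resp Value': ['05/01/2020', 'Chinese (Cantonese)',
                   '09/24/2020', '11/19/2020',
                   'Vietnamese', 'Vietnamese', 'Vietnamese'],
    'Qstn Resp TS': ['5/1/2020', '5/1/2020',
                     '9/24/2020', '11/19/2020',
                     '9/24/2020', '9/24/2020', '11/19/2020'],
})

# sub-id: numbers the repeats of each (id, question, timestamp) combination
demo['q'] = demo.groupby(['id', 'Qstn Title', 'Qstn Resp TS']).cumcount()

wide = (
    demo.set_index(['id', 'q', 'Qstn Resp TS', 'Qstn Title'])['Resp Value']
        .unstack('Qstn Title', fill_value='')  # question titles become columns
        .rename_axis(None, axis=1)             # drop the 'Qstn Title' axis label
        .reset_index()                         # id, q, Qstn Resp TS back to columns
)

print(wide)
```

Because 'q' and 'Qstn Resp TS' stay in the index until the end, each repeated answer keeps its own row instead of being aggregated.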

pandas pivot-table

Sources

This article follows the attribution requirements of Stack Overflow and is licensed under CC BY-SA 3.0.

Source: Stack Overflow
