'Compute ratio and fill pandas column with list of unique values for each group

I have a dataframe like as shown below

ID,desk_id,dateval,     
1,123,21/11/2016          
1,123,29/11/2016     
1,456,21/12/2016             
1,100,29/12/2016     
2,318,11/12/2017    
2,419,17/12/2017
2,nan,21/12/2017
2,nan,21/12/2017
2,393,28/11/2017
3,nan,21/11/2016
3,nan,21/11/2016
3,nan,21/11/2016
4,nan,11/8/2018
4,nan,16/8/2018
4,nan,21/8/2018

df = pd.read_clipboard(sep=',')
df['date_val'] = pd.to_datetime(df['date_val'])
df.sort_values(by=['ID','dateval'],inplace=True)

I would like to create three columns

a) desk_id_ratio = count(desk_id <> NA)/Total no of records for each ID. For ex: ID = 2 will have desk_id_ratio=60 because 3 out of 5 rows have desk_id <> NA

b) desk_id_list = store the unique desk_ids for each ID in a list.

c) avg_gap_days = compute the average difference in days between records for each ID. Hence, I sort the dataframe by dateval above to get positive value for mean difference in days

I was trying something like below

df['desk_ratio'] = (df.groupby(['ID'])['desk_id'].count().reset_index(drop=True)/df.groupby(['ID']).size().reset_index(drop=True))*100
df['previous_record'] = df.groupby(['ID'])['dateval'].shift()
df['days_bw_records'] = df['dateval'] - df['previous_record']
df['days_bw_records'] = df['days_bw_records'].apply(lambda x: x.days)
df['avg_gap_days'] = df.groupby(['ID'])['days_bw_records'].agg('mean').reset_index()

I expect my output to be like as below

enter image description here



Solution 1:[1]

You can first assign a column that flags non-NaN values in desk_id; then use groupby.agg to find the desired columns for each column. For desk_ratio, find the mean; for date_val, find the average gap in days; for desk_id, join the unique IDs.

out = (df.assign(desk_ratio=df['desk_id'].notna())
       .groupby('ID').agg({'desk_ratio':'mean', 
                           'date_val':lambda x: round(x.diff().dt.days.mean(), 2),
                           'desk_id':lambda x: ';'.join(str(int(el)) for el in x.unique() if el==el)})
       .mul([100,1,1]).reset_index())

Output:

   ID  desk_ratio  date_val      desk_id
0   1       100.0     12.67  123;456;100
1   2        60.0      9.75  318;393;419
2   3         0.0      0.00             
3   4         0.0     42.00             

Sources

This article follows the attribution requirements of Stack Overflow and is licensed under CC BY-SA 3.0.

Source: Stack Overflow

Solution Source
Solution 1