'Compute ratio and fill pandas column with list of unique values for each group
I have a dataframe like as shown below
ID,desk_id,dateval,
1,123,21/11/2016
1,123,29/11/2016
1,456,21/12/2016
1,100,29/12/2016
2,318,11/12/2017
2,419,17/12/2017
2,nan,21/12/2017
2,nan,21/12/2017
2,393,28/11/2017
3,nan,21/11/2016
3,nan,21/11/2016
3,nan,21/11/2016
4,nan,11/8/2018
4,nan,16/8/2018
4,nan,21/8/2018
df = pd.read_clipboard(sep=',')
df['date_val'] = pd.to_datetime(df['date_val'])
df.sort_values(by=['ID','dateval'],inplace=True)
I would like to create three columns
a) desk_id_ratio = count(desk_id <> NA)/Total no of records for each ID. For ex: ID = 2 will have desk_id_ratio=60 because 3 out of 5 rows have desk_id <> NA
b) desk_id_list = store the unique desk_ids for each ID in a list.
c) avg_gap_days = compute the average difference in days between records for each ID. Hence, I sort the dataframe by dateval above to get positive value for mean difference in days
I was trying something like below
df['desk_ratio'] = (df.groupby(['ID'])['desk_id'].count().reset_index(drop=True)/df.groupby(['ID']).size().reset_index(drop=True))*100
df['previous_record'] = df.groupby(['ID'])['dateval'].shift()
df['days_bw_records'] = df['dateval'] - df['previous_record']
df['days_bw_records'] = df['days_bw_records'].apply(lambda x: x.days)
df['avg_gap_days'] = df.groupby(['ID'])['days_bw_records'].agg('mean').reset_index()
I expect my output to be like as below
Solution 1:[1]
You can first assign a column that flags non-NaN values in desk_id; then use groupby.agg to find the desired columns for each column. For desk_ratio, find the mean; for date_val, find the average gap in days; for desk_id, join the unique IDs.
out = (df.assign(desk_ratio=df['desk_id'].notna())
.groupby('ID').agg({'desk_ratio':'mean',
'date_val':lambda x: round(x.diff().dt.days.mean(), 2),
'desk_id':lambda x: ';'.join(str(int(el)) for el in x.unique() if el==el)})
.mul([100,1,1]).reset_index())
Output:
ID desk_ratio date_val desk_id
0 1 100.0 12.67 123;456;100
1 2 60.0 9.75 318;393;419
2 3 0.0 0.00
3 4 0.0 42.00
Sources
This article follows the attribution requirements of Stack Overflow and is licensed under CC BY-SA 3.0.
Source: Stack Overflow
| Solution | Source |
|---|---|
| Solution 1 |

