How to find the .describe() based on a column duplicate criteria?
For example, my dataframe consists of
Date | ID | Result |
---|---|---|
1/5/2020 | B213 | 60 |
1/6/2020 | B213 | 70 |
1/5/2021 | B213 | 50 |
1/9/2020 | L914 | 75 |
1/9/2021 | L914 | 76 |
I want to find the mean, median, and percentiles of each ID individually for every year. How do I do that? I used .describe(), but I realised it summarises the entire dataframe: it gives me the mean, median, and percentiles over all rows, which is not what I need. (Sorry for my bad English; English isn't my first language.) I'm using pandas in Jupyter.
Solution 1:[1]
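First, here is a way to format your dataframe:
# Take the year from the last four characters of the Date string
df['year'] = df['Date'].str[-4:]
# Select only Result, so the string Date column is not passed to mean()/median()
df_mean = df.groupby(by=['ID', 'year'])['Result'].mean().reset_index().rename(columns={'Result': 'mean'})
df_median = df.groupby(by=['ID', 'year'])['Result'].median().reset_index().rename(columns={'Result': 'median'})
# Both frames have the same groups in the same order, so the index-based join lines up
df_mean = df_mean.join(df_median['median'])
df_mean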
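Since the question asks for the full .describe() output per ID and year, here is a minimal sketch that calls describe() on the grouped Result column instead of building each statistic separately. It is not part of the original answer. The dataframe is rebuilt from the sample rows in the question, and it assumes Date is stored as a string whose last four characters are the year; the name `stats` is only illustrative.

```python
import pandas as pd

# Sample rows copied from the question
df = pd.DataFrame({
    "Date": ["1/5/2020", "1/6/2020", "1/5/2021", "1/9/2020", "1/9/2021"],
    "ID": ["B213", "B213", "B213", "L914", "L914"],
    "Result": [60, 70, 50, 75, 76],
})

# Assumption: Date is a string ending in a four-digit year
df["year"] = df["Date"].str[-4:]

# One row per (ID, year); columns are count, mean, std, min, 25%, 50%, 75%, max
stats = df.groupby(["ID", "year"])["Result"].describe()
print(stats)
```

The 50% column is the median. If other percentiles are needed, describe() accepts a percentiles argument, for example describe(percentiles=[0.1, 0.9]).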
Sources
This article follows the attribution requirements of Stack Overflow and is licensed under CC BY-SA 3.0.
Source: Stack Overflow
Solution | Source |
---|---|
Solution 1 | DataSciRookie |