'Pandas: groupby and then retrieving IQR

I am quite new to Pandas and I am trying to do the following thing:

I have two dataframes comms and arts that look like this (except for the fact they are longer ad with other columns)

comms:

ID    commScore           
10       5                
10       3                  
10      -1                 
11       0                
11       2              
12       9      
13      -2     
13      -1     
13       1      
13       4

arts:

ID    commNumber
10        3 
11        2    
12        1
13        4      

I need to group comms by their ID, and then save in arts (obviously in the correct ID line) the interquartile range (IQR) of the commScore distribution for each ID.

I've already tried using groupby, agg and map , but since my notions of pandas are quite limited, I just couldn't do what I was looking for.

Does anyone have a solution?

Thanks

Andrea



Solution 1:[1]

We can group the dataframe by ID and aggregate column commScore using the function iqr from scipy.stats to calculate inter quartile range, then map this calculated iqr range on the column ID of the arts dataframe

from scipy.stats import iqr

arts['IQR'] = arts['ID'].map(comms.groupby('ID')['commScore'].agg(iqr))

   ID  commNumber  IQR
0  10           3    3
1  11           2    1
2  12           1    0
3  13           4    3

Solution 2:[2]

The GroupBy object has a quantile method. You can calculate Q3 and Q1, and subtract them. Some renaming and joining follow it:

grouper = comms.groupby("ID")
q1, q3 = grouper.quantile(0.25), grouper.quantile(0.75)
iqr = q3 - q1
iqr = iqr.rename(columns={"commScore": "IQR"})

arts = arts.set_index("ID").join(iqr)

to get

>>> arts

    commNumber  IQR
ID
10           3  3.0
11           2  1.0
12           1  0.0
13           4  3.0

If there is more than one numeric column, then we go explicit about commScore as:

grouper = comms.groupby("ID").commScore
q1, q3 = grouper.quantile(0.25), grouper.quantile(0.75)
iqr = q3 - q1
iqr.name = "IQR"  # `iqr` will be a series since we selected 1 column,
                  #  so renaming is a bit different

arts = arts.set_index("ID").join(iqr)

which gives the same result.

If you don't want to call quantile 2 times, you can pass a list [0.75, 0.25] and then subtract them with agg. So instead of 2 lines above involving q1 and q3, we write

iqr = grouper.quantile([0.75, 0.25]).groupby("ID").agg(np.subtract.reduce)

and the rest is the same.

Sources

This article follows the attribution requirements of Stack Overflow and is licensed under CC BY-SA 3.0.

Source: Stack Overflow

Solution Source
Solution 1
Solution 2