Pandas: How to find the index of a cell from groupby values?

I have a dataframe:

>>> import pandas as pd
>>> dates = ['1/1/2020', '1/1/2020', '1/1/2020', '1/2/2020', '1/2/2020', '1/2/2020']
>>> humidity = [11, 22, 33, 44, 55, 66]
>>> hours = [0,16,24,0,16,24]
>>> df = pd.DataFrame(list(zip(dates, hours, humidity)),
...                columns =['dates', 'hours', 'humidity'])
>>> df
      dates  hours  humidity
0  1/1/2020      0        11
1  1/1/2020     16        22
2  1/1/2020     24        33
3  1/2/2020      0        44
4  1/2/2020     16        55
5  1/2/2020     24        66

I want to create two new columns. One column will store the index of the row where hours == 16. The other will store the index of the row where humidity is at its maximum. Both operations need to be done separately for each date. I can find the maximum humidity for each date using the groupby and transform functions as below:

>>> df["max_humidity"] = ""
>>> df["max_humidity"] = df["humidity"].groupby(df["dates"]).transform("max")
>>> df
      dates  hours  humidity  max_humidity
0  1/1/2020      0        11            33
1  1/1/2020     16        22            33
2  1/1/2020     24        33            33
3  1/2/2020      0        44            66
4  1/2/2020     16        55            66
5  1/2/2020     24        66            66

However, I couldn't figure out how to find the index position when the humidity is maximum for each date. The final result should look like this:

>>> df
      dates  hours  humidity  max_humidity  sixteen_hr_idx  max_humidity_idx
0  1/1/2020      0        11            33               1                 2
1  1/1/2020     16        22            33               1                 2
2  1/1/2020     24        33            33               1                 2
3  1/2/2020      0        44            66               4                 5
4  1/2/2020     16        55            66               4                 5
5  1/2/2020     24        66            66               4                 5

Solution 1:^[1]

I was able to accomplish this by creating two dataframes that hold the index values you need and then merging them onto the parent dataframe.

import pandas as pd
dates = ['1/1/2020', '1/1/2020', '1/1/2020', '1/2/2020', '1/2/2020', '1/2/2020']
humidity = [11, 22, 33, 44, 55, 66]
hours = [0,16,24,0,16,24]
df = pd.DataFrame(list(zip(dates, hours, humidity)),
              columns =['dates', 'hours', 'humidity'])


hum_df = df["humidity"].groupby(df["dates"]).transform("max")
df['max_humidity'] = hum_df

sixteen_df = df[df['hours'] == 16]
sixteen_df.index.name = 'sixteen_hour_index'
sixteen_df = sixteen_df.reset_index()

sixteen_df
Out[2]: 
   sixteen_hour_index     dates  hours  humidity  max_humidity
0                   1  1/1/2020     16        22            33
1                   4  1/2/2020     16        55            66

hum_index_df = df[df['humidity'] == df['max_humidity']]
hum_index_df.index.name = 'humidity_max_index'
hum_index_df = hum_index_df.reset_index()

hum_index_df
Out[3]: 
   humidity_max_index     dates  hours  humidity  max_humidity
0                   2  1/1/2020     24        33            33
1                   5  1/2/2020     24        66            66

Merging on dates broadcasts each index value (2 and 5 for the humidity maxima, 1 and 4 for the 16-hour rows) onto every row with a matching date.

df = df.merge(hum_index_df[['humidity_max_index', 'dates']], on='dates')
df = df.merge(sixteen_df[['sixteen_hour_index', 'dates']], on='dates')

df
Out[4]: 
      dates  hours  humidity  max_humidity  humidity_max_index  sixteen_hour_index
0  1/1/2020      0        11            33                   2                   1
1  1/1/2020     16        22            33                   2                   1
2  1/1/2020     24        33            33                   2                   1
3  1/2/2020      0        44            66                   5                   4
4  1/2/2020     16        55            66                   5                   4
5  1/2/2020     24        66            66                   5                   4
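One caveat with the merge approach: if two rows on the same date share the maximum humidity, the filter keeps both, and the merge then duplicates rows. The sketch below uses made-up data with such a tie (not from the question) and shows one possible way to handle it, keeping only the first match per date with drop_duplicates. Whether the first match is the right choice depends on your data.

```python
import pandas as pd

# Hypothetical data with a tie: rows 0 and 2 share the maximum humidity (33).
df = pd.DataFrame({
    "dates": ["1/1/2020", "1/1/2020", "1/1/2020"],
    "hours": [0, 16, 24],
    "humidity": [33, 22, 33],
})
df["max_humidity"] = df.groupby("dates")["humidity"].transform("max")

# Same filter as in the answer; with a tie it returns two rows for one date.
hum_index_df = df[df["humidity"] == df["max_humidity"]]
hum_index_df = hum_index_df.rename_axis("humidity_max_index").reset_index()

# Merging as-is pairs every row of the date with both matches.
naive = df.merge(hum_index_df[["humidity_max_index", "dates"]], on="dates")

# Keeping the first match per date restores one output row per input row.
first_only = hum_index_df.drop_duplicates("dates")
fixed = df.merge(first_only[["humidity_max_index", "dates"]], on="dates")

print(len(naive), len(fixed))  # prints: 6 3
```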

Solution 2:^[2]

Update: sophocles' solution for sixteen_hr_idx is much shorter and better, so I partially adapted it:

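# Index label of the hours == 16 row for each date
sixteen = df.loc[df.hours.eq(16)].reset_index()[["dates", "index"]]
df = df.merge(sixteen.rename(columns={"index": "sixteen_hr_idx"}), on="dates")

# Index label of the maximum humidity for each date (idxmax on humidity, not hours)
max_idx = df.groupby("dates")["humidity"].idxmax().rename("max_humidity_idx")
df = df.merge(max_idx.reset_index(), on="dates")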

Output:

      dates  hours  humidity  max_humidity  sixteen_hr_idx  max_humidity_idx
0  1/1/2020      0        11            33               1                 2
1  1/1/2020     16        22            33               1                 2
2  1/1/2020     24        33            33               1                 2
3  1/2/2020      0        44            66               4                 5
4  1/2/2020     16        55            66               4                 5
5  1/2/2020     24        66            66               4                 5
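As an alternative to merging, the per-date index labels can be broadcast back with Series.map. This is a minimal sketch, not taken from either answer; it assumes each date has at most one row with hours == 16, and the helper names max_idx and sixteen_idx are only illustrative.

```python
import pandas as pd

dates = ['1/1/2020', '1/1/2020', '1/1/2020', '1/2/2020', '1/2/2020', '1/2/2020']
humidity = [11, 22, 33, 44, 55, 66]
hours = [0, 16, 24, 0, 16, 24]
df = pd.DataFrame({"dates": dates, "hours": hours, "humidity": humidity})

# Index label of the maximum humidity per date (a Series indexed by date),
# then looked up for every row through its date.
max_idx = df.groupby("dates")["humidity"].idxmax()
df["max_humidity_idx"] = df["dates"].map(max_idx)

# Index label of the hours == 16 row per date.
# Assumption: at most one such row per date, so the lookup keys are unique.
sixteen = df.loc[df["hours"].eq(16), "dates"]
sixteen_idx = pd.Series(sixteen.index, index=sixteen.to_numpy())
df["sixteen_hr_idx"] = df["dates"].map(sixteen_idx)

print(df)
```

Because map only adds a column, the original row order and index are left untouched. On ties, idxmax returns the first row with the maximum value.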

Sources

This article follows the attribution requirements of Stack Overflow and is licensed under CC BY-SA 3.0.

Source: Stack Overflow

Solution	Source
Solution 1	Correy Koshnick
Solution 2
