Pandas: How to find the index of a cell from groupby values?
I have a dataframe:
>>> import pandas as pd
>>> dates = ['1/1/2020', '1/1/2020', '1/1/2020', '1/2/2020', '1/2/2020', '1/2/2020']
>>> humidity = [11, 22, 33, 44, 55, 66]
>>> hours = [0,16,24,0,16,24]
>>> df = pd.DataFrame(list(zip(dates, hours, humidity)),
... columns =['dates', 'hours', 'humidity'])
>>> df
dates hours humidity
0 1/1/2020 0 11
1 1/1/2020 16 22
2 1/1/2020 24 33
3 1/2/2020 0 44
4 1/2/2020 16 55
5 1/2/2020 24 66
I want to create two new columns. One column will store the index when hours == 16. Another column will store the index when humidity is maximum. Both these operations need to be done separately for each date. I can find the maximum humidity for each date using the groupby and transform functions as below:
>>> df["max_humidity"] = ""
>>> df["max_humidity"] = df["humidity"].groupby(df["dates"]).transform("max")
>>> df
dates hours humidity max_humidity
0 1/1/2020 0 11 33
1 1/1/2020 16 22 33
2 1/1/2020 24 33 33
3 1/2/2020 0 44 66
4 1/2/2020 16 55 66
5 1/2/2020 24 66 66
However, I couldn't figure out how to find the index position when the humidity is maximum for each date. The final result should look like this:
>>> df
dates hours humidity max_humidity sixteen_hr_idx max_humidity_idx
0 1/1/2020 0 11 33 1 2
1 1/1/2020 16 22 33 1 2
2 1/1/2020 24 33 33 1 2
3 1/2/2020 0 44 66 4 5
4 1/2/2020 16 55 66 4 5
5 1/2/2020 24 66 66 4 5
Solution 1:[1]
I was able to accomplish this by creating two dataframes that hold the index values you need and then merging them onto the parent dataframe.
import pandas as pd
dates = ['1/1/2020', '1/1/2020', '1/1/2020', '1/2/2020', '1/2/2020', '1/2/2020']
humidity = [11, 22, 33, 44, 55, 66]
hours = [0,16,24,0,16,24]
df = pd.DataFrame(list(zip(dates, hours, humidity)),
columns =['dates', 'hours', 'humidity'])
hum_df = df["humidity"].groupby(df["dates"]).transform("max")
df['max_humidity'] = hum_df
sixteen_df = df[df['hours'] == 16]
sixteen_df.index.name = 'sixteen_hour_index'
sixteen_df = sixteen_df.reset_index()
sixteen_df
Out[2]:
sixteen_hour_index dates hours humidity max_humidity
0 1 1/1/2020 16 22 33
1 4 1/2/2020 16 55 66
hum_index_df = df[df['humidity'] == df['max_humidity']]
hum_index_df.index.name = 'humidity_max_index'
hum_index_df = hum_index_df.reset_index()
hum_index_df
Out[3]:
humidity_max_index dates hours humidity max_humidity
0 2 1/1/2020 24 33 33
1 5 1/2/2020 24 66 66
Merging on dates broadcasts each date's index value (2 and 5 for the maximum humidity, 1 and 4 for hour 16) to every row that shares that date.
df = df.merge(hum_index_df[['humidity_max_index', 'dates']], on='dates')
df = df.merge(sixteen_df[['sixteen_hour_index', 'dates']], on='dates')
df
Out[4]:
dates hours humidity max_humidity humidity_max_index sixteen_hour_index
0 1/1/2020 0 11 33 2 1
1 1/1/2020 16 22 33 2 1
2 1/1/2020 24 33 33 2 1
3 1/2/2020 0 44 66 5 4
4 1/2/2020 16 55 66 5 4
5 1/2/2020 24 66 66 5 4
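One caveat with the merge approach: `merge` defaults to an inner join, so any date that has no `hours == 16` row would disappear from the result. The sketch below uses made-up data (not the data from the question) in which 1/2/2020 has no 16-hour reading, and compares the default join with `how="left"`. Treat it as an illustration of the behaviour rather than part of the original answer.

```python
import pandas as pd

# Hypothetical data: 1/2/2020 has no reading at hour 16.
df = pd.DataFrame({
    "dates": ["1/1/2020", "1/1/2020", "1/2/2020", "1/2/2020"],
    "hours": [0, 16, 0, 24],
    "humidity": [11, 22, 44, 66],
})

# Same lookup table as above: one row per date that has an hours == 16 entry.
sixteen_df = (
    df[df["hours"] == 16]
    .rename_axis("sixteen_hour_index")
    .reset_index()
)
lookup = sixteen_df[["sixteen_hour_index", "dates"]]

inner = df.merge(lookup, on="dates")             # default how="inner": drops 1/2/2020
left = df.merge(lookup, on="dates", how="left")  # keeps every row, NaN where no match

print(inner)
print(left)
```

With `how="left"` the unmatched rows get `NaN`, which also turns the index column into floats, so whether that is preferable depends on how you plan to use the column afterwards.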
Solution 2:[2]
Update: sophocles' solution for sixteen_hr_idx is much shorter and better, so I partially adapted it:
df = df.merge(df.loc[df.hours.eq(16)].reset_index()[["dates", "index"]], on="dates")
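# Row label of the maximum humidity within each date, as a dates/max_humidity_idx table
max_idx = df.groupby("dates")["humidity"].idxmax().rename("max_humidity_idx").reset_index()
df = df.merge(max_idx, on="dates").rename(columns={"index": "sixteen_hr_idx"})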
Output:
dates hours humidity max_humidity sixteen_hr_idx max_humidity_idx
0 1/1/2020 0 11 33 1 2
1 1/1/2020 16 22 33 1 2
2 1/1/2020 24 33 33 1 2
3 1/2/2020 0 44 66 4 5
4 1/2/2020 16 55 66 4 5
5 1/2/2020 24 66 66 4 5
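If you would rather avoid the merges, the same two columns can probably be built by mapping each date to its index value. This is only a sketch of an alternative, not what either answer above does. It assumes each date has at most one `hours == 16` row (otherwise `map` would fail on the duplicated lookup keys), and the names `sixteen_lookup` and `max_lookup` are just illustrative.

```python
import pandas as pd

dates = ['1/1/2020', '1/1/2020', '1/1/2020', '1/2/2020', '1/2/2020', '1/2/2020']
humidity = [11, 22, 33, 44, 55, 66]
hours = [0, 16, 24, 0, 16, 24]
df = pd.DataFrame({'dates': dates, 'hours': hours, 'humidity': humidity})

# date -> row label of the hours == 16 row (assumes at most one such row per date)
sixteen_lookup = (
    df.loc[df['hours'].eq(16), 'dates']
    .reset_index()                 # original row labels move into a column named "index"
    .set_index('dates')['index']
)

# date -> row label of the first maximum humidity within that date
max_lookup = df.groupby('dates')['humidity'].idxmax()

# Broadcast the per-date values back onto every row of the original frame
df['sixteen_hr_idx'] = df['dates'].map(sixteen_lookup)
df['max_humidity_idx'] = df['dates'].map(max_lookup)
print(df)
```

Because nothing is merged, the original index and row order of `df` stay untouched, and a date with no 16-hour row simply gets `NaN` instead of being dropped.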
Sources
This article follows the attribution requirements of Stack Overflow and is licensed under CC BY-SA 3.0.
Source: Stack Overflow
| Solution | Source |
|---|---|
| Solution 1 | Correy Koshnick |
| Solution 2 |
