'Pandas: groupby with multiple columns and applying new value on key values in grouping
I'm working with pandas groupby function and I'm having some trouble figuring out how to accomplish what I'm doing. I'm working with a dataset that looks like this:
Turb Time (UTC) Day Lat/Lon
52 27.0 2016-11-07 13:45:00+00:00 2016-11-07 (37.571398, 76.88399)
53 27.0 2016-11-07 14:00:00+00:00 2016-11-07 (37.571398, 76.88399)
54 42.0 2016-11-07 14:15:00+00:00 2016-11-07 (37.571398, 76.88399)
55 58.0 2016-11-07 14:30:00+00:00 2016-11-07 (37.571398, 76.88399)
56 51.0 2016-11-07 14:45:00+00:00 2016-11-07 (37.571398, 76.88399)
... ... ... ... ...
20284 53.0 2021-07-09 17:15:00+00:00 2021-07-09 (37.414986, 76.71442)
20285 45.0 2021-07-09 17:30:00+00:00 2021-07-09 (37.414986, 76.71442)
20286 43.0 2021-07-09 17:45:00+00:00 2021-07-09 (37.414986, 76.71442)
20287 57.0 2021-07-09 18:00:00+00:00 2021-07-09 (37.414986, 76.71442)
20288 45.0 2021-07-09 18:15:00+00:00 2021-07-09 (37.414986, 76.71442)
I would really like to group the dataset by both the Lat/Lon value, as well as the day, so that it will look like this:
Turb Time (UTC)
Lat/Lon Day
(37.215796, 76.392675) 2015-07-26 2.0 2015-07-26 13:45:00+00:00
2015-09-14 5.0 2015-09-14 13:45:00+00:00
2015-11-13 1.0 2015-11-13 13:45:00+00:00
2015-11-23 3.0 2015-11-23 13:45:00+00:00
2016-01-02 1.0 2016-01-02 13:30:00+00:00
... ... ...
(37.571398, 76.88399) 2020-12-26 38.0 2020-12-26 13:30:00+00:00
2021-03-11 18.0 2021-03-11 13:30:00+00:00
2021-04-05 17.0 2021-04-05 13:30:00+00:00
2021-05-15 24.0 2021-05-15 13:30:00+00:00
2021-07-09 23.0 2021-07-09 13:30:00+00:00
I've figured out how to do that with the following line of code:
gb = df.groupby(['Lat/Lon', 'Day'])
However, I'm not really sure how to go about with the next part. For lat/lon and day combination, I want to add a value to a column that represents a value that is retrieved from another dataset representing that lat/lon and day combination. I can't join it with this other data set because I'm working with a geospatial xarray. How can I go about adding a value with my own defined function so that the data set will look like this in the end?
Turb Time (UTC) Satellite
Lat/Lon Day
(37.215796, 76.392675) 2015-07-26 2.0 2015-07-26 13:45:00+00:00 A
2015-09-14 5.0 2015-09-14 13:45:00+00:00 B
2015-11-13 1.0 2015-11-13 13:45:00+00:00 C
2015-11-23 3.0 2015-11-23 13:45:00+00:00 D
2016-01-02 1.0 2016-01-02 13:30:00+00:00 E
... ... ...
(37.571398, 76.88399) 2020-12-26 38.0 2020-12-26 13:30:00+00:00 F
2021-03-11 18.0 2021-03-11 13:30:00+00:00 G
2021-04-05 17.0 2021-04-05 13:30:00+00:00 H
2021-05-15 24.0 2021-05-15 13:30:00+00:00 I
2021-07-09 23.0 2021-07-09 13:30:00+00:00 J
Futhermore, how can I go about ungrouping the dataframe so that it looks like this at the end?
Turb Time (UTC) Day Lat/Lon Satellite
52 27.0 2016-11-07 13:45:00+00:00 2016-11-07 (37.571398, 76.88399) A
53 27.0 2016-11-07 14:00:00+00:00 2016-11-07 (37.571398, 76.88399) A
54 42.0 2016-11-07 14:15:00+00:00 2016-11-07 (37.571398, 76.88399) A
55 58.0 2016-11-07 14:30:00+00:00 2016-11-07 (37.571398, 76.88399) A
56 51.0 2016-11-07 14:45:00+00:00 2016-11-07 (37.571398, 76.88399) A
... ... ... ... ...
20284 53.0 2021-07-09 17:15:00+00:00 2021-07-09 (37.414986, 76.71442) Z
20285 45.0 2021-07-09 17:30:00+00:00 2021-07-09 (37.414986, 76.71442) Z
20286 43.0 2021-07-09 17:45:00+00:00 2021-07-09 (37.414986, 76.71442) Z
20287 57.0 2021-07-09 18:00:00+00:00 2021-07-09 (37.414986, 76.71442) Z
20288 45.0 2021-07-09 18:15:00+00:00 2021-07-09 (37.414986, 76.71442) Z
Basically, I want to apply the same value to locations on the same day and location, regardless of time. Would it be better to just filter by unique pairs of lat/lon and iterate through all the dates to apply the value to a new column?
Thanks!
Sources
This article follows the attribution requirements of Stack Overflow and is licensed under CC BY-SA 3.0.
Source: Stack Overflow
| Solution | Source |
|---|
