'How can I grab rows with max date from Pandas dataframe?
I have a Pandas dataframe that looks like this:
and I want to grab for each distinct ID, the row with the max date so that my final results looks something like this:
My date column is of data type 'object'. I have tried grouping and then trying to grab the max like the following:
idx = df.groupby(['ID','Item'])['date'].transform(max) == df_Trans['date']
df_new = df[idx]
However I am unable to get the desired result.
Solution 1:[1]
The last bit of code from piRSquared's answer is wrong.
We are trying to get distinct IDs, so the column used in drop_duplicates should be 'ID'. keep='last' would then retrieve the last (and max) date for each ID.
df.sort_values(['ID', 'date']).drop_duplicates('ID', keep='last')
Solution 2:[2]
My answer is a generalization of piRSquared's answer:
manykeyindicates the keys from which the mapping is desired (many-to)onekeyindicates the keys to which the mapping is desired (-to-one)sortkeyis sortable key and it followsascset to True (as python standard)def get_last(df:pd.DataFrame,manykey:list[str],onekey:list[str],sortkey,asc=True): return df.sort_values(sortkey,asc).drop_duplicates(subset=manykey, keep='last')[manykey+onekey]
In your case the answer should be:
get_last(df,["id"],["item"],"date")
Note that I am using the onekey explicitly because I want to drop the rest of the keys (if they are in the table) and create a mapping.
Sources
This article follows the attribution requirements of Stack Overflow and is licensed under CC BY-SA 3.0.
Source: Stack Overflow
| Solution | Source |
|---|---|
| Solution 1 | Nimantha |
| Solution 2 | Nimantha |


