How to create a column marking the latest date from grouped data

I am working with a data frame that looks like the following:

ID 1	Type	Name	Date	Marked
1	abs	576	2009	Last
1	sqr	573	2020
1	fun	234	2020
1	no	556	2020
2	abs	364	2015
2	sqr	937	2019
2	fun	562	2020	Last
2	ok	794	2021
3	abs	846	2012	Last
3	fun	540	2015
3	sqr	276	2016
3	no	649	2017
4	abs	845	1999
4	fun	047	2012
4	sqr	176	2020	Last
4	sqr	959	2020
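
For anyone who wants to reproduce this, here is a minimal sketch that builds the table above as a pandas DataFrame. It assumes blank Marked cells are empty strings and that Name is an integer column (so 047 is stored as 47); the real data may differ.

```python
import pandas as pd

# Sample rows copied from the table above.
# Assumption: blank Marked cells are empty strings, Name is an integer.
rows = [
    (1, "abs", 576, 2009, "Last"),
    (1, "sqr", 573, 2020, ""),
    (1, "fun", 234, 2020, ""),
    (1, "no", 556, 2020, ""),
    (2, "abs", 364, 2015, ""),
    (2, "sqr", 937, 2019, ""),
    (2, "fun", 562, 2020, "Last"),
    (2, "ok", 794, 2021, ""),
    (3, "abs", 846, 2012, "Last"),
    (3, "fun", 540, 2015, ""),
    (3, "sqr", 276, 2016, ""),
    (3, "no", 649, 2017, ""),
    (4, "abs", 845, 1999, ""),
    (4, "fun", 47, 2012, ""),
    (4, "sqr", 176, 2020, "Last"),
    (4, "sqr", 959, 2020, ""),
]
df = pd.DataFrame(rows, columns=["ID 1", "Type", "Name", "Date", "Marked"])
print(df)
```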

I am trying to write a program to correct and update the dataframe with a new Updated_Marked column that will correctly display "Last" given the following conditions:

Within each ID 1 group, only look at rows where Type == "abs" | Type == "sqr" | Type == "fun", and find the latest Date among them. If "Last" is already marked on the row with the latest Date, do nothing. If "Last" is marked incorrectly, put "Last" in the new column on the row with the latest Date. If "Last" is marked incorrectly and multiple rows are tied for the latest Date, put "Multiple" next to each of them.

EDIT: I have come across one more case: a row is correctly marked "Last" originally, but because other rows in the group have the same year, the tied rows get marked as "Multiple". In this case, I would like to assume the original mark is correct and make no updates. The initial dataframe and the expected output have been changed to reflect this.

The updated dataframe will look like the following:

+------+------+------+------+--------+----------------+
| ID 1 | Type | Name | Date | Marked | Updated_Marked |
+------+------+------+------+--------+----------------+
|    1 | abs  |  576 | 2009 | Last   |                |
|    1 | sqr  |  573 | 2020 |        | Multiple       |
|    1 | fun  |  234 | 2020 |        | Multiple       |
|    1 | no   |  556 | 2020 |        |                |
|    2 | abs  |  364 | 2015 |        |                |
|    2 | sqr  |  937 | 2019 |        |                |
|    2 | fun  |  562 | 2020 | Last   |                |
|    2 | ok   |  794 | 2021 |        |                |
|    3 | abs  |  846 | 2012 | Last   |                |
|    3 | fun  |  540 | 2015 |        |                |
|    3 | sqr  |  276 | 2016 |        | Last           |
|    3 | no   |  649 | 2017 |        |                |
|    4 | abs  |  845 | 1999 |        |                |
|    4 | fun  |  047 | 2012 |        |                |
|    4 | sqr  |  176 | 2020 | Last   |                |
|    4 | sqr  |  959 | 2020 |        |                |
+------+------+------+------+--------+----------------+

EDIT2: after trying

msk1 = df['Type'].isin(['abs','sqr','fun'])
msk2 = df.groupby(['ID 1', msk1])['Date'].transform('max').eq(df['Date'])
msk3 = df['Marked'].ne('Last').groupby([df['ID 1'], msk1 & msk2]).cummin()
msks = msk1 & msk2 & msk3
v = df.loc[msks, 'ID 1']
df.loc[msks, 'Updated_Marked'] = v.map(v.value_counts().gt(1).map({True: 'Multiple', False: 'Last'}))
df = df.fillna('')

my dataframe looks like the following:

+------+------+------+------+--------+----------------+
| ID 1 | Type | Name | Date | Marked | Updated_Marked |
+------+------+------+------+--------+----------------+
|    1 | abs  |  576 | 2009 | Last   |                |
|    1 | sqr  |  573 | 2020 |        | Multiple       |
|    1 | fun  |  234 | 2020 |        | Multiple       |
|    1 | no   |  556 | 2020 |        |                |
|    2 | abs  |  364 | 2015 |        |                |
|    2 | sqr  |  937 | 2019 |        |                |
|    2 | fun  |  562 | 2020 | Last   | Last           |
|    2 | ok   |  794 | 2021 |        |                |
|    3 | abs  |  846 | 2012 | Last   |                |
|    3 | fun  |  540 | 2015 |        |                |
|    3 | sqr  |  276 | 2016 |        | Last           |
|    3 | no   |  649 | 2017 |        |                |
|    4 | abs  |  845 | 1999 |        |                |
|    4 | fun  |  047 | 2012 |        |                |
|    4 | sqr  |  176 | 2020 | Last   | Multiple       |
|    4 | sqr  |  959 | 2020 |        | Multiple       |
+------+------+------+------+--------+----------------+

Essentially, if a row is originally marked "Last" and it has the latest date (even if other rows share that date), I would like to leave the Updated_Marked column blank for that group.
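
To pin the rules down, they can be written as a plain per-group loop. This is only a sketch of the rules as described, not a tuned solution: `update_marked` is a made-up helper name, the small frame holds just ID 1 groups 1 and 4 from the table, blank Marked cells are assumed to be empty strings, and the index is assumed to be unique.

```python
import pandas as pd

def update_marked(df, types=("abs", "sqr", "fun")):
    """Hypothetical helper: return a copy of df with an Updated_Marked column."""
    out = df.copy()
    out["Updated_Marked"] = ""
    for _, group in out.groupby("ID 1"):
        # Only rows with a relevant Type take part in the comparison
        candidates = group[group["Type"].isin(list(types))]
        if candidates.empty:
            continue
        latest = candidates[candidates["Date"] == candidates["Date"].max()]
        # An existing "Last" on any latest-date row is trusted: no update
        if latest["Marked"].eq("Last").any():
            continue
        label = "Last" if len(latest) == 1 else "Multiple"
        # Assumes unique index labels
        out.loc[latest.index, "Updated_Marked"] = label
    return out

# Groups 1 and 4 from the table (assumption: blanks are empty strings)
sample = pd.DataFrame({
    "ID 1": [1, 1, 1, 1, 4, 4, 4, 4],
    "Type": ["abs", "sqr", "fun", "no", "abs", "fun", "sqr", "sqr"],
    "Name": [576, 573, 234, 556, 845, 47, 176, 959],
    "Date": [2009, 2020, 2020, 2020, 1999, 2012, 2020, 2020],
    "Marked": ["Last", "", "", "", "", "", "Last", ""],
})
result = update_marked(sample)
print(result)
```

Group 1 has a stale "Last" and a tie on the latest date, so both tied rows are flagged; group 4 already has "Last" on a latest-date row, so it is left alone.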

Solution 1:^[1]

Here's one way using three masks. The idea is to identify the rows that must be updated based on their Type, whether they hold the latest Date, and whether a row in that set is already marked "Last". Then, for those rows, assign values in the Updated_Marked column depending on how many of them share the same ID 1: "Multiple" if there is more than one, "Last" otherwise.

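# Rows whose Type is one of the types of interest
msk1 = df['Type'].isin(['abs','sqr','fun'])
# Rows holding the latest Date within each (ID 1, msk1) group
msk2 = df.groupby(['ID 1', msk1])['Date'].transform('max').eq(df['Date'])
# Turns False from the first row marked "Last" onward within each (ID 1, candidate) group
msk3 = df['Marked'].ne('Last').groupby([df['ID 1'], msk1 & msk2]).cummin()
# Rows that need a value in Updated_Marked
msks = msk1 & msk2 & msk3
v = df.loc[msks, 'ID 1']
# One flagged row per ID 1 gets "Last"; several get "Multiple"
df.loc[msks, 'Updated_Marked'] = v.map(v.value_counts().gt(1).map({True: 'Multiple', False: 'Last'}))
df = df.fillna('')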

Output:

    ID 1 Type  Name  Date Marked Updated_Marked
0      1  abs   576  2009   Last               
1      1  sqr   573  2020              Multiple
2      1  fun   234  2020              Multiple
3      1   no   556  2020                      
4      2  abs   364  2015                      
5      2  sqr   937  2019                      
6      2  fun   562  2020   Last               
7      2   ok   794  2021                      
8      3  abs   846  2012   Last               
9      3  fun   540  2015                      
10     3  sqr   276  2016                  Last
11     3   no   649  2017                      
12     4  abs   845  1999                      
13     4  fun    47  2012                      
14     4  sqr   176  2020   Last               
15     4  sqr   959  2020
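
One thing to watch: `cummin` is cumulative, so `msk3` appears to depend on row order. If a tied latest row comes before the row marked "Last", it would still be flagged. A possible order-independent variant is sketched below, swapping `cummin` for a per-group `transform('any')`. `mark_latest` is a hypothetical name, and blank Marked cells are assumed to be empty strings.

```python
import pandas as pd

def mark_latest(df, types=("abs", "sqr", "fun")):
    """Hypothetical helper: order-independent version of the three-mask idea."""
    out = df.copy()
    ids = out["ID 1"]
    is_type = out["Type"].isin(list(types))
    # Latest Date among relevant rows of each ID; other rows are masked to NaN first
    latest = out["Date"].where(is_type).groupby(ids).transform("max")
    is_latest = is_type & out["Date"].eq(latest)
    # True for a whole ID when any of its latest-date rows is already marked "Last"
    already_ok = (is_latest & out["Marked"].eq("Last")).groupby(ids).transform("any")
    needs_update = is_latest & ~already_ok
    # Number of rows to update per ID decides between "Last" and "Multiple"
    n = needs_update.groupby(ids).transform("sum")
    out["Updated_Marked"] = ""
    out.loc[needs_update & n.eq(1), "Updated_Marked"] = "Last"
    out.loc[needs_update & n.gt(1), "Updated_Marked"] = "Multiple"
    return out

# Sample data from the question (assumption: blanks are empty strings)
df = pd.DataFrame({
    "ID 1": [1] * 4 + [2] * 4 + [3] * 4 + [4] * 4,
    "Type": ["abs", "sqr", "fun", "no", "abs", "sqr", "fun", "ok",
             "abs", "fun", "sqr", "no", "abs", "fun", "sqr", "sqr"],
    "Name": [576, 573, 234, 556, 364, 937, 562, 794,
             846, 540, 276, 649, 845, 47, 176, 959],
    "Date": [2009, 2020, 2020, 2020, 2015, 2019, 2020, 2021,
             2012, 2015, 2016, 2017, 1999, 2012, 2020, 2020],
    "Marked": ["Last", "", "", "", "", "", "Last", "",
               "Last", "", "", "", "", "", "Last", ""],
})
out = mark_latest(df)
print(out)

# Same data with the "Last" mark moved to the second of the two tied rows in ID 4
swapped = df.copy()
swapped.loc[[14, 15], "Marked"] = ["", "Last"]
out_swapped = mark_latest(swapped)
```

On the sample data this should give the same Updated_Marked column as the output above, and ID 4 stays blank whichever of its tied rows carries the original "Last".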

Sources

This article follows the attribution requirements of Stack Overflow and is licensed under CC BY-SA 3.0.

Source: Stack Overflow

Solution	Source
Solution 1
