How to create a column marking the latest date from grouped data

I am working with a data frame that looks like the following:

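+------+------+------+------+--------+----------------+
| ID 1 | Type | Name | Date | Marked | Updated_Marked |
+------+------+------+------+--------+----------------+
|    1 | abs  |  576 | 2009 | Last   |                |
|    1 | sqr  |  573 | 2020 |        |                |
|    1 | fun  |  234 | 2020 |        |                |
|    1 | no   |  556 | 2020 |        |                |
|    2 | abs  |  364 | 2015 |        |                |
|    2 | sqr  |  937 | 2019 |        |                |
|    2 | fun  |  562 | 2020 | Last   |                |
|    2 | ok   |  794 | 2021 |        |                |
|    3 | abs  |  846 | 2012 | Last   |                |
|    3 | fun  |  540 | 2015 |        |                |
|    3 | sqr  |  276 | 2016 |        |                |
|    3 | no   |  649 | 2017 |        |                |
|    4 | abs  |  845 | 1999 |        |                |
|    4 | fun  |  047 | 2012 |        |                |
|    4 | sqr  |  176 | 2020 | Last   |                |
|    4 | sqr  |  959 | 2020 |        |                |
+------+------+------+------+--------+----------------+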
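
To reproduce the example, the sample frame can be built roughly as follows. This is a sketch under a few assumptions: unmarked cells in Marked are empty strings, Name is stored as text so that 047 keeps its leading zero, and Updated_Marked is left out because it is the column to be created.

```python
import pandas as pd

# Sample data from the table above (assumption: blanks are empty strings,
# Name kept as text so "047" keeps its leading zero).
df = pd.DataFrame({
    "ID 1": [1, 1, 1, 1, 2, 2, 2, 2, 3, 3, 3, 3, 4, 4, 4, 4],
    "Type": ["abs", "sqr", "fun", "no", "abs", "sqr", "fun", "ok",
             "abs", "fun", "sqr", "no", "abs", "fun", "sqr", "sqr"],
    "Name": ["576", "573", "234", "556", "364", "937", "562", "794",
             "846", "540", "276", "649", "845", "047", "176", "959"],
    "Date": [2009, 2020, 2020, 2020, 2015, 2019, 2020, 2021,
             2012, 2015, 2016, 2017, 1999, 2012, 2020, 2020],
    "Marked": ["Last", "", "", "", "", "", "Last", "",
               "Last", "", "", "", "", "", "Last", ""],
})
print(df)
```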

I am trying to update the dataframe with a new Updated_Marked column that correctly displays "Last" under the following conditions:

Within each ID 1 group, only look at rows where Type is "abs", "sqr", or "fun", and find the latest Date among them. Then:

- If "Last" is already marked next to the latest date, do nothing.
- If "Last" is not correct, place "Last" in the new column on the latest row.
- If "Last" is not correct and multiple rows are tied for the latest date (same Date value), put "Multiple" next to each of them.

EDIT: I have come across one more case: a row is originally, and correctly, marked "Last", but other rows in the group share the same latest year, so those rows get marked "Multiple". In this case, I would like to assume the original mark is correct and make no updates. These changes are reflected in the initial dataframe and the expected output.

The updated dataframe will look like the following:

+------+------+------+------+--------+----------------+
| ID 1 | Type | Name | Date | Marked | Updated_Marked |
+------+------+------+------+--------+----------------+
|    1 | abs  |  576 | 2009 | Last   |                |
|    1 | sqr  |  573 | 2020 |        | Multiple       |
|    1 | fun  |  234 | 2020 |        | Multiple       |
|    1 | no   |  556 | 2020 |        |                |
|    2 | abs  |  364 | 2015 |        |                |
|    2 | sqr  |  937 | 2019 |        |                |
|    2 | fun  |  562 | 2020 | Last   |                |
|    2 | ok   |  794 | 2021 |        |                |
|    3 | abs  |  846 | 2012 | Last   |                |
|    3 | fun  |  540 | 2015 |        |                |
|    3 | sqr  |  276 | 2016 |        | Last           |
|    3 | no   |  649 | 2017 |        |                |
|    4 | abs  |  845 | 1999 |        |                |
|    4 | fun  |  047 | 2012 |        |                |
|    4 | sqr  |  176 | 2020 | Last   |                |
|    4 | sqr  |  959 | 2020 |        |                |
+------+------+------+------+--------+----------------+
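
The rules and the expected output above can be sketched as a per-group function. This is a minimal sketch, not the answer's method: it assumes unmarked rows hold an empty string in Marked, and the names update_marks and TARGET_TYPES are hypothetical.

```python
import pandas as pd

TARGET_TYPES = ["abs", "sqr", "fun"]  # hypothetical name: only these Types take part


def update_marks(group: pd.DataFrame) -> pd.Series:
    """Hypothetical helper: Updated_Marked values for one ID 1 group."""
    result = pd.Series("", index=group.index)
    candidates = group[group["Type"].isin(TARGET_TYPES)]
    if candidates.empty:
        return result
    latest = candidates[candidates["Date"] == candidates["Date"].max()]
    # A "Last" already on any latest-dated row is taken as correct (the EDIT case)
    if latest["Marked"].eq("Last").any():
        return result
    # Otherwise mark the latest row(s): a single row gets "Last", ties get "Multiple"
    result.loc[latest.index] = "Last" if len(latest) == 1 else "Multiple"
    return result


# Sample data (assumption: blanks are empty strings)
df = pd.DataFrame({
    "ID 1": [1, 1, 1, 1, 2, 2, 2, 2, 3, 3, 3, 3, 4, 4, 4, 4],
    "Type": ["abs", "sqr", "fun", "no", "abs", "sqr", "fun", "ok",
             "abs", "fun", "sqr", "no", "abs", "fun", "sqr", "sqr"],
    "Name": ["576", "573", "234", "556", "364", "937", "562", "794",
             "846", "540", "276", "649", "845", "047", "176", "959"],
    "Date": [2009, 2020, 2020, 2020, 2015, 2019, 2020, 2021,
             2012, 2015, 2016, 2017, 1999, 2012, 2020, 2020],
    "Marked": ["Last", "", "", "", "", "", "Last", "",
               "Last", "", "", "", "", "", "Last", ""],
})

# Concatenate the per-group results; assignment aligns on the original index
df["Updated_Marked"] = pd.concat(
    [update_marks(group) for _, group in df.groupby("ID 1")]
)
print(df)
```

Looping over groups in Python is easy to check against the rules but will be slower than a mask-based approach on large frames.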

EDIT2: After trying the code below,

msk1 = df['Type'].isin(['abs','sqr','fun'])
msk2 = df.groupby(['ID 1', msk1])['Date'].transform('max').eq(df['Date'])
msk3 = df['Marked'].ne('Last').groupby([df['ID 1'], msk1 & msk2]).cummin()
msks = msk1 & msk2 & msk3
v = df.loc[msks, 'ID 1']
df.loc[msks, 'Updated_Marked'] = v.map(v.value_counts().gt(1).map({True: 'Multiple', False: 'Last'}))
df = df.fillna('')

my dataframe looks like the following:

+------+------+------+------+--------+----------------+
| ID 1 | Type | Name | Date | Marked | Updated_Marked |
+------+------+------+------+--------+----------------+
|    1 | abs  |  576 | 2009 | Last   |                |
|    1 | sqr  |  573 | 2020 |        | Multiple       |
|    1 | fun  |  234 | 2020 |        | Multiple       |
|    1 | no   |  556 | 2020 |        |                |
|    2 | abs  |  364 | 2015 |        |                |
|    2 | sqr  |  937 | 2019 |        |                |
|    2 | fun  |  562 | 2020 | Last   | Last           |
|    2 | ok   |  794 | 2021 |        |                |
|    3 | abs  |  846 | 2012 | Last   |                |
|    3 | fun  |  540 | 2015 |        |                |
|    3 | sqr  |  276 | 2016 |        | Last           |
|    3 | no   |  649 | 2017 |        |                |
|    4 | abs  |  845 | 1999 |        |                |
|    4 | fun  |  047 | 2012 |        |                |
|    4 | sqr  |  176 | 2020 | Last   | Multiple       |
|    4 | sqr  |  959 | 2020 |        | Multiple       |
+------+------+------+------+--------+----------------+

In short, if a row is originally marked "Last" and holds the latest date (even if other rows share that date), I would like to leave the Updated_Marked column blank for every row in that group.



Solution 1:[1]

Here's one way using three masks. The idea is to identify the rows that must be updated using their Type, whether they hold the latest Date, and whether they are already marked "Last". Then, for those rows, assign "Last" or "Multiple" in the Updated_Marked column depending on how many of the selected rows share the same ID 1.

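# Rows whose Type is one of the types of interest
msk1 = df['Type'].isin(['abs','sqr','fun'])
# Rows holding the latest Date within their (ID 1, msk1) group
msk2 = df.groupby(['ID 1', msk1])['Date'].transform('max').eq(df['Date'])
# Within each (ID 1, msk1 & msk2) group, False from the first "Last" onward
msk3 = df['Marked'].ne('Last').groupby([df['ID 1'], msk1 & msk2]).cummin()
# Rows that need an update
msks = msk1 & msk2 & msk3
v = df.loc[msks, 'ID 1']
# "Multiple" if several selected rows share an ID 1, otherwise "Last"
df.loc[msks, 'Updated_Marked'] = v.map(v.value_counts().gt(1).map({True: 'Multiple', False: 'Last'}))
# Replace the NaNs left in unselected rows with empty strings
df = df.fillna('')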

Output:

    ID 1 Type  Name  Date Marked Updated_Marked
0      1  abs   576  2009   Last               
1      1  sqr   573  2020              Multiple
2      1  fun   234  2020              Multiple
3      1   no   556  2020                      
4      2  abs   364  2015                      
5      2  sqr   937  2019                      
6      2  fun   562  2020   Last               
7      2   ok   794  2021                      
8      3  abs   846  2012   Last               
9      3  fun   540  2015                      
10     3  sqr   276  2016                  Last
11     3   no   649  2017                      
12     4  abs   845  1999                      
13     4  fun    47  2012                      
14     4  sqr   176  2020   Last               
15     4  sqr   959  2020                      
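
One caveat with the code above: because msk3 uses cummin, it only clears rows that come after a "Last" in the frame's row order. If the "Last" in ID 4 sat on the 959 row instead of the 176 row, the 176 row would still be selected and marked "Last". A sketch of an order-independent variant keeps msk1 and msk2 but swaps cummin for a per-group transform('any'), so a group is skipped whenever any of its latest-dated rows is already marked "Last". The function name mark_latest and the small reordered frame are illustrative assumptions, and unmarked rows are assumed to be empty strings.

```python
import pandas as pd


def mark_latest(df: pd.DataFrame) -> pd.DataFrame:
    """Hypothetical helper: the answer's masks with an order-independent msk3."""
    out = df.copy()
    msk1 = out["Type"].isin(["abs", "sqr", "fun"])
    msk2 = out.groupby(["ID 1", msk1])["Date"].transform("max").eq(out["Date"])
    latest = msk1 & msk2
    # True on every row of an ID 1 group whose latest rows already include "Last"
    has_last = (latest & out["Marked"].eq("Last")).groupby(out["ID 1"]).transform("any")
    msks = latest & ~has_last
    # Number of selected rows per ID 1 (same idea as v.value_counts() above)
    counts = msks.groupby(out["ID 1"]).transform("sum")
    out["Updated_Marked"] = ""
    out.loc[msks & counts.eq(1), "Updated_Marked"] = "Last"
    out.loc[msks & counts.gt(1), "Updated_Marked"] = "Multiple"
    return out


# ID 4 from the question, but with "Last" on the second of the two tied rows
reordered = pd.DataFrame({
    "ID 1": [4, 4, 4, 4],
    "Type": ["abs", "fun", "sqr", "sqr"],
    "Name": ["845", "047", "176", "959"],
    "Date": [1999, 2012, 2020, 2020],
    "Marked": ["", "", "", "Last"],
})
print(mark_latest(reordered))
```

On the question's original sample this variant selects the same rows as the answer (1 and 2 as "Multiple", 10 as "Last"); it only differs when a "Last" on a tied latest row is not the first such row in the group.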

Sources

This article follows the attribution requirements of Stack Overflow and is licensed under CC BY-SA 3.0.

Source: Stack Overflow

Solution Source
Solution 1