'Assign correct Qualification to all rows associated with Client per Month - Python / Pandas
I need to assign correct value (Qualified or Not Qualified) to all rows associated with Client per Month if condition for all of the associated rows is met.
test_data = {'Client Id': [1,1,1,1,1,1,1,1,
2,2,2,2,2,2,2,2],
'Client Name': ['Tom Holland', 'Tom Holland', 'Tom Holland', 'Tom Holland',
'Tom Holland', 'Tom Holland', 'Tom Holland', 'Tom Holland',
'Brad Pitt', 'Brad Pitt', 'Brad Pitt', 'Brad Pitt',
'Brad Pitt', 'Brad Pitt', 'Brad Pitt', 'Brad Pitt',],
'Week': ['01/03/2022 - 01/09/2022', '01/10/2022 - 01/16/2022',
'01/17/2022 - 01/23/2022', '01/24/2022 - 01/30/2022',
'01/31/2022 - 02/06/2022', '02/07/2022 - 02/13/2022',
'02/14/2022 - 02/20/2022','02/21/2022 - 02/27/2022',
'01/03/2022 - 01/09/2022', '01/10/2022 - 01/16/2022',
'01/17/2022 - 01/23/2022', '01/24/2022 - 01/30/2022',
'01/31/2022 - 02/06/2022', '02/07/2022 - 02/13/2022',
'02/14/2022 - 02/20/2022','02/21/2022 - 02/27/2022'],
'Month': ['January', 'January', 'January', 'January',
"February", "February", "February", "February",
'January', 'January', 'January', 'January',
"February", "February", "February", "February"],
'Year': [2022, 2022, 2022, 2022, 2022, 2022, 2022, 2022,
2022, 2022, 2022, 2022, 2022, 2022, 2022, 2022],
'Payment Status': ["Pending", "Paid in Full", "Didn't Paid", "Paid in Full",
"Paid in Full", "Paid in Full", "Paid in Full",
"Paid in Full", "Paid in Full", "Paid in Full",
"Paid in Full", "Paid in Full", "Paid in Full",
"Paid in Full", "Paid in Full", "Pending"]}
test_df = pd.DataFrame(data=test_data)
Data:
Client Id Client Name Week Month Year Payment Status
1 Tom Holland 01/03/2022 - 01/09/2022 January 2022 Pending
1 Tom Holland 01/10/2022 - 01/16/2022 January 2022 Paid in Full
1 Tom Holland 01/17/2022 - 01/23/2022 January 2022 Didn't Paid
1 Tom Holland 01/24/2022 - 01/30/2022 January 2022 Paid in Full
1 Tom Holland 01/31/2022 - 02/06/2022 February 2022 Paid in Full
1 Tom Holland 02/07/2022 - 02/13/2022 February 2022 Paid in Full
1 Tom Holland 02/14/2022 - 02/20/2022 February 2022 Paid in Full
1 Tom Holland 02/21/2022 - 02/27/2022 February 2022 Paid in Full
2 Brad Pitt 01/03/2022 - 01/09/2022 January 2022 Paid in Full
2 Brad Pitt 01/10/2022 - 01/16/2022 January 2022 Paid in Full
2 Brad Pitt 01/17/2022 - 01/23/2022 January 2022 Paid in Full
2 Brad Pitt 01/24/2022 - 01/30/2022 January 2022 Paid in Full
2 Brad Pitt 01/31/2022 - 02/06/2022 February 2022 Paid in Full
2 Brad Pitt 02/07/2022 - 02/13/2022 February 2022 Paid in Full
2 Brad Pitt 02/14/2022 - 02/20/2022 February 2022 Paid in Full
2 Brad Pitt 02/21/2022 - 02/27/2022 February 2022 Pending
If every row (Week) associated with Client per each Month is Paid in Full, then Qualified assigned to all rows (Weeks) associated with Client per each Month. Even if 1 Week is not Paid in Full (3 weeks can be Paid in Full, but 1 Didn't Paid or Pending), then all rows assigned to Not Qualified.
Desired output:
Client Id Client Name Week Month Year Payment Status Qualification
1 Tom Holland 01/03/2022 - 01/09/2022 January 2022 Pending Not Qualified
1 Tom Holland 01/10/2022 - 01/16/2022 January 2022 Paid in Full Not Qualified
1 Tom Holland 01/17/2022 - 01/23/2022 January 2022 Didn't Paid Not Qualified
1 Tom Holland 01/24/2022 - 01/30/2022 January 2022 Paid in Full Not Qualified
1 Tom Holland 01/31/2022 - 02/06/2022 February 2022 Paid in Full Qualified
1 Tom Holland 02/07/2022 - 02/13/2022 February 2022 Paid in Full Qualified
1 Tom Holland 02/14/2022 - 02/20/2022 February 2022 Paid in Full Qualified
1 Tom Holland 02/21/2022 - 02/27/2022 February 2022 Paid in Full Qualified
2 Brad Pitt 01/03/2022 - 01/09/2022 January 2022 Paid in Full Qualified
2 Brad Pitt 01/10/2022 - 01/16/2022 January 2022 Paid in Full Qualified
2 Brad Pitt 01/17/2022 - 01/23/2022 January 2022 Paid in Full Qualified
2 Brad Pitt 01/24/2022 - 01/30/2022 January 2022 Paid in Full Qualified
2 Brad Pitt 01/31/2022 - 02/06/2022 February 2022 Paid in Full Not Qualified
2 Brad Pitt 02/07/2022 - 02/13/2022 February 2022 Paid in Full Not Qualified
2 Brad Pitt 02/14/2022 - 02/20/2022 February 2022 Paid in Full Not Qualified
2 Brad Pitt 02/21/2022 - 02/27/2022 February 2022 Pending Not Qualified
I don't know how to achieve this, I though about value_counts in the loop:
for name, month in zip(list(test_df["Client Name"].unique()), list(test_df["Month"])):
print(test_df[(test_df["Client Name"] == name) & (test_df["Month"] == month)].value_counts(["Payment Status"]))
Solution 1:[1]
The key is to create a boolean mask: if Payment Status is "Paid in full" then True else False. Now you can group by Client Id, Month AND Year to check if all values are True. Use transform to broadcast the result to every row of the group. Finally, replace True/False by its respective values.
The boolean mask is created dynamically by adding a new column is_paid to the dataframe:
df['Qualification'] = (
df.assign(is_paid=df['Payment Status'] == 'Paid in Full')
.groupby(['Client Id', 'Month', 'Year'])['is_paid']
.transform('all').replace({True: 'Qualified', False: 'Not Qualified'})
)
print(df)
# Output
Client Id Client Name Week Month Year Payment Status Qualification
0 1 Tom Holland 01/03/2022 - 01/09/2022 January 2022 Pending Not Qualified
1 1 Tom Holland 01/10/2022 - 01/16/2022 January 2022 Paid in Full Not Qualified
2 1 Tom Holland 01/17/2022 - 01/23/2022 January 2022 Didn't Paid Not Qualified
3 1 Tom Holland 01/24/2022 - 01/30/2022 January 2022 Paid in Full Not Qualified
4 1 Tom Holland 01/31/2022 - 02/06/2022 February 2022 Paid in Full Qualified
5 1 Tom Holland 02/07/2022 - 02/13/2022 February 2022 Paid in Full Qualified
6 1 Tom Holland 02/14/2022 - 02/20/2022 February 2022 Paid in Full Qualified
7 1 Tom Holland 02/21/2022 - 02/27/2022 February 2022 Paid in Full Qualified
8 2 Brad Pitt 01/03/2022 - 01/09/2022 January 2022 Paid in Full Qualified
9 2 Brad Pitt 01/10/2022 - 01/16/2022 January 2022 Paid in Full Qualified
10 2 Brad Pitt 01/17/2022 - 01/23/2022 January 2022 Paid in Full Qualified
11 2 Brad Pitt 01/24/2022 - 01/30/2022 January 2022 Paid in Full Qualified
12 2 Brad Pitt 01/31/2022 - 02/06/2022 February 2022 Paid in Full Not Qualified
13 2 Brad Pitt 02/07/2022 - 02/13/2022 February 2022 Paid in Full Not Qualified
14 2 Brad Pitt 02/14/2022 - 02/20/2022 February 2022 Paid in Full Not Qualified
15 2 Brad Pitt 02/21/2022 - 02/27/2022 February 2022 Pending Not Qualified
Solution 2:[2]
First convert Payment Status to bools:
test_df['Paid'] = test_df['Payment Status'] == 'Paid in Full'
>>> test_df
Client Id Client Name Week Month Year Payment Status Paid
1 Tom Holland 01/03/2022 - 01/09/2022 January 2022 Pending FALSE
1 Tom Holland 01/10/2022 - 01/16/2022 January 2022 Paid in Full TRUE
1 Tom Holland 01/17/2022 - 01/23/2022 January 2022 Didnt Paid FALSE
1 Tom Holland 01/24/2022 - 01/30/2022 January 2022 Paid in Full TRUE
1 Tom Holland 01/31/2022 - 02/06/2022 February2022 Paid in Full TRUE
1 Tom Holland 02/07/2022 - 02/13/2022 February2022 Paid in Full TRUE
1 Tom Holland 02/14/2022 - 02/20/2022 February2022 Paid in Full TRUE
1 Tom Holland 02/21/2022 - 02/27/2022 February2022 Paid in Full TRUE
2 Brad Pitt 01/03/2022 - 01/09/2022 January 2022 Paid in Full TRUE
2 Brad Pitt 01/10/2022 - 01/16/2022 January 2022 Paid in Full TRUE
2 Brad Pitt 01/17/2022 - 01/23/2022 January 2022 Paid in Full TRUE
2 Brad Pitt 01/24/2022 - 01/30/2022 January 2022 Paid in Full TRUE
2 Brad Pitt 01/31/2022 - 02/06/2022 February2022 Paid in Full TRUE
2 Brad Pitt 02/07/2022 - 02/13/2022 February2022 Paid in Full TRUE
2 Brad Pitt 02/14/2022 - 02/20/2022 February2022 Paid in Full TRUE
2 Brad Pitt 02/21/2022 - 02/27/2022 February2022 Pending FALSE
The group them by Month and Client Id and check all Paid values in a group is True:
status = test_df[["Client Id", "Month", "Paid"]].groupby(["Month", "Client Id"]).all()
>>> status
Paid
Month Client Id
February 1 True
2 False
January 1 False
2 True
Now reset index and convert Paid back to text (Qualified or Not Qualified):
status = status.reset_index()
status['Paid'] = status['Paid'].map({True: 'Qualified', False:"Not Qualified"})
>>> status
Month Client Id Paid
February 1 Qualified
February 2 Not Qualified
January 1 Not Qualified
January 2 Qualified
Now merge with original table to get the desired results (and drop unnecessary columns made by merging. Also rename the new column:
output = pd.merge(test_df, a, on=['Client Id', 'Month']).drop(columns='Paid_x')
output = output.rename(columns={'Paid_y': 'Qualification'})
>>> output
Client Id Client Name Week Month Year Payment Status Qualification
1 Tom Holland 01/03/2022 - 01/09/2022 January 2022 Pending Not Qualified
1 Tom Holland 01/10/2022 - 01/16/2022 January 2022 Paid in Full Not Qualified
1 Tom Holland 01/17/2022 - 01/23/2022 January 2022 Didnt Paid Not Qualified
1 Tom Holland 01/24/2022 - 01/30/2022 January 2022 Paid in Full Not Qualified
1 Tom Holland 01/31/2022 - 02/06/2022 February2022 Paid in Full Qualified
1 Tom Holland 02/07/2022 - 02/13/2022 February2022 Paid in Full Qualified
1 Tom Holland 02/14/2022 - 02/20/2022 February2022 Paid in Full Qualified
1 Tom Holland 02/21/2022 - 02/27/2022 February2022 Paid in Full Qualified
2 Brad Pitt 01/03/2022 - 01/09/2022 January 2022 Paid in Full Qualified
2 Brad Pitt 01/10/2022 - 01/16/2022 January 2022 Paid in Full Qualified
2 Brad Pitt 01/17/2022 - 01/23/2022 January 2022 Paid in Full Qualified
2 Brad Pitt 01/24/2022 - 01/30/2022 January 2022 Paid in Full Qualified
2 Brad Pitt 01/31/2022 - 02/06/2022 February2022 Paid in Full Not Qualified
2 Brad Pitt 02/07/2022 - 02/13/2022 February2022 Paid in Full Not Qualified
2 Brad Pitt 02/14/2022 - 02/20/2022 February2022 Paid in Full Not Qualified
2 Brad Pitt 02/21/2022 - 02/27/2022 February2022 Pending Not Qualified
Sources
This article follows the attribution requirements of Stack Overflow and is licensed under CC BY-SA 3.0.
Source: Stack Overflow
| Solution | Source |
|---|---|
| Solution 1 | |
| Solution 2 |
