How to make a conditional count for DataFrame columns with groupby

I need some expert advice on the best way to use groupby, or any other operation, to combine duplicate rows based on their values in multiple columns. Let me explain:

The DataFrame is as follows:

import pandas as pd
from pandas import Timestamp

df_dict=[{'Author': 'Author1', 'Item Type': 'Feature', 'Status': 'Open', 'Total': 2, 'First Open Date': Timestamp('2022-03-04 00:00:00'), 'Last Open Date': Timestamp('2022-03-07 00:00:00'), 'Item Number': '1001, 1002'}, 
{'Author': 'Author1', 'Item Type': 'Feature', 'Status': 'Closed', 'Total': 1, 'First Open Date': Timestamp('2022-03-05 00:00:00'), 'Last Open Date': Timestamp('2022-03-05 00:00:00'), 'Item Number': 1003}, 
{'Author': 'Author 1', 'Item Type': 'Bug', 'Status': 'Open', 'Total': 5, 'First Open Date': Timestamp('2022-03-06 00:00:00'), 'Last Open Date': Timestamp('2022-03-11 00:00:00'), 'Item Number': '1004, 1005, 1006, 1007, 1008'}, 
{'Author': 'Author1', 'Item Type': 'Request', 'Status': 'Rejected', 'Total': 2, 'First Open Date': Timestamp('2022-03-07 00:00:00'), 'Last Open Date': Timestamp('2022-03-11 00:00:00'), 'Item Number': '1009, 1010'}, 
{'Author': 'Author1', 'Item Type': 'Feature', 'Status': 'Rejected', 'Total': 1, 'First Open Date': Timestamp('2022-03-06 00:00:00'), 'Last Open Date': Timestamp('2022-03-06 00:00:00'), 'Item Number': 1011}, 
{'Author': 'Author 1', 'Item Type': 'Bug', 'Status': 'Closed', 'Total': 3, 'First Open Date': Timestamp('2022-03-03 00:00:00'), 'Last Open Date': Timestamp('2022-03-10 00:00:00'), 'Item Number': '1013, 1014, 1015'}]

df = pd.DataFrame(df_dict)
df

I would like to get the following DataFrame after performing groupby or other functions:

output_dict=[{'Author': 'Author1', 'Item Type': 'Feature', 'Status': 'Open, Close', 'Total': 5, 'First Open Date': Timestamp('2022-03-04 00:00:00'), 'Last Open Date': Timestamp('2022-03-07 00:00:00'), 'Item Number': '1001, 1002, 1009, 1010, 1011', 'Rejected Count': 2, 'Open Count': 2, 'Close Count': 1}, 
    {'Author': 'Author1', 'Item Type': 'Bug', 'Status': 'Open, Closed', 'Total': 10, 'First Open Date': Timestamp('2022-03-03 00:00:00'), 'Last Open Date': Timestamp('2022-03-11 00:00:00'), 'Item Number': '1004, 1005, 1006, 1007, 1008, 1013, 1014, 1015', 'Rejected Count': 2, 'Open Count': 5, 'Close Count': 3}]

df_out = pd.DataFrame(output_dict)
df_out

What needs to be done:

  • Generate results only for the Feature and Bug Item Types. The Request Item Type gets no row of its own, but its rows count towards the Rejected Count of each Feature and Bug row.
  • Combine the Item Numbers of each Item Type, including those of the Request Item Type, so that the counts stay consistent (i.e. if Total is 5, Item Number should list 5 items).
  • Combine each Item Type's First and Last Open Date based on the combined result.
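
The three rules above could be sketched roughly as follows. This is only one possible reading, not a confirmed solution, and it rests on assumptions that are not stated in the question: 'Author 1' and 'Author1' are treated as the same author, every Request row is attached to each Feature and Bug group of that author, and the per-status counts are sums of Total. The names `rows`, `targets`, `extra`, `combined`, and `counts` are made up for illustration. Following the rules literally, the Feature row comes out with a Total of 6 (items 1001, 1002, 1003, 1011, 1009, 1010), which differs from the sample output shown earlier.

```python
import pandas as pd

# Reduced copy of the question's data (dates given as strings for brevity).
rows = [
    ("Author1", "Feature", "Open", 2, "2022-03-04", "2022-03-07", "1001, 1002"),
    ("Author1", "Feature", "Closed", 1, "2022-03-05", "2022-03-05", 1003),
    ("Author 1", "Bug", "Open", 5, "2022-03-06", "2022-03-11", "1004, 1005, 1006, 1007, 1008"),
    ("Author1", "Request", "Rejected", 2, "2022-03-07", "2022-03-11", "1009, 1010"),
    ("Author1", "Feature", "Rejected", 1, "2022-03-06", "2022-03-06", 1011),
    ("Author 1", "Bug", "Closed", 3, "2022-03-03", "2022-03-10", "1013, 1014, 1015"),
]
cols = ["Author", "Item Type", "Status", "Total",
        "First Open Date", "Last Open Date", "Item Number"]
df = pd.DataFrame(rows, columns=cols)
df["First Open Date"] = pd.to_datetime(df["First Open Date"])
df["Last Open Date"] = pd.to_datetime(df["Last Open Date"])

# Assumption: "Author 1" and "Author1" are the same author.
df["Author"] = df["Author"].str.replace(" ", "", regex=False)
# Item Number mixes int and str values, so make everything a string.
df["Item Number"] = df["Item Number"].astype(str)

# Split off the Request rows; they get no output row of their own.
is_request = df["Item Type"] == "Request"
base = df[~is_request]
requests = df[is_request]

# Assumption: each Request row is copied into every (Author, Item Type)
# group of the same author, so it counts towards Feature and Bug alike.
targets = base[["Author", "Item Type"]].drop_duplicates()
extra = requests.drop(columns="Item Type").merge(targets, on="Author")
combined = pd.concat([base, extra], ignore_index=True)

# One row per (Author, Item Type) with the combined columns.
result = combined.groupby(["Author", "Item Type"], sort=False).agg(**{
    "Status": ("Status", lambda s: ", ".join(s.unique())),
    "Total": ("Total", "sum"),
    "First Open Date": ("First Open Date", "min"),
    "Last Open Date": ("Last Open Date", "max"),
    "Item Number": ("Item Number", ", ".join),
})

# Conditional counts: sum of Total per Status, one column per status.
counts = combined.pivot_table(
    index=["Author", "Item Type"], columns="Status",
    values="Total", aggfunc="sum", fill_value=0,
).add_suffix(" Count")

result = result.join(counts).reset_index()
print(result)
```

The count columns are named after the Status values as they appear in the data, so this yields 'Closed Count' rather than 'Close Count'. A `rename` on the columns would be needed to match the naming in the desired output exactly.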

Please let me know the best way to achieve this result.

I have tried multiple groupby calls but have been unable to achieve the result.



Sources

This article follows the attribution requirements of Stack Overflow and is licensed under CC BY-SA 3.0.

Source: Stack Overflow