How to make a conditional count for DataFrame columns with groupby
I need some expert advice on the best way to use groupby, or any other operation, to combine duplicate rows based on their values in multiple columns. Let me explain.
The DataFrame is as follows:
import pandas as pd
from pandas import Timestamp
df_dict=[{'Author': 'Author1', 'Item Type': 'Feature', 'Status': 'Open', 'Total': 2, 'First Open Date': Timestamp('2022-03-04 00:00:00'), 'Last Open Date': Timestamp('2022-03-07 00:00:00'), 'Item Number': '1001, 1002'},
{'Author': 'Author1', 'Item Type': 'Feature', 'Status': 'Closed', 'Total': 1, 'First Open Date': Timestamp('2022-03-05 00:00:00'), 'Last Open Date': Timestamp('2022-03-05 00:00:00'), 'Item Number': 1003},
{'Author': 'Author 1', 'Item Type': 'Bug', 'Status': 'Open', 'Total': 5, 'First Open Date': Timestamp('2022-03-06 00:00:00'), 'Last Open Date': Timestamp('2022-03-11 00:00:00'), 'Item Number': '1004, 1005, 1006, 1007, 1008'},
{'Author': 'Author1', 'Item Type': 'Request', 'Status': 'Rejected', 'Total': 2, 'First Open Date': Timestamp('2022-03-07 00:00:00'), 'Last Open Date': Timestamp('2022-03-11 00:00:00'), 'Item Number': '1009, 1010'},
{'Author': 'Author1', 'Item Type': 'Feature', 'Status': 'Rejected', 'Total': 1, 'First Open Date': Timestamp('2022-03-06 00:00:00'), 'Last Open Date': Timestamp('2022-03-06 00:00:00'), 'Item Number': 1011},
{'Author': 'Author 1', 'Item Type': 'Bug', 'Status': 'Closed', 'Total': 3, 'First Open Date': Timestamp('2022-03-03 00:00:00'), 'Last Open Date': Timestamp('2022-03-10 00:00:00'), 'Item Number': '1013, 1014, 1015'}]
df = pd.DataFrame(df_dict)
df
I would like to get the following DataFrame after performing a groupby or other functions:
output_dict=[{'Author': 'Author1', 'Item Type': 'Feature', 'Status': 'Open, Close', 'Total': 5, 'First Open Date': Timestamp('2022-03-04 00:00:00'), 'Last Open Date': Timestamp('2022-03-07 00:00:00'), 'Item Number': '1001, 1002, 1009, 1010, 1011', 'Rejected Count': 2, 'Open Count': 2, 'Close Count': 1},
{'Author': 'Author1', 'Item Type': 'Bug', 'Status': 'Open, Closed', 'Total': 10, 'First Open Date': Timestamp('2022-03-03 00:00:00'), 'Last Open Date': Timestamp('2022-03-11 00:00:00'), 'Item Number': '1004, 1005, 1006, 1007, 1008, 1013, 1014, 1015', 'Rejected Count': 2, 'Open Count': 5, 'Close Count': 3}]
df_out = pd.DataFrame(output_dict)
df_out
What needs to be done:
- Generate results for the Feature and Bug Item Types only. The Request Item Type gets no row of its own, but its items count towards the Rejected Count of each Feature and Bug row.
- Combine each Item Type's Item Numbers with those of the Request Item Type in Item Number, so that the counts stay consistent (i.e. if Total is 5, Item Number should list 5 items).
- Combine each Item Type's First and Last Open Date based on the combined result.
Please let me know the best way to achieve this result.
I have tried multiple groupby calls but have been unable to get it.
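One possible approach, as a rough sketch rather than a definitive answer: copy the Request rows onto every other Item Type of the same author, build one count column per Status with `Series.where`, and then aggregate once with `groupby`. It rests on several assumptions that are not confirmed by the question: 'Author 1' is treated as a typo for 'Author1', Request rows are matched to Feature and Bug rows by Author only, and a Feature row whose own Status is Rejected also counts towards Rejected Count. The names `requests`, `targets`, `extra`, `combined` and `result` are just illustrative.

```python
import pandas as pd

# Same rows as in the question, written compactly.
rows = [
    ("Author1", "Feature", "Open", 2, "2022-03-04", "2022-03-07", "1001, 1002"),
    ("Author1", "Feature", "Closed", 1, "2022-03-05", "2022-03-05", 1003),
    ("Author 1", "Bug", "Open", 5, "2022-03-06", "2022-03-11", "1004, 1005, 1006, 1007, 1008"),
    ("Author1", "Request", "Rejected", 2, "2022-03-07", "2022-03-11", "1009, 1010"),
    ("Author1", "Feature", "Rejected", 1, "2022-03-06", "2022-03-06", 1011),
    ("Author 1", "Bug", "Closed", 3, "2022-03-03", "2022-03-10", "1013, 1014, 1015"),
]
cols = ["Author", "Item Type", "Status", "Total",
        "First Open Date", "Last Open Date", "Item Number"]
df = pd.DataFrame(rows, columns=cols)
df["First Open Date"] = pd.to_datetime(df["First Open Date"])
df["Last Open Date"] = pd.to_datetime(df["Last Open Date"])

# Assumption: 'Author 1' and 'Author1' are the same author.
df["Author"] = df["Author"].str.replace(" ", "", regex=False)
# Item Number mixes ints and strings; make everything a string so it can be joined.
df["Item Number"] = df["Item Number"].astype(str)

# Attach a copy of each Request row to every other Item Type of the same author.
is_request = df["Item Type"] == "Request"
requests = df[is_request].drop(columns="Item Type")
targets = df.loc[~is_request, ["Author", "Item Type"]].drop_duplicates()
extra = targets.merge(requests, on="Author")
combined = pd.concat([df[~is_request], extra], ignore_index=True)

# Conditional counts: keep Total where the Status matches, otherwise 0.
count_cols = {"Rejected": "Rejected Count", "Open": "Open Count", "Closed": "Close Count"}
for status, col in count_cols.items():
    combined[col] = combined["Total"].where(combined["Status"] == status, 0)

# One aggregation per output column; sort=False keeps first-appearance order.
result = (
    combined.groupby(["Author", "Item Type"], sort=False)
    .agg(**{
        "Status": ("Status", lambda s: ", ".join(s.drop_duplicates())),
        "Total": ("Total", "sum"),
        "First Open Date": ("First Open Date", "min"),
        "Last Open Date": ("Last Open Date", "max"),
        "Item Number": ("Item Number", ", ".join),
        "Rejected Count": ("Rejected Count", "sum"),
        "Open Count": ("Open Count", "sum"),
        "Close Count": ("Close Count", "sum"),
    })
    .reset_index()
)
print(result)
```

Under these assumptions the Bug row gets Total 10 with Open 5, Close 3 and Rejected 2, which agrees with the counts in the expected output. The Feature row comes out as Total 6 with Rejected Count 3, because item 1011 is counted as well; if Rejected rows of the Item Type itself should be left out, they would need to be filtered from `df` before the `concat` step.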
Sources
This article follows the attribution requirements of Stack Overflow and is licensed under CC BY-SA 3.0.
Source: Stack Overflow


