How to partition/slice the rows of a DataFrame by contiguous runs of the same value in a column to compute per-value statistics in Python?

Given the dataframe below, how can I get, for each fault code, the number of contiguous occurrences, the duration of each occurrence, and the total duration? The desired output was attached as an image.

Code to generate input dataframe:

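# freq='s' is one-second frequency (the uppercase 'S' alias is deprecated in recent pandas)
df = pd.DataFrame({'timestamp': pd.date_range('2022-04-30 00:00:00', periods=19, freq='s'),
                   'fault_code': ['A']*4 + ['B']*4 + ['A']*2 + ['C']*5 + ['B']*2 + ['A']*2})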


Solution 1:[1]

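You can try something like this: label each contiguous run with a shift/cumsum comparison, compute the duration of each run, then aggregate per fault code.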

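import pandas as pd
import numpy as np

df = pd.DataFrame({'timestamp': pd.date_range('2022-04-30 00:00:00', periods=19, freq='s'),
                   'fault_code': ['A']*4 + ['B']*4 + ['A']*2 + ['C']*5 + ['B']*2 + ['A']*2})

# Give each contiguous run of the same fault_code its own id: the comparison
# is True at the start of every run, and cumsum turns those flags into run numbers.
df['group'] = (df['fault_code'] != df['fault_code'].shift()).cumsum()

# Duration of each run in whole seconds; np.ptp is "peak to peak", i.e. max - min.
df_s = (df.groupby(['fault_code', 'group'], as_index=False)['timestamp']
          .agg(lambda x: int(np.ptp(x).total_seconds())))

# Per fault code: number of runs, list of run durations, and their sum.
df_out = df_s.groupby('fault_code').agg(occurrence=('fault_code', 'count'),
                                        duration=('timestamp', list),
                                        total_duration=('timestamp', 'sum'))

df_out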

Output:

            occurrence   duration  total_duration
fault_code                                       
A                    3  [3, 1, 1]               5
B                    2     [3, 1]               4
C                    1        [4]               4
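
To see why the shift/cumsum step separates the runs, it may help to look at the intermediate values on a tiny example. This is only a sketch: the series `codes` and the names `starts` and `run_id` are made up for illustration and are not part of the answer above.

```python
import pandas as pd

# Small illustrative series (hypothetical data, not the question's dataframe).
codes = pd.Series(list("AABBBA"), name="fault_code")

# True wherever the value differs from the previous row. The first row is
# compared against NaN, so it always starts a run.
starts = codes != codes.shift()

# The cumulative sum of the flags gives every contiguous run its own id.
run_id = starts.cumsum()

print(pd.DataFrame({"fault_code": codes, "starts_run": starts, "run_id": run_id}))
```

Here the two separate runs of A get different ids (1 and 3), which is why the answer groups by both fault_code and group rather than by fault_code alone. Note also that the duration there is the last timestamp minus the first within a run, so a run of four one-second rows counts as 3 seconds and a single-row run would count as 0.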

Sources

This article follows the attribution requirements of Stack Overflow and is licensed under CC BY-SA 3.0.

Source: Stack Overflow

Solution Source
Solution 1 Scott Boston