'How to partition/slice rows horizontally in a data frame by contiguous occurrence of same value in column(s) to generate a statistical info in python?
Please find attached snap and provide me how to reach to a solution of desired output mentioned in image description?
Code to generate input dataframe:
df = pd.DataFrame({'timestamp':pd.date_range('2022-04-30 00:00:00', periods=19, freq='S'),
'fault_code':['A']*4+['B']*4+['A']*2+['C']*5+['B']*2+['A']*2})
Solution 1:[1]
You can try something like this:
import pandas as pd
import numpy as np
df = pd.DataFrame({'timestamp':pd.date_range('2022-04-30 00:00:00', periods=19, freq='S'),
'fault_code':['A']*4+['B']*4+['A']*2+['C']*5+['B']*2+['A']*2})
df['group'] = (df['fault_code'] != df['fault_code'].shift()).cumsum()
df_s = df.groupby(['fault_code','group'], as_index=False)['timestamp']\
.agg(lambda x: int(np.ptp(x).total_seconds()))
df_out = df_s.groupby('fault_code').agg(occurrence=('fault_code','count'),
duration=('timestamp', list),
total_duration=('timestamp','sum'))
df_out
Output:
occurrence duration total_duration
fault_code
A 3 [3, 1, 1] 5
B 2 [3, 1] 4
C 1 [4] 4
Sources
This article follows the attribution requirements of Stack Overflow and is licensed under CC BY-SA 3.0.
Source: Stack Overflow
| Solution | Source |
|---|---|
| Solution 1 | Scott Boston |

