How to group by values in a column and find time difference using Python?

I have a dataframe as shown below:

Col A	Time	Col B	Col C
123	2018-01-06 03:45:23	B	1
141	2018-01-08 12:45:55	C	0
123	2018-01-08 11:45:29	A	0
123	2018-01-08 01:45:15	C	0
141	2018-01-08 11:35:06	C	0
123	2018-01-09 10:45:37	A	1
123	2018-01-09 09:45:41	C	0
141	2018-01-06 03:45:28	B	0
123	2018-01-08 12:45:23	C	0
141	2018-01-08 11:45:17	A	0
141	2018-01-08 01:45:02	C	0
123	2018-01-08 11:35:56	C	0
141	2018-01-09 10:45:39	A	0
141	2018-01-09 09:45:19	C	0
178	2019-04-17 08:34:13	B	1
178	2019-05-02 19:48:44	C	0
178	2019-04-18 13:27:28	A	1

For this dataframe, the following operations need to be performed:

1. Group by the values in Col A, sort by Time, and calculate the lagging time difference.
2. Filter the dataframe to rows where Col B is B or A, and apply the above operation.
3. Keeping the conditions in Step 2, calculate the time difference for each A in Col B from the very first timestamp of that Col A group.
4. Keeping the filter conditions in Step 2, and where the value of Col C is 1, calculate the time difference between B and A of Col B. In this case, only one B and one A in each Col A group can have the value 1 in Col C.

For each of the above steps, a new column should be created. In Step 2, where the value in Col B is C, the newly created column should be filled with NaN. Can someone please help with this implementation using Python?

Expected Output:

Col A	Time	Col B	Col C	Lag Time	Filtered Lag	Diff	New Col
123	2018-01-06 03:45:23	B	1	NaN	NaN	NaN	NaN
123	2018-01-08 01:45:15	C	0	165592	NaN	NaN	NaN
123	2018-01-08 11:35:56	C	0	35441	NaN	NaN	NaN
123	2018-01-08 11:45:29	A	0	573	201606	201606	NaN
123	2018-01-08 12:45:23	C	0	3594	NaN	NaN	NaN
123	2018-01-09 09:45:41	C	0	75618	NaN	NaN	NaN
123	2018-01-09 10:45:37	A	1	3596	82808	284414	284414
178	2019-04-17 08:34:13	B	1	NaN	NaN	NaN	NaN
178	2019-04-18 13:27:28	A	1	103995	103995	103995	103995
178	2019-05-02 19:48:44	C	0	1232476	NaN	NaN	NaN

The new columns should be calculated in the same way for the other values in Col A.

Note: Values in the new columns are calculated in seconds.

Solution 1:^[1]

Would this work? The first column seems straightforward. For the second column, we use where to keep only the timestamps of rows with A or B in Col B; the other rows become NaT and drop out of the difference.

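import pandas as pd
df['Time'] = pd.to_datetime(df['Time'])
# Lag between consecutive rows of each `Col A` group, in seconds
df['new'] = df.sort_values('Time').groupby('Col A')['Time'].diff().dt.total_seconds()
# Same lag, computed only over rows where `Col B` is A or B (all other rows stay NaN)
df['new2'] = df['Time'].where(df['Col B'].isin(['A','B'])).groupby(df['Col A']).apply(lambda x: x.sort_values().diff()).droplevel(0).dt.total_seconds()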

Output:

    Col A                Time Col B        new      new2
0     123 2018-01-06 03:45:23     B        NaN       NaN
1     141 2018-01-08 12:45:55     C     3638.0       NaN
2     123 2018-01-08 11:45:29     A      573.0  201606.0
3     123 2018-01-08 01:45:15     C   165592.0       NaN
4     141 2018-01-08 11:35:06     C    35404.0       NaN
5     123 2018-01-09 10:45:37     A     3596.0   82808.0
6     123 2018-01-09 09:45:41     C    75618.0       NaN
7     141 2018-01-06 03:45:28     B        NaN       NaN
8     123 2018-01-08 12:45:23     C     3594.0       NaN
9     141 2018-01-08 11:45:17     A      611.0  201589.0
10    141 2018-01-08 01:45:02     C   165574.0       NaN
11    123 2018-01-08 11:35:56     C    35441.0       NaN
12    141 2018-01-09 10:45:39     A     3620.0   82822.0
13    141 2018-01-09 09:45:19     C    75564.0       NaN
14    178 2019-04-17 08:34:13     B        NaN       NaN
15    178 2019-05-02 19:48:44     C  1232476.0       NaN
16    178 2019-04-18 13:27:28     A   103995.0  103995.0
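
As a possible alternative to the apply-based expression above, the two lags can also be computed by sorting once and then taking a grouped diff on the filtered rows. The following is only a sketch, not part of the original answer; it assumes a subset of the question's sample data (groups 123 and 178) and reuses the column names Lag Time and Filtered Lag from the expected output.

```python
import pandas as pd

# Subset of the question's sample data (groups 123 and 178 only).
df = pd.DataFrame({
    "Col A": [123, 123, 123, 123, 123, 123, 123, 178, 178, 178],
    "Time": [
        "2018-01-06 03:45:23", "2018-01-08 11:45:29", "2018-01-08 01:45:15",
        "2018-01-09 10:45:37", "2018-01-09 09:45:41", "2018-01-08 12:45:23",
        "2018-01-08 11:35:56", "2019-04-17 08:34:13", "2019-05-02 19:48:44",
        "2019-04-18 13:27:28",
    ],
    "Col B": ["B", "A", "C", "A", "C", "C", "C", "B", "C", "A"],
    "Col C": [1, 0, 0, 1, 0, 0, 0, 1, 0, 1],
})
df["Time"] = pd.to_datetime(df["Time"])

# Sort once so that diff() within each group follows chronological order.
df = df.sort_values(["Col A", "Time"]).reset_index(drop=True)

# Step 1: lag between consecutive rows of the same `Col A` group, in seconds.
df["Lag Time"] = df.groupby("Col A")["Time"].diff().dt.total_seconds()

# Step 2: the same lag, computed only over rows where `Col B` is A or B.
# Rows excluded by the mask are missing from the right-hand side, so index
# alignment on assignment leaves them as NaN.
mask = df["Col B"].isin(["A", "B"])
df["Filtered Lag"] = df[mask].groupby("Col A")["Time"].diff().dt.total_seconds()

print(df)
```

Sorting up front also returns the rows in the order shown in the question's expected output, at the cost of discarding the original row index.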

Solution 2:^[2]

Below is a solution to the above question; it covers the Lag Time, Filtered Lag, and Diff columns:

import pandas as pd

# Converting `Time` column to `datetime`
df['Time'] = pd.to_datetime(df['Time'])

# Group by `Col A` after sorting by `Time` and calculate row difference in seconds for each group
df['Lag Time'] = df.sort_values('Time').groupby('Col A')['Time'].diff().dt.total_seconds()

# Filter dataframe where value of `Col B` is in `["A", "B"]`, group by `Col A` then sort by `Time` using lambda function and calculate row difference in seconds for each group
df['Filtered Lag'] = df['Time'].where(df['Col B'].isin(['B','A'])).groupby(df['Col A']).apply(lambda x: x.sort_values().diff()).droplevel(0).dt.total_seconds()

# Filter dataframe where value of `Col B` is in `["A", "B"]` and value of `Col C` is `1` (the comparison is parenthesized because `&` binds more tightly than `==`), group by `Col A` then sort by `Time` using lambda function and calculate row difference in seconds for each group
df['Diff'] = df['Time'].where(df['Col B'].isin(['B','A']) & (df['Col C'] == 1)).groupby(df['Col A']).apply(lambda x: x.sort_values().diff()).droplevel(0).dt.total_seconds()

Values for New Col are yet to be calculated; I will post the updated answer soon.
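
Note that the Diff expression in Solution 2 filters on Col C being 1, which corresponds to step 4 of the question (New Col in the expected output) rather than step 3. One possible way to compute both remaining columns is sketched below. This is not from the original answer: it assumes a subset of the question's sample data, takes "the very first timestamp" to mean the earliest A/B row of each Col A group, and assumes the flagged B row precedes the flagged A row.

```python
import pandas as pd

# Subset of the question's sample data (groups 123 and 178 only).
df = pd.DataFrame({
    "Col A": [123, 123, 123, 123, 123, 123, 123, 178, 178, 178],
    "Time": [
        "2018-01-06 03:45:23", "2018-01-08 11:45:29", "2018-01-08 01:45:15",
        "2018-01-09 10:45:37", "2018-01-09 09:45:41", "2018-01-08 12:45:23",
        "2018-01-08 11:35:56", "2019-04-17 08:34:13", "2019-05-02 19:48:44",
        "2019-04-18 13:27:28",
    ],
    "Col B": ["B", "A", "C", "A", "C", "C", "C", "B", "C", "A"],
    "Col C": [1, 0, 0, 1, 0, 0, 0, 1, 0, 1],
})
df["Time"] = pd.to_datetime(df["Time"])
df = df.sort_values(["Col A", "Time"]).reset_index(drop=True)

in_ab = df["Col B"].isin(["A", "B"])

# Step 3: seconds elapsed since the earliest A/B timestamp of each group
# (assumption: "first timestamp" refers to the filtered rows), kept only
# on rows where `Col B` is A.
first_ab = df["Time"].where(in_ab).groupby(df["Col A"]).transform("min")
df["Diff"] = (df["Time"] - first_ab).dt.total_seconds().where(df["Col B"].eq("A"))

# Step 4: among A/B rows with `Col C` == 1, the difference between the
# flagged B row and the flagged A row. The question states that each group
# has at most one of each, so a grouped diff over the time-sorted flagged
# rows yields NaN on the earlier row and the difference on the later one.
flagged = in_ab & df["Col C"].eq(1)
df["New Col"] = df[flagged].groupby("Col A")["Time"].diff().dt.total_seconds()

print(df[["Col A", "Time", "Col B", "Col C", "Diff", "New Col"]])
```

For groups 123 and 178 this reproduces the Diff and New Col values listed in the question's expected output.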

Sources

This article follows the attribution requirements of Stack Overflow and is licensed under CC BY-SA 3.0.

Source: Stack Overflow

Solution	Source
Solution 1
Solution 2
