'Get first and date after last of continuous period pandas
Sample of dataframe df I have:
date_code item_code vstore_code
1 2022-03-26 11111 N01
2 2022-03-27 11111 N01
3 2022-03-28 11111 N01
4 2022-03-29 11111 N01
5 2022-03-30 11111 N01
6 2022-03-31 11111 N01
7 2022-04-01 11111 N01
8 2022-04-08 11111 N01
9 2022-04-15 11111 N01
10 2022-04-17 11111 N01
11 2022-04-18 11111 N01
12 2022-04-19 11111 N01
13 2022-04-21 11111 N01
14 2022-04-22 11111 N01
15 2022-04-26 11111 N01
16 2022-02-01 22222 N02
17 2022-02-02 22222 N02
18 2022-02-03 22222 N02
19 2022-02-10 22222 N02
There are a lot of items and stores.
I want to create separate dataframe which will containt records of start and end+1 of each contionuous period for each item at each store.
Expected output:
item_code store_code start_period end_period
11111 N01 2022-03-26 2022-04-02
11111 N01 2022-04-08 2022-04-09
11111 N01 2022-04-15 2022-04-16
11111 N01 2022-04-17 2022-04-20
11111 N01 2022-04-21 2022-04-23
11111 N01 2022-04-26 2022-04-27
22222 N02 2022-02-01 2022-02-04
22222 N02 2022-02-10 2022-02-11
Solution 1:[1]
You can aggregate by consecutive datetimes with compare difference by Series.diff for not equal 1 day with Series.cumsum and pass to groupby with aggregate min and max, last add 1 day to end_period column:
df['date_code'] = pd.to_datetime(df['date_code'])
g = df['date_code'].diff().dt.days.ne(1).cumsum()
df = (df.groupby(['item_code','vstore_code',g])
.agg(start_period=('date_code','min'),
end_period=('date_code','max'))
.droplevel(-1)
.reset_index()
.assign(end_period = lambda x: x['end_period'] + pd.Timedelta('1 day'))
)
print (df)
0 11111 N01 2022-03-26 2022-04-02
1 11111 N01 2022-04-08 2022-04-09
2 11111 N01 2022-04-15 2022-04-16
3 11111 N01 2022-04-17 2022-04-20
4 11111 N01 2022-04-21 2022-04-23
5 11111 N01 2022-04-26 2022-04-27
6 22222 N02 2022-02-01 2022-02-04
7 22222 N02 2022-02-10 2022-02-11
Sources
This article follows the attribution requirements of Stack Overflow and is licensed under CC BY-SA 3.0.
Source: Stack Overflow
| Solution | Source |
|---|---|
| Solution 1 |
