Split dataframe entries at midnight
I have a pandas dataframe with Start and End datetime columns.
import pandas as pd
df = pd.DataFrame(data=pd.date_range('20100201', periods=10, freq='5h3min'), columns=['Start'])
df['End'] = df['Start'] + pd.Timedelta(4, 'h')
Start and End can be expected to be sorted internally, but gaps or overlaps may occur between consecutive rows.
I would like to create a new dataframe in which any row that contains midnight (i.e. midnight falls within [Start, End]) is split into two rows, one before and one after midnight.
For example:
Start End
0 2010-02-01 00:00:00 2010-02-01 04:00:00
1 2010-02-01 05:03:00 2010-02-01 09:03:00
2 2010-02-01 10:06:00 2010-02-01 14:06:00
3 2010-02-01 15:09:00 2010-02-01 19:09:00
4 2010-02-01 20:12:00 2010-02-02 00:12:00
5 2010-02-02 01:15:00 2010-02-02 05:15:00
should become:
Start End
0 2010-02-01 00:00:00 2010-02-01 04:00:00
1 2010-02-01 05:03:00 2010-02-01 09:03:00
2 2010-02-01 10:06:00 2010-02-01 14:06:00
3 2010-02-01 15:09:00 2010-02-01 19:09:00
-----------------------------------------
4 2010-02-01 20:12:00 2010-02-01 23:59:00
5 2010-02-02 00:00:00 2010-02-02 00:12:00
-----------------------------------------
6 2010-02-02 01:15:00 2010-02-02 05:15:00
Solution 1:[1]
I don't believe the above answer works when a midnight time occurs early in the list. Someone correct me if I'm wrong, but I believe that whenever you drop the indexes of the "splits", you also drop too much from the original list.
I recognize this doesn't answer the question, but I don't have the reputation to comment above. In my case, I will likely just convert to a NumPy array, insert rows where the midnight points are, and then copy the data accordingly. Ugly, but it should work.
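For what it's worth, here is a minimal sketch of one way to do the split without dropping any index labels. It is not the NumPy insertion described above; it simply rebuilds the rows in a plain Python loop. The function name `split_at_midnight` and the `gap` parameter are my own, made up for illustration. I'm assuming that a row whose Start or End falls exactly on midnight should not be split, and that `gap` (one minute here) is only there to reproduce the 23:59:00 end time from the question's expected output.

```python
import pandas as pd

def split_at_midnight(df, gap=pd.Timedelta(0)):
    """Split each [Start, End] row at every midnight strictly inside it.

    `gap` (hypothetical parameter) is subtracted from the End of each
    piece that stops at midnight, e.g. one minute to end at 23:59:00.
    """
    rows = []
    for start, end in zip(df['Start'], df['End']):
        # First midnight strictly after `start`.
        cut = start.normalize() + pd.Timedelta(days=1)
        # Loop so that rows spanning several days are split at each midnight.
        while cut < end:
            rows.append((start, cut - gap))
            start = cut
            cut = cut + pd.Timedelta(days=1)
        rows.append((start, end))
    # A fresh default index is created, so nothing is dropped by label.
    return pd.DataFrame(rows, columns=['Start', 'End'])

# Same data as in the question, with the frequency given as a Timedelta.
df = pd.DataFrame({'Start': pd.date_range('2010-02-01', periods=10,
                                          freq=pd.Timedelta(hours=5, minutes=3))})
df['End'] = df['Start'] + pd.Timedelta(hours=4)

out = split_at_midnight(df, gap=pd.Timedelta(minutes=1))
print(out)
```

Because every output row is appended in order, it shouldn't matter whether the midnight crossing is early or late in the list. The loop is slow for very large frames, and a non-zero `gap` can produce a piece with End before Start if a row starts less than `gap` before midnight, so treat this as a starting point only.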
Sources
This article follows the attribution requirements of Stack Overflow and is licensed under CC BY-SA 3.0.
Source: Stack Overflow
| Solution | Source |
|---|---|
| Solution 1 | Jared Bartels |
