Fill missing date and time in Python (pandas)
I have a large data set; a sample is given below. The data is recorded for 1 day at 5-minute intervals over 24 hours for 3214 unique IDs.
- The time and date information is missing in some rows, as shown in the image. Is there a way to automatically fill in the missing date and time for every unique_id? Every unique ID has 288 datetime values (12 readings per hour * 24 hours = 288 values per day at 5-minute increments).
- Also, how can I fill the missing unique IDs with the previous value?
- I am trying to fill the last column, VALUE, with empty cells.
I searched for relevant answers (1, 2), but they did not work.
The most relevant answer was here, but its solutions only handle missing dates, not both dates and times.
sample_data.csv
datetime Unique_ID VALUE
28/01/2018 2:05 105714 284
28/01/2018 2:10 105714 292
28/01/2018 2:15 105714 287
28/01/2018 2:20 105714 193
28/01/2018 2:25 105714 2348
28/01/2018 2:30 105714 284
28/01/2018 2:35 105714 286
28/01/2018 2:40 105714 293
28/01/2018 2:45 105714 229
28/01/2018 2:50 105714 2292
28/01/2018 2:55 105714 225
28/01/2018 4:45 105714 185
28/01/2018 4:50 105714 277
28/01/2018 4:55 105714 185
28/01/2018 5:00 105714 2435
28/01/2018 5:05 105714 273
28/01/2018 5:10 105714 2360
28/01/2018 5:15 105714 282
Solution 1:[1]
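In pandas (imported as pd), you can try the following.
First, get the index of the rows in which every value is NaN:
idx = df[df.isna().all(axis=1)].index
Then forward-fill the ID and datetime columns with the ffill() method: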
df[['Unique_ID','datetime']]=df[['Unique_ID','datetime']].ffill()
Finally, use the groupby() and transform() methods, selecting the VALUE column so that the result is a single Series:
df['VALUE'] = df.groupby(['Unique_ID', 'datetime'], dropna=False)['VALUE'].transform('last')
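To see what the grouping step does, here is a small sketch on made-up rows. The two fully blank rows are an assumption about how the gaps appear in the real file. After the forward fill, a blank row shares its (Unique_ID, datetime) key with the row before it, so transform('last') copies that group's last non-null VALUE into it.

```python
import pandas as pd

# Hypothetical rows: two real readings followed by two fully blank rows.
df = pd.DataFrame({
    "datetime": ["28/01/2018 2:50", "28/01/2018 2:55", None, None],
    "Unique_ID": [105714, 105714, None, None],
    "VALUE": [2292, 225, None, None],
})

# Blank rows inherit the ID and timestamp of the row above them.
df[["Unique_ID", "datetime"]] = df[["Unique_ID", "datetime"]].ffill()

# Within each (ID, timestamp) group, broadcast the last non-null VALUE.
df["VALUE"] = (
    df.groupby(["Unique_ID", "datetime"], dropna=False)["VALUE"].transform("last")
)

print(df)
```

In this sketch the two blank rows end up with the VALUE of the 2:55 reading, so the step behaves like a forward fill of VALUE. If the blank cells in VALUE should stay empty, this step can be skipped.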
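If you then want to move the forward-filled 'datetime' values on by 5 minutes, the column must already hold datetimes (for example via pd.to_datetime). Note that this adds a single 5-minute step to each filled row, so several consecutive blank rows end up with the same timestamp:
df.loc[idx, 'datetime'] = df.loc[idx, 'datetime'] + pd.to_timedelta(5, unit='m')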
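When several blank rows follow each other, or when rows are absent from the file altogether (as between 2:55 and 4:45 in the sample), a single 5-minute shift is not enough. The following is a minimal sketch of one possible alternative, not the method from the answer above. It assumes made-up rows, timestamps in day-first format, and no duplicate timestamps within an ID. It first gives each blank row an offset based on its position in its run of blanks, then reindexes each ID onto a regular 5-minute grid.

```python
import pandas as pd

# Hypothetical rows: two blank rows after 2:55, and 3:10 absent altogether.
df = pd.DataFrame({
    "datetime": ["28/01/2018 2:50", "28/01/2018 2:55", None, None, "28/01/2018 3:15"],
    "Unique_ID": [105714, 105714, None, None, 105714],
    "VALUE": [2292, 225, None, None, 282],
})
df["datetime"] = pd.to_datetime(df["datetime"], format="%d/%m/%Y %H:%M")

# Position of each blank row within its run of consecutive blanks (1, 2, ...);
# rows that already have a timestamp get 0.
blank = df["datetime"].isna()
run_id = (blank != blank.shift()).cumsum()
steps = blank.astype(int).groupby(run_id).cumsum()

# Forward-fill, then push each blank row on by its own multiple of 5 minutes.
df[["Unique_ID", "datetime"]] = df[["Unique_ID", "datetime"]].ffill()
df["datetime"] = df["datetime"] + pd.to_timedelta(steps * 5, unit="min")
df["Unique_ID"] = df["Unique_ID"].astype("int64")

# Rebuild a regular 5-minute grid per ID so absent rows appear with NaN VALUE.
pieces = []
for uid, group in df.groupby("Unique_ID"):
    grid = pd.date_range(
        group["datetime"].min(), group["datetime"].max(),
        freq="5min", name="datetime",
    )
    piece = group.set_index("datetime")[["VALUE"]].reindex(grid)
    piece.insert(0, "Unique_ID", uid)
    pieces.append(piece)

full = pd.concat(pieces).reset_index()
print(full)
```

Here the grid runs from each ID's first to last timestamp to keep the output short. To get exactly 288 rows per ID, the grid could instead start at midnight of the recording day and use periods=288.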
Sources
This article follows the attribution requirements of Stack Overflow and is licensed under CC BY-SA 3.0.
Source: Stack Overflow
| Solution | Source |
|---|---|
| Solution 1 |

