"Reindex" only fills the first two rows with new values
I am new to Stack Overflow. I hope I can formulate my question clearly.
I am using reindex to fill in missing dates in a pandas DataFrame:
```python
import pandas as pd

df = pd.read_csv('myfile.dat', skiprows=1)
print(df)
```
Output:
```
                  TIME  A  B   C  D
0  2022-04-28 00:02:00  0  2   1  5
1  2022-04-28 00:03:00  0  2   2  5
2  2022-04-28 00:05:00  0  2   3  5
3  2022-04-28 00:06:00  0  2   4  5
4  2022-04-28 00:09:00  0  2   5  5
5  2022-04-28 00:10:00  0  2   6  5
6  2022-04-28 00:12:00  0  2   8  5
7  2022-04-28 00:15:00  0  2  10  5
```
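If myfile.dat isn't at hand, a sketch like the following reproduces this frame from an inline string. It assumes the file is comma-separated with one extra line above the header (which is what `skiprows=1` suggests); the real file layout isn't shown in the question, so the `RAW` text below is hypothetical.

```python
import io

import pandas as pd

# Hypothetical stand-in for myfile.dat: assumes comma separation and one
# extra line above the header row, which skiprows=1 skips.
RAW = """hypothetical first line
TIME,A,B,C,D
2022-04-28 00:02:00,0,2,1,5
2022-04-28 00:03:00,0,2,2,5
2022-04-28 00:05:00,0,2,3,5
2022-04-28 00:06:00,0,2,4,5
2022-04-28 00:09:00,0,2,5,5
2022-04-28 00:10:00,0,2,6,5
2022-04-28 00:12:00,0,2,8,5
2022-04-28 00:15:00,0,2,10,5
"""

# TIME is read as plain strings here; it is converted to datetime later.
df = pd.read_csv(io.StringIO(RAW), skiprows=1)
print(df)
```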
What I am doing:
```python
# Convert the TIME column to datetime
date_format = '%Y-%m-%d %H:%M:%S'
df['TIME'] = pd.to_datetime(df['TIME'], format=date_format)
# Build the target index, one label per minute from 00:00 to 00:15.
# DatetimeIndex.floor('60S') rounds each timestamp DOWN to the whole minute
# (a no-op here, since date_range already produces whole minutes)
idx = pd.date_range(start='2022-04-28 00:00:00', end='2022-04-28 00:15:00', freq='60S').floor('60S')
# Use 'TIME' as the index of df
df = df.set_index('TIME')
# Use resample() as a convenience method for frequency conversion and resampling of time series
df = df.resample('60S').sum()
# Reindex to idx; labels missing from df should get the value 1000
df = df.reindex(idx, fill_value=1000)
print(df)
```
The output is:
```
                        A     B     C     D
2022-04-28 00:00:00  1000  1000  1000  1000
2022-04-28 00:01:00  1000  1000  1000  1000
2022-04-28 00:02:00     0     2     1     5
2022-04-28 00:03:00     0     2     2     5
2022-04-28 00:04:00     0     0     0     0
2022-04-28 00:05:00     0     2     3     5
2022-04-28 00:06:00     0     2     4     5
2022-04-28 00:07:00     0     0     0     0
2022-04-28 00:08:00     0     0     0     0
2022-04-28 00:09:00     0     2     5     5
2022-04-28 00:10:00     0     2     6     5
2022-04-28 00:11:00     0     0     0     0
2022-04-28 00:12:00     0     2     8     5
2022-04-28 00:13:00     0     0     0     0
2022-04-28 00:14:00     0     0     0     0
2022-04-28 00:15:00     0     2    10     5
```
My question is: why does reindex create the new dates (as it should) but only set the values of the first two rows to 1000 instead of all new rows?
Thanks for any help!
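To see where the zero rows come from, it can help to look at the frame right after `resample()`, before `reindex` runs. A minimal sketch, assuming the same eight rows as above built inline instead of read from myfile.dat, and using `'1min'` in place of `'60S'` (the same one-minute frequency):

```python
import pandas as pd

# The eight rows from the question, built inline (stand-in for myfile.dat).
times = pd.to_datetime([
    "2022-04-28 00:02:00", "2022-04-28 00:03:00", "2022-04-28 00:05:00",
    "2022-04-28 00:06:00", "2022-04-28 00:09:00", "2022-04-28 00:10:00",
    "2022-04-28 00:12:00", "2022-04-28 00:15:00",
])
df = pd.DataFrame(
    {"A": 0, "B": 2, "C": [1, 2, 3, 4, 5, 6, 8, 10], "D": 5},
    index=pd.DatetimeIndex(times, name="TIME"),
)

# Intermediate result of the question's resample step, before reindexing.
resampled = df.resample("1min").sum()

# The resampled index spans only the data itself, not 00:00 to 00:15.
print(resampled.index.min(), resampled.index.max())
# Gap minutes such as 00:04 already exist at this point.
print(resampled.loc[pd.Timestamp("2022-04-28 00:04:00")].tolist())
```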
Solution 1:[1]
> Why does reindex create the new dates (as it should) but only set the values of the first two rows to 1000 instead of all new rows?
Because the `fill_value` parameter of `reindex` is the value to use for missing values, i.e. labels in the new index that do not already exist in the DataFrame. It defaults to NaN, but can be any “compatible” value.
I suggest removing `fill_value=1000` and simply assigning 1000 to all columns after reindexing.
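A minimal sketch of the `fill_value` behaviour described in this answer, on a toy Series (the labels and values are hypothetical): only labels that are new to the index receive the fill value, while a label that already exists keeps its value, even when that value is 0.

```python
import pandas as pd

# Toy Series: labels "a" and "b" exist, "c" does not.
s = pd.Series([0, 2], index=["a", "b"])

# fill_value applies only to the new label "c"; the existing 0 at "a" is kept.
out = s.reindex(["a", "b", "c"], fill_value=1000)
print(out.tolist())  # [0, 2, 1000]

# Without fill_value, the new label gets NaN instead.
print(s.reindex(["a", "b", "c"]).isna().tolist())  # [False, False, True]
```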
Solution 2:[2]
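If you have a closer look, you will see that after resampling, the index of your df runs from 00:02 to 00:15, while the idx you created runs from 00:00 to 00:15. `resample('60S').sum()` has already inserted the missing minutes in between (the sum of an empty bin is 0, which is where the zero rows come from), so when reindexing, the only labels missing from df are 00:00 and 00:01. That's why only these two rows get filled with your `fill_value`.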
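Building on this diagnosis, one way to get 1000 in every gap minute is to keep the empty resample bins distinguishable from real zeros. A sketch, assuming the same inline sample data as in the question and relying on the `min_count` argument of the resampler's `sum()` (with `min_count=1`, a bin with no rows becomes NaN instead of 0); `'1min'` is used in place of `'60S'`:

```python
import pandas as pd

# The eight rows from the question, built inline (stand-in for myfile.dat).
times = pd.to_datetime([
    "2022-04-28 00:02:00", "2022-04-28 00:03:00", "2022-04-28 00:05:00",
    "2022-04-28 00:06:00", "2022-04-28 00:09:00", "2022-04-28 00:10:00",
    "2022-04-28 00:12:00", "2022-04-28 00:15:00",
])
df = pd.DataFrame(
    {"A": 0, "B": 2, "C": [1, 2, 3, 4, 5, 6, 8, 10], "D": 5},
    index=pd.DatetimeIndex(times, name="TIME"),
)

idx = pd.date_range("2022-04-28 00:00:00", "2022-04-28 00:15:00", freq="1min")

# min_count=1: empty one-minute bins become NaN rather than 0.
# reindex then adds 00:00 and 00:01 as NaN, and fillna sets every gap to 1000.
# astype restores integer columns (NaN had forced float64).
out = (
    df.resample("1min").sum(min_count=1)
    .reindex(idx)
    .fillna(1000)
    .astype("int64")
)
print(out)

# Alternative for this sample: skip resample and reindex directly. This only
# works because each timestamp is unique and already on a whole minute.
alt = df.reindex(idx, fill_value=1000)
```

If the real file can contain several readings within one minute, the aggregation step is still needed and the `min_count=1` variant is the safer of the two; drop the `astype` if the columns can hold non-integer values.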
Sources
This article follows the attribution requirements of Stack Overflow and is licensed under CC BY-SA 3.0.
Source: Stack Overflow
| Solution | Source |
|---|---|
| Solution 1 | Eddy Piedad |
| Solution 2 | Rabinzel |
