How to standardize pandas time series data according to the n-th previous day's mean and variance?
For example, given data that contains readings taken every second, how do we normalize each row of data according to the mean and standard deviation of the previous day's data?
Assume we have the exact (pandas) timestamp of each row in the dataframe.
Solution 1:[1]
Offering up this answer because I was stuck on this problem for a while and couldn't find a better (or more efficient) solution online.
Assuming that the timestamps are the index of the data:
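import pandas as pd

standardize_offset = 1  # Standardize against the data from 1 day prior
tmp = data.rename_axis('time').reset_index()
values = tmp.drop(columns='time')  # Keep the timestamps out of the arithmetic
# Key each row by the (year, day of year) of its reference day. Shifting the timestamp,
# rather than subtracting from dayofyear, stays correct across year boundaries.
ref = tmp.time - pd.Timedelta(days=standardize_offset)
year_day = values.set_index([ref.dt.year, ref.dt.dayofyear])
# Per-day statistics, keyed by each day's own (year, day of year)
groupby = values.groupby([tmp.time.dt.year, tmp.time.dt.dayofyear])
# Look up the reference day's statistics for every row; days without data give NaN
year_day_mean = groupby.mean().reindex(year_day.index)
year_day_std = groupby.std().reindex(year_day.index)
standardized = (year_day - year_day_mean) / year_day_std
standardized = standardized.reset_index(drop=True)
standardized['time'] = tmp.time  # Restore the original timestamps
result = standardized.set_index('time')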
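For comparison, here is a minimal self-contained sketch of the same idea that groups on the normalized calendar date instead of (year, day of year). The function name `standardize_by_previous_day`, the `offset_days` parameter, and the sample data are made up for illustration. It assumes a time-zone-naive `DatetimeIndex`, numeric columns only, and the sample standard deviation (pandas' default `ddof=1`).

```python
import numpy as np
import pandas as pd


def standardize_by_previous_day(data: pd.DataFrame, offset_days: int = 1) -> pd.DataFrame:
    """Z-score each row with the mean/std of the calendar day `offset_days` earlier."""
    day = data.index.normalize()                    # midnight of each row's own day
    ref_day = day - pd.Timedelta(days=offset_days)  # the day each row is compared against
    grouped = data.groupby(day)                     # one group per calendar day
    # Look up the reference day's statistics for every row.
    # Reference days with no data produce NaN rows.
    mean = grouped.mean().reindex(ref_day).to_numpy()
    std = grouped.std().reindex(ref_day).to_numpy()
    return (data - mean) / std


# Hypothetical sample: three days of readings, four per day, crossing a year boundary.
idx = pd.date_range("2023-12-30", periods=12, freq=pd.Timedelta(hours=6))
data = pd.DataFrame({"reading": np.arange(12, dtype=float)}, index=idx)
result = standardize_by_previous_day(data)
```

The first day's rows come out as NaN because there is no earlier day to compare against. Since the lookup uses real dates, the readings on 1 January are standardized against 31 December of the previous year.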
Sources
This article follows the attribution requirements of Stack Overflow and is licensed under CC BY-SA 3.0.
Source: Stack Overflow
| Solution | Source |
|---|---|
| Solution 1 | |
