Interpolate pandas DataFrame around a certain value
I have a dataset of water levels over time. I want to plot all data above a certain value (-0.75 m in the example) in green and all data below it in orange. The problem is that whenever my data crosses this value, the plotted line stops there, leaving multiple gaps in my plot.
What I want to do is interpolate any time the data crosses this border so that my line will continue to the level of -0.75m in green and become orange from there on out.
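The crossing time for a single segment follows from linear interpolation between its two end points. Below is a minimal sketch of that arithmetic; the helper name `crossing_time` and the two readings (-0.5 m and -1.0 m) are hypothetical values chosen for illustration, not taken from the real dataset.

```python
import pandas as pd

def crossing_time(t0, y0, t1, y1, level):
    """Time at which the straight line from (t0, y0) to (t1, y1) reaches `level`."""
    # Fraction of the segment that has elapsed when `level` is reached
    frac = (level - y0) / (y1 - y0)
    return t0 + (t1 - t0) * frac

# Hypothetical readings on either side of the -0.75 m level
t_cross = crossing_time(
    pd.Timestamp("2020-03-16"), -0.5,
    pd.Timestamp("2020-03-18"), -1.0,
    level=-0.75,
)
print(t_cross)  # 2020-03-17 00:00:00
```

Adding the point `(t_cross, -0.75)` to both the green and the orange line is what closes the gap.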
I have tried to find the spots where my data crosses this line and inserted a row with a y-value of -0.75 at each of them, so that I can later interpolate the corresponding date in my dataframe, but this has not worked so far.
Below is example code where I make my own dataset and try to interpolate whenever I cross the value 2. This does seem to work for the trial dataset but not for my original data, and the way I get the code to work seems very sketchy to me. Are there better ways to achieve my goal?
import numpy as np
import pandas as pd

d = {'x': ['2020-03-14', '2020-03-15', '2020-03-16', '2020-03-18', '2020-03-19'], 'y': [3, 4, 5, -1, 1]}
df = pd.DataFrame(data=d)
df.set_index('x', inplace=True)

# Placeholder row: y equals the threshold (2), the date is still unknown
empty = {'y': 2}
df_empty = pd.DataFrame(data=empty, index=[np.nan])

# Shift by the threshold so that a sign change marks a crossing
df_temp = df.copy()
df_temp.y -= 2

df_new = df.iloc[[0]]
# Add nan row in dataframe
for i in range(len(df_temp) - 1):
    df_new = pd.concat([df_new, df.iloc[[i]]])
    if df_temp.y.iloc[i] * df_temp.y.iloc[i + 1] < 0:
        df_new = pd.concat([df_new, df_empty])
# Polish new dataframe
df_new = pd.concat([df_new, df.iloc[[-1]]])
df_new = df_new.iloc[1:]
# set desired values as index
df_new.reset_index(inplace=True)
df_new.set_index('y', inplace=True)
# convert dates to numbers
df_new['index'] = pd.to_numeric(pd.to_datetime(df_new['index']))
# set negative numbers (the missing dates) to nan
df_new = df_new.mask(df_new < 0)
# interpolate nan values
# (method='linear' ignores the index and treats the rows as equally spaced)
df_new['index'] = df_new['index'].interpolate(method='linear')
# convert back to datetime
df_new['index'] = pd.to_datetime(df_new['index'])
# undo index change
df_new.reset_index(inplace=True)
df_new.set_index('index', inplace=True)
df.plot()
df_new.plot()
Solution 1:[1]
First, if you have to iterate over data, using NumPy is faster than using pandas.
See: https://towardsdatascience.com/how-to-make-your-pandas-loop-71-803-times-faster-805030df4f06
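For this particular task, the test for a crossing can probably be done without a Python-level loop at all. The following is a rough sketch of that idea, assuming the values sit in a one-dimensional float array; the variable names `shifted`, `crosses` and `crossing_idx` are made up for this example.

```python
import numpy as np

values = np.array([3.0, 4.0, 5.0, -1.0, 1.0, 4.0, -1.0, 3.0])
threshold = 2.0

# Shift so that the threshold sits at zero
shifted = values - threshold
# True at position i when the segment from i to i + 1 changes sign,
# i.e. when the two points lie on opposite sides of the threshold
crosses = shifted[:-1] * shifted[1:] < 0
crossing_idx = np.flatnonzero(crosses)
print(crossing_idx)  # [2 4 5 6]
```

Each index in `crossing_idx` marks the left end of a segment that needs one interpolated point.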
Second, you said "This does seem to work for the trial dataset but not for my original data". In that case, you should include the original data in your question.
Anyway, here is working code that uses NumPy. I am not sure that it will work for your original data.
import datetime
import numpy as np
import matplotlib.pyplot as plt

# Raw data
list_dtstr = ['2020-03-14', '2020-03-15', '2020-03-16', '2020-03-18', '2020-03-19', '2020-03-20', '2020-03-21', '2020-03-22']
list_value = [3.0, 4.0, 5.0, -1.0, 1.0, 4.0, -1.0, 3.0]
# Column 0: POSIX timestamp, column 1: value
npdata = np.array([[datetime.datetime.strptime(dtstr, '%Y-%m-%d').timestamp() for dtstr in list_dtstr], list_value])
npdata = npdata.transpose()

# Threshold value
threshold = 2.0

# Interpolate (iterate backwards so that inserted rows do not shift the indices still to be visited)
for idx in range(len(npdata) - 1, 0, -1):
    if (npdata[idx, 1] - threshold) * (npdata[idx - 1, 1] - threshold) < 0:
        interp_x = [npdata[idx - 1, 1], npdata[idx, 1]]
        interp_y = [npdata[idx - 1, 0], npdata[idx, 0]]
        # Sort interpolation data (np.interp expects increasing x values)
        if interp_x[0] > interp_x[1]:
            interp_x = [interp_x[1], interp_x[0]]
            interp_y = [interp_y[1], interp_y[0]]
        # Interpolation
        dt_value = np.interp(threshold, interp_x, interp_y)
        # Insert interpolated data
        npdata = np.insert(npdata, idx, [dt_value, threshold], axis=0)

# Convert timestamp to datetime
npdata_new = np.array([[datetime.datetime.fromtimestamp(npdata[idx, 0]), npdata[idx, 1]] for idx in range(len(npdata))])

# Split data (points equal to the threshold stay in both arrays)
npdata_above = npdata_new.copy()
npdata_below = npdata_new.copy()
for idx in range(len(npdata_new)):
    if npdata_new[idx, 1] > threshold:
        npdata_below[idx, 1] = np.nan
    elif npdata_new[idx, 1] < threshold:
        npdata_above[idx, 1] = np.nan

# Plot
plt.plot(npdata_above[:, 0], npdata_above[:, 1], c='green')
plt.plot(npdata_below[:, 0], npdata_below[:, 1], c='orange')
plt.show()
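The same idea can also be written directly against a pandas Series with a datetime index, which may be closer to the original data. This is only a sketch under that assumption: the function name `insert_crossings` is hypothetical, and the sample values are the trial data from the question.

```python
import numpy as np
import pandas as pd

def insert_crossings(series, threshold):
    """Return `series` with linearly interpolated points added wherever two
    consecutive values lie on opposite sides of `threshold`.
    Assumes a sorted DatetimeIndex and numeric values."""
    t = series.index
    y = series.to_numpy(dtype=float)
    d = y - threshold
    # Left end of every segment whose end points have opposite signs
    i = np.flatnonzero(d[:-1] * d[1:] < 0)
    # Fraction of each segment elapsed when the threshold is reached
    frac = (threshold - y[i]) / (y[i + 1] - y[i])
    t_cross = t[i] + (t[i + 1] - t[i]) * frac
    crossings = pd.Series(threshold, index=t_cross, name=series.name)
    return pd.concat([series, crossings]).sort_index()

s = pd.Series(
    [3.0, 4.0, 5.0, -1.0, 1.0],
    index=pd.to_datetime(['2020-03-14', '2020-03-15', '2020-03-16',
                          '2020-03-18', '2020-03-19']),
    name='y',
)
full = insert_crossings(s, threshold=2.0)

# Crossing points equal the threshold, so they are kept in both halves
above = full.where(full >= 2.0)
below = full.where(full <= 2.0)
```

Plotting `above` in green and `below` in orange (for example with `above.plot(c='green')` and `below.plot(c='orange')`) should then give two lines that meet at the threshold instead of leaving a gap.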
Sources
This article follows the attribution requirements of Stack Overflow and is licensed under CC BY-SA 3.0.
Source: Stack Overflow