Replace np.nans in list with calculated values obtained from polynomial regression
I have two lists of y values:
y_list1 = [45,np.nan,np.nan,np.nan, 40,50,6,2,7,np.nan, np.nan,np.nan, np.nan, np.nan]
y_list2 = [4,23,np.nan, np.nan, np.nan, np.nan, np.nan,5, np.nan, np.nan, np.nan, np.nan, np.nan, np.nan]
and both sets of values were obtained at the same set of time points:
x = np.array([0,3,4,5,6,7,8,9,10,11,12,13,14,15])
The aim: Return y_list1 and y_list2 with the np.nans replaced with values, by fitting a polynomial regression to the data that is there, and then calculating the missing points.
I am able to fit the polynomial:
import sys
import numpy as np
x = np.array([0,3,4,5,6,7,8,9,10,11,12,13,14,15])
id_list = ['1','2']
list_y = np.array([[45,np.nan,np.nan,np.nan, 40,50,6,2,7,np.nan, np.nan,np.nan, np.nan, np.nan],[4,23,np.nan, np.nan, np.nan, np.nan, np.nan,5, np.nan, np.nan, np.nan, np.nan, np.nan, np.nan]])
for each_id,y in zip(id_list,list_y):
    #treat the missing data
    idx = np.isfinite(x) & np.isfinite(y)
    #fit
    ab = np.polyfit(x[idx], y[idx], len(list_y[0]))
Next I wanted to use this fit to replace the missing values in y, so I found np.polyval and implemented:
replace_nan = np.polyval(x,y)
print(replace_nan)
The output is:
[2.13161598e+20 nan nan nan
5.20634185e+19 7.52453405e+20 8.35884417e+09 3.27510000e+04
5.11358666e+10 nan nan nan
nan nan]
test_polyreg.py:16: RankWarning: Polyfit may be poorly conditioned
ab = np.polyfit(x[idx], y[idx], len(list_y[0])) #understand how many degrees
[7.45653990e+07 6.97736286e+16 nan nan
nan nan nan 9.91821285e+08
nan nan nan nan
nan nan]
I'm not concerned about the poor conditioning warning, because this is just test data to try to understand how it should work. But the output still has nans in it (and didn't use the fit I'd previously generated). Could someone show me how to replace the nans in the y values with points estimated from a polynomial regression?
Solution 1:[1]
First, change the definition of ab to:
ab = np.polyfit(x[idx], np.array(y)[idx], idx.sum())
ab holds the polynomial coefficients, so it has to be the first argument to np.polyval, with x as the second:
replace_nan = np.polyval(ab,x)
print(replace_nan)
out:
[ 4. 23. 26.54413638 28.01419869 27.00250156
23.10135965 15.90308758 5. -10.01558845 -29.55136312
-54.01500938 -83.81421259 -119.3566581 -161.05003127]
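Note that np.polyval(ab, x) returns a fitted value at every time point, not only at the missing ones. In the output above the known values 4, 23 and 5 reappear only because that particular fit passes through them; a lower-degree fit generally would not. To keep the measured values and fill just the gaps, the fitted values can be combined with the original data using np.where. The sketch below is one way to do that, under stated assumptions: the helper name fill_nans_with_polyfit and the max_degree cap are hypothetical and not part of the question or the answer. It also caps the degree at one less than the number of known points, since n points determine a unique polynomial only up to degree n - 1; asking for more is likely what triggers the RankWarning.

```python
import numpy as np


def fill_nans_with_polyfit(x, y, max_degree=2):
    """Return a copy of y with NaNs replaced by polynomial-fit values.

    max_degree is an illustrative cap (an assumption, not from the question).
    """
    x = np.asarray(x, dtype=float)
    y = np.asarray(y, dtype=float)

    # Points that can be used for the fit
    known = np.isfinite(x) & np.isfinite(y)
    n_known = int(known.sum())
    if n_known == 0:
        raise ValueError("need at least one finite y value to fit")

    # n points determine a unique polynomial of degree at most n - 1
    degree = min(max_degree, n_known - 1)
    coeffs = np.polyfit(x[known], y[known], degree)

    # Coefficients go first, evaluation points second
    fitted = np.polyval(coeffs, x)

    # Keep measured values, use fitted values only where y was missing
    return np.where(known, y, fitted)


x = np.array([0, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12, 13, 14, 15])
id_list = ['1', '2']
list_y = [
    [45, np.nan, np.nan, np.nan, 40, 50, 6, 2, 7,
     np.nan, np.nan, np.nan, np.nan, np.nan],
    [4, 23, np.nan, np.nan, np.nan, np.nan, np.nan, 5,
     np.nan, np.nan, np.nan, np.nan, np.nan, np.nan],
]

filled = {
    each_id: fill_nans_with_polyfit(x, y)
    for each_id, y in zip(id_list, list_y)
}
for each_id, values in filled.items():
    print(each_id, values)
```

Values filled in beyond the last known time point are extrapolations, and polynomial extrapolation can drift quickly (compare the -161 at x = 15 in the output above), so it is worth checking them before relying on them.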
Sources
This article follows the attribution requirements of Stack Overflow and is licensed under CC BY-SA 3.0.
Source: Stack Overflow
| Solution | Source |
|---|---|
| Solution 1 | Salvatore Daniele Bianco |
