Replace np.nans in list with calculated values obtained from polynomial regression

I have two lists of y values:

y_list1 = [45,np.nan,np.nan,np.nan, 40,50,6,2,7,np.nan, np.nan,np.nan, np.nan, np.nan]

y_list2 = [4,23,np.nan, np.nan, np.nan, np.nan, np.nan,5, np.nan, np.nan, np.nan, np.nan, np.nan, np.nan]

and both sets of values were obtained at the same set of time points:

x = np.array([0,3,4,5,6,7,8,9,10,11,12,13,14,15])

The aim: Return y_list1 and y_list2 with the np.nans replaced with values, by fitting a polynomial regression to the data that is there, and then calculating the missing points.

I am able to fit the polynomial:

import numpy as np

x = np.array([0,3,4,5,6,7,8,9,10,11,12,13,14,15])

id_list = ['1','2']
list_y = np.array([[45,np.nan,np.nan,np.nan, 40,50,6,2,7,np.nan, np.nan,np.nan, np.nan, np.nan],[4,23,np.nan, np.nan, np.nan, np.nan, np.nan,5, np.nan, np.nan, np.nan, np.nan, np.nan, np.nan]])

for each_id,y in zip(id_list,list_y):

    # treat the missing data: keep only points where both x and y are finite
    idx = np.isfinite(x) & np.isfinite(y)

    # fit
    ab = np.polyfit(x[idx], y[idx], len(list_y[0]))

I then wanted to use this fit to replace the missing values in y, so I found this and implemented it inside the loop:

    replace_nan = np.polyval(x,y)
    print(replace_nan)

The output is:

[2.13161598e+20            nan            nan            nan
 5.20634185e+19 7.52453405e+20 8.35884417e+09 3.27510000e+04
 5.11358666e+10            nan            nan            nan
            nan            nan]
test_polyreg.py:16: RankWarning: Polyfit may be poorly conditioned
  ab = np.polyfit(x[idx], y[idx], len(list_y[0])) #understand how many degrees
[7.45653990e+07 6.97736286e+16            nan            nan
            nan            nan            nan 9.91821285e+08
            nan            nan            nan            nan
            nan            nan]

I'm not concerned about the poor-conditioning warning, because this is just test data to try to understand how it should work. However, the output still has nans in it, and it didn't use the fit I'd previously generated. Could someone show me how to replace the nans in the y values with points estimated from a polynomial regression?



Solution 1:[1]

First, change the definition of ab to:

ab = np.polyfit(x[idx], np.array(y)[idx], idx.sum())
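
A note on the degree argument: n distinct points determine a polynomial of degree n - 1, so a degree of idx.sum() asks for one more coefficient than the data can pin down, which is one reason NumPy may warn about conditioning. A minimal sketch of the exactly determined case, assuming the three observed points of the second list (x_obs, y_obs, and deg are illustrative names, not part of the original code):

```python
import numpy as np

# The three observed (finite) points of the second series in the question.
x_obs = np.array([0.0, 3.0, 9.0])
y_obs = np.array([4.0, 23.0, 5.0])

# n points determine a polynomial of degree n - 1 uniquely.
deg = len(x_obs) - 1
coeffs = np.polyfit(x_obs, y_obs, deg)  # coefficients, highest power first

# Evaluating the fit at the observed x values reproduces the observed y values.
fitted = np.polyval(coeffs, x_obs)
print(np.allclose(fitted, y_obs))
```

For noisy measurements a lower degree than n - 1 is often a safer choice, since a polynomial that passes through every point can swing widely between and beyond them.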

ab holds the polynomial coefficients, so pass it to np.polyval as the first argument, with x as the second:

replace_nan = np.polyval(ab,x)
print(replace_nan)

Output (for the second list):

[   4.           23.           26.54413638   28.01419869   27.00250156
   23.10135965   15.90308758    5.          -10.01558845  -29.55136312
  -54.01500938  -83.81421259 -119.3566581  -161.05003127]
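
Note that np.polyval(ab, x) returns the fitted curve at every time point, not only at the missing ones. If the goal is to keep the measured values and fill just the nans, something like the following sketch may help. The helper name fill_nans_with_polyfit and the choice of deg=2 are assumptions made for illustration, not part of the original answer:

```python
import numpy as np

def fill_nans_with_polyfit(x, y, deg):
    """Hypothetical helper: fit on the finite points, fill only the NaN positions."""
    x = np.asarray(x, dtype=float)
    y = np.asarray(y, dtype=float)
    observed = np.isfinite(x) & np.isfinite(y)
    # Cap the degree at n - 1 so the fit stays determined by the data.
    deg = min(deg, int(observed.sum()) - 1)
    coeffs = np.polyfit(x[observed], y[observed], deg)
    fitted = np.polyval(coeffs, x)  # coefficients first, then x
    # Keep measured values; use the fitted value wherever data was missing.
    return np.where(observed, y, fitted)

x = np.array([0, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12, 13, 14, 15])
y_list1 = [45, np.nan, np.nan, np.nan, 40, 50, 6, 2, 7,
           np.nan, np.nan, np.nan, np.nan, np.nan]

# deg=2 is an arbitrary choice for this sketch.
filled = fill_nans_with_polyfit(x, y_list1, deg=2)
print(filled)
```

The trailing values (x from 11 to 15) lie outside the range of the measured points, so they are extrapolated and should be treated with caution, as the rapidly falling values in the output above suggest.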

Sources

This article follows the attribution requirements of Stack Overflow and is licensed under CC BY-SA 3.0.

Source: Stack Overflow

Solution Source
Solution 1 Salvatore Daniele Bianco