Having issues doing a spline interpolation between columns using Pandas
I have the following dataframe:
         index  Flux_min  New_Flux  Flux_max
0            0  0.550613       NaN  0.537315
1            1  0.656621       NaN  0.620647
2            2  0.756486       NaN  0.700822
3            3  0.846038       NaN  0.775749
4            4  0.920871       NaN  0.843257
...        ...       ...       ...       ...
99997    99997  0.874460       NaN  0.805594
99998    99998  0.801958       NaN  0.743039
99999    99999  0.721355       NaN  0.676436
100000  100000  0.635054       NaN  0.606967
100001  100001  0.552247       NaN  0.535789
What I want to do is to perform a spline interpolation (or any other kind of interpolation that is not linear) between the Flux_min and Flux_max columns in order to figure out the values for the New_Flux column.
This is the code I use for that:
synth_spectra_df = synth_spectra_df.interpolate(method='spline', order=2, axis=1)
where synth_spectra_df is the dataframe shown above. If I run this code, I get the following error:
Traceback (most recent call last):
  File "spectral_interpolation.py", line 107, in <module>
    synth_spectra_df = synth_spectra_df.interpolate(method='spline', order=2, axis=1)
  File "/home/fmendez/anaconda3/lib/python3.8/site-packages/pandas/core/generic.py", line 7209, in interpolate
    raise ValueError(
ValueError: Index column must be numeric or datetime type when using spline method other than linear. Try setting a numeric or datetime index column before interpolating.
I have already made sure that all the columns have a numeric dtype, and I also tried not having the index as a column, but I still get the same error. A linear interpolation runs without complaint, but I'm not interested in a linear interpolation.
Any help would be greatly appreciated.
Solution 1:[1]
Interpolation can only fill a gap from known values along the axis being interpolated. The documentation for DataFrame.interpolate includes an example where an element cannot be filled in because nothing precedes it:
Note how the first entry in column ‘b’ remains NaN, because there is no entry before it to use for interpolation.
The example from the documentation:
import numpy as np
import pandas as pd

df = pd.DataFrame([(0.0, np.nan, -1.0, 1.0),
                   (np.nan, 2.0, np.nan, np.nan),
                   (2.0, 3.0, np.nan, 9.0),
                   (np.nan, 4.0, -4.0, 16.0)],
                  columns=list('abcd'))
df
     a    b    c     d
0  0.0  NaN -1.0   1.0
1  NaN  2.0  NaN   NaN
2  2.0  3.0  NaN   9.0
3  NaN  4.0 -4.0  16.0
df.interpolate(method='linear', limit_direction='forward', axis=0)
     a    b    c     d
0  0.0  NaN -1.0   1.0
1  1.0  2.0 -2.0   5.0
2  2.0  3.0 -3.0   9.0
3  2.0  4.0 -4.0  16.0
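For the dataframe in the question, one possible reading of the error (a sketch, not a confirmed fix): with axis=1, pandas interpolates along the column labels, and here those labels are the strings Flux_min, New_Flux and Flux_max, which is what the "numeric or datetime" check rejects. The snippet below assumes hypothetical x-positions 0.0, 0.5 and 1.0 for the three columns and uses a three-row stand-in for synth_spectra_df built from the values shown in the question. It uses method='index', which interpolates linearly in those positions and does not need SciPy.

```python
import numpy as np
import pandas as pd

# Three-row stand-in for synth_spectra_df, using values from the question.
synth_spectra_df = pd.DataFrame({
    "Flux_min": [0.550613, 0.656621, 0.756486],
    "New_Flux": [np.nan, np.nan, np.nan],
    "Flux_max": [0.537315, 0.620647, 0.700822],
})

# Hypothetical x-positions for the columns (an assumption, not from the
# question): New_Flux is placed halfway between Flux_min and Flux_max.
positions = {"Flux_min": 0.0, "New_Flux": 0.5, "Flux_max": 1.0}

# Replace the string column labels with numeric ones so that pandas has
# numeric coordinates to interpolate along when axis=1.
numeric = synth_spectra_df.copy()
numeric.columns = [positions[name] for name in synth_spectra_df.columns]

# method="index" uses the numeric labels as x-coordinates.
filled = numeric.interpolate(method="index", axis=1)

# Restore the original column names.
filled.columns = synth_spectra_df.columns
print(filled)
```

Keep in mind that two known values per row only determine a straight line. An order-2 spline needs at least three known points along the interpolation axis, so a non-linear result would require more known columns per row than the two available here.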
Sources
This article follows the attribution requirements of Stack Overflow and is licensed under CC BY-SA 3.0.
Source: Stack Overflow
| Solution | Source |
|---|---|
| Solution 1 | Jacob Sushenok |
