Different results from interpolation if (the same data) is interpolated with a time index

I get different results from interpolation if (the same data) is interpolated with a time index — how can that be? The pandas docs say:

The ‘krogh’, ‘piecewise_polynomial’, ‘spline’, ‘pchip’ and ‘akima’ methods 
are wrappers around the respective SciPy implementations of similar names. 
These use the actual numerical values of the index. For more information 
on their behavior, see the SciPy documentation and SciPy tutorial.

the sub-methods in interpolation( method= ...), where i noticed this strange behavior are (among others):

 ['krogh', 'spline', 'pchip', 'akima', 'cubicspline']

reproducible sample (with comparison):

import numpy as np , pandas as pd
from math import isclose

# inputs:
no_timeindex       = False  # reset both dataframes indices to numerical indices # for comparison.
no_timeindex_for_B = True   # reset only dataframe indices  of the first approach to numerical indices, the other one stays datetime, for comparison.
holes              = True   # create date-timeindex that skips the timestamps, that would normally be at location 6,7,12, 14, 17, instead of a perfectly frequent one.
o_                 = 2 # order parameter for interpolation.
method_            = 'cubicspline'

#------------------+

n = np.nan
arr = [n,n,10000000000 ,10,10,10000,10,10, 10,40,4,4,9,4,4,n,n,n,4,4,4,4,4,4,18,400000000,4,4,4,n,n,n,n,n,n,n,4,4,4,5,6000000000,4,5,4,5,4,3,n,n,n,n,n,n,n,n,n,n,n,n,n,4,n,n,n,n,n,n,n,n,n,n,n,n,n,n,2,n,n,n,10,1000000000,n,n,n,n,n,n,n,n,n,n,n,n,n,n,n,n,n,n,n,n,n,n,n,n,n,n,n,n,n,n,n,n,n,n,n,n,n,n,n,n,n,n,n,n,n,n,n,n,n,n,n,n,n,n,n,n,n,n,n,n,n,n,n,n,n,n,n,1,n,n,n,n,n,n,n,n,n]

#--------------------------------------------------------------------------------+
df = pd.DataFrame(arr) # create dataframe from array.

if holes: # create a date-timeindex that skips the timestamps, that would normally be at location 6,7,12, 14, 17.
    ix       = pd.date_range("01.01.2000", periods = len(df)+(2 +5), freq="T")[2:]
    to_drop  = [ix[6],ix[7],ix[12],ix[14],ix[17]]
    ix       = ix.drop( to_drop)
    df.index = ix
else: # create a perfectly frequent datetime-index without any holes.
    ix       = pd.date_range("01.01.2000", periods = len(df)+2, freq="T")[2:]
    df.index = ix
 
# if wanted, drop timeindex and set it to integer indices later
if no_timeindex == True:
    df.reset_index( inplace=True, drop=True )  
 
df = df.interpolate(method=method_, order=o_, limit_area = 'inside') # interpolate.

df.index = ix # set index equal to the second approach, for comparing later.

A = df.copy(deep=True) # create a copy, to compare result with second approach later.

#------------------------------+ 
# second approach with numerical index instead of index-wise
 
df = pd.DataFrame(arr) # create dataframe from array.

if holes: # create a date-timeindex that skips the timestamps, that would normally be at location 6,7,12, 14, 17.
    ix       = pd.date_range("01.01.2000", periods = len(df)+(2 +5), freq="T")[2:]
    to_drop  = [ix[6],ix[7],ix[12],ix[14],ix[17]]
    ix       = ix.drop( to_drop)
    df.index = ix
else: # create a perfectly frequent datetime-index without any holes.
    ix       = pd.date_range("01.01.2000", periods = len(df)+2, freq="T")[2:]
    df.index = ix
      
# if wanted, drop timeindex and set it to integer indices later
if no_timeindex == True or no_timeindex_for_B == True:
    df.reset_index(inplace=True, drop=True)    
     
df = df.interpolate(method=method_, order=o_, limit_area = 'inside') # interpolate.

df.index = ix # set index equal to the first approach, for comparing later.

B = df.copy(deep=True) # create a copy, to compare result with second approach later.
    
#--------------------------------------------------------------------------------+

# compare:
if A.equals(B)==False: 
    
    # if values arent equal, count the ones that arent.
    i=0
    for x,y in zip( A[A.columns[0]], B[B.columns[0]]):
        if x!=y and not (np.isnan(x) and np.isnan(y) ) :
            print( x, " ?= ", y," ", (x==y), abs(x-y)) 
            i+=1
    
    # if theres no different values,  ...
    if    i==0:                 print(" both are the same. ")
         
    else: # if theres different values, ...
        
        # count those different values, that are NOT almost the same.
        not_almost = 0
        for x,y in zip( A[A.columns[0]], B[B.columns[0]]):
            if not (np.isnan(x) and np.isnan(y) ) :
                if isclose(x,y, abs_tol=0.000001) == False:  
                    not_almost+=1
        
        # if all values are almost the same, ...
        if not_almost == 0:     print(" both are not, but almost the same. ")
        else:                   print(" both are definetly not the same. ") 
else:                           print(" both are the same. ")

This shouldn't be the case, since the pandas docs state otherwise. Why does it happen anyway?



Sources

This article follows the attribution requirements of Stack Overflow and is licensed under CC BY-SA 3.0.

Source: Stack Overflow

Solution Source