False Nearest Neighbors in Python

I have a pandas DataFrame that contains multivariate time-series data: one column for temperature, one for humidity, and one for wind. For example:

       temperature    humidity    wind
0           59           97        8
1           59           89        8
2           58           79        7
3           58           74        7
4           60           74        7
5           62           76        10

I want to apply the Takens embedding (time-delay embedding) algorithm to this DataFrame, and I use the giotto-tda package to do so. The link below shows how the package performs Takens embedding:
https://giotto-ai.github.io/gtda-docs/latest/modules/generated/time_series/embedding/gtda.time_series.TakensEmbedding.html
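
For reference, the embedding itself is simple to write down. The following is a minimal, dependency-free sketch of a time-delay embedding of a single column. It is not giotto-tda's implementation; the function name delay_embed is hypothetical, and the parameter names dimension and time_delay only mirror the ones used by the package.

```python
def delay_embed(series, dimension, time_delay):
    """Return the delay vectors [x[i], x[i + tau], ..., x[i + (d - 1) * tau]]."""
    # Number of complete delay vectors that fit inside the series.
    n_vectors = len(series) - (dimension - 1) * time_delay
    if n_vectors <= 0:
        raise ValueError("series is too short for this dimension and time_delay")
    return [
        [series[i + k * time_delay] for k in range(dimension)]
        for i in range(n_vectors)
    ]


# Temperature column from the example table above.
temperature = [59, 59, 58, 58, 60, 62]
embedded = delay_embed(temperature, dimension=2, time_delay=1)
print(embedded)
# [[59, 59], [59, 58], [58, 58], [58, 60], [60, 62]]
```

Each row is one point in the reconstructed state space, so a series of length n yields n - (dimension - 1) * time_delay points.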

The Takens embedding algorithm takes two parameters, time_delay and dimension. For a univariate time series, we can run a heuristic function on the data to find the optimal time_delay and dimension. The heuristic function is documented at the link below:
https://giotto-ai.github.io/gtda-docs/latest/modules/generated/time_series/embedding/gtda.time_series.takens_embedding_optimal_parameters.html#gtda.time_series.takens_embedding_optimal_parameters

But when working with multivariate time-series data, like the DataFrame above, we should use the false nearest neighbors algorithm to find optimal values for time_delay and dimension. I have not found any function in Python that runs false nearest neighbors on my DataFrame. I have only found a package named TISEAN that can perform false nearest neighbors on time-series data. The links to the false nearest neighbors algorithm in the TISEAN package are:
https://www.pks.mpg.de/tisean/TISEAN_2.1/docs/chaospaper/node9.html
and
https://www.pks.mpg.de/tisean//TISEAN_2.1/docs/docs_c/false_nearest.html

But as you can see, the package is not written in Python; I think it is written in C.

How can I use the TISEAN package from Python to run false nearest neighbors on my DataFrame? Or is there any other way in Python, besides TISEAN, to run the false nearest neighbors algorithm on my multivariate time-series data?



Solution 1:[1]

A library called teaspoon has an implementation of the false nearest neighbours algorithm. It also covers mutual information and Takens embedding. Here's the link: https://lizliz.github.io/teaspoon/FNN.html

I also believe someone has made TISEAN available from Python on GitHub. Here is the link: https://github.com/galaunay/pytisean
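
If neither library is an option, the basic criterion is short enough to sketch by hand. Below is a minimal brute-force sketch, assuming the first criterion of Kennel et al.: a nearest neighbour in dimension d counts as false when the distance added by the next delay coordinate exceeds rtol times the distance in dimension d. The function name fnn_fractions, the default rtol of 10, and the multivariate extension (stacking all channels at every delay) are my assumptions for illustration, not code taken from teaspoon or TISEAN.

```python
import math


def fnn_fractions(rows, time_delay, max_dimension, rtol=10.0):
    """Fraction of false nearest neighbours for dimensions 1..max_dimension.

    rows: list of observations, each a list with one value per channel,
          e.g. [temperature, humidity, wind].
    """
    fractions = []
    for dim in range(1, max_dimension + 1):
        # Only keep points that still have a sample at offset dim * time_delay.
        n_points = len(rows) - dim * time_delay
        if n_points < 2:
            raise ValueError("too few rows for this dimension and time_delay")
        # Delay vector i stacks rows[i], rows[i + tau], ..., rows[i + (dim - 1) * tau].
        vectors = [
            [v for k in range(dim) for v in rows[i + k * time_delay]]
            for i in range(n_points)
        ]
        n_false = n_valid = 0
        for i in range(n_points):
            # Brute-force nearest-neighbour search, O(n^2) per dimension.
            best_j, best_dist = -1, math.inf
            for j in range(n_points):
                if j != i:
                    dist = math.dist(vectors[i], vectors[j])
                    if dist < best_dist:
                        best_j, best_dist = j, dist
            if best_dist == 0.0:
                continue  # identical vectors: the ratio is undefined, skip
            # Extra separation contributed by the (dim + 1)-th delay coordinate.
            extra = math.dist(rows[i + dim * time_delay],
                              rows[best_j + dim * time_delay])
            n_valid += 1
            if extra / best_dist > rtol:
                n_false += 1
        fractions.append(n_false / n_valid if n_valid else 0.0)
    return fractions


# Demo on a univariate Henon-map series (a standard two-dimensional test system).
x, y = 0.0, 0.0
rows = []
for step in range(400):
    x, y = 1.0 - 1.4 * x * x + y, 0.3 * x
    if step >= 100:  # discard the initial transient
        rows.append([x])

fractions = fnn_fractions(rows, time_delay=1, max_dimension=3)
print(fractions)
```

The embedding dimension is usually taken as the first one at which the fraction drops to about zero, which for this demo series should happen at dimension 2. For a DataFrame like the one in the question, rows = df.to_numpy().tolist() gives the expected input; standardizing each column first is advisable because the distances mix units. The search is quadratic in the number of rows, so this is only practical for short series.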

If you want to use Cao's improved method of FNN, I called R from Python through rpy2, so if you have the required packages installed (rpy2, plus the nonlinearTseries package in R) you can use this:

import rpy2.robjects as ro
from rpy2.robjects.packages import importr
from rpy2.robjects import numpy2ri

nonlinearTseries = importr("nonlinearTseries")
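# savgol_price is assumed to be a 1-D NumPy array holding the time series
# (defined elsewhere); convert it to an R vector.
data = numpy2ri.numpy2rpy(savgol_price)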

cao_emb_dim = nonlinearTseries.estimateEmbeddingDim(
    data,  # time series
    len(data),  # number of points to use, use entire series
    62,  # time delay
    20,  # max no. of dimension
    0.95,  # threshold value
    0.1,  # max relative change
    True,  # do the plot
    "Computing the embedding dimension",  # main
    "dimension (d)",  # x_label
    "E1(d) & E2(d)",  # y_label
    ro.NULL,  # x_lim
    ro.NULL,  # y_lim
    1e-5  # add a small amount of noise to the original series to avoid the
          # appearance of false neighbours due to discretization errors.
          # This also keeps the method from failing on periodic signals; 0 for no noise
)

embedding_dimension = int(cao_emb_dim[0])

Sources

This article follows the attribution requirements of Stack Overflow and is licensed under CC BY-SA 3.0.

Source: Stack Overflow
