How to improve the performance of Isolation Forest on time series?

I'm using Isolation Forest on a time series. I'm trying to classify faults in sensor data, but Isolation Forest is not working quite the way I would like. The picture shows that many of the jumps are not detected.

I am looking for tips on how to improve the fault detection, for instance features that could be added or parameters that could be changed. I also read somewhere that using rolling windows could help, but I am not sure exactly how.

I'm also interested in general tips when using isolation forests.



Solution 1:[1]

The features you use must reflect the kind of anomalies you wish to detect. In your case, anomalies are large jumps in value within a short time span, so you should transform the input data into differences and use those as input to the anomaly detection method. The easiest way is to compute the difference between consecutive values, using something like numpy.ediff1d. For a more general solution, slide a window of a few data points over the series, compute max - min inside each window, and use that as the feature.
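As a rough sketch of the two features described above, the following builds the absolute consecutive difference and a rolling max - min range, then fits Isolation Forest on them. The helper name `jump_features`, the window length of 5, the synthetic signal, and the `n_estimators` and `contamination` values are all assumptions made for illustration, not values taken from the question.

```python
import numpy as np
from sklearn.ensemble import IsolationForest


def jump_features(values, window=5):
    """Build per-sample features that respond to sudden jumps (hypothetical helper)."""
    values = np.asarray(values, dtype=float)
    # Absolute difference to the previous sample; the first sample gets 0.
    abs_diff = np.abs(np.ediff1d(values, to_begin=0.0))
    # Pad the start with the first value so every sample has a full trailing window.
    padded = np.concatenate([np.full(window - 1, values[0]), values])
    windows = np.lib.stride_tricks.sliding_window_view(padded, window)
    # Range (max - min) of the trailing window that ends at each sample.
    window_range = windows.max(axis=1) - windows.min(axis=1)
    return np.column_stack([abs_diff, window_range])


# Synthetic data (assumption): a slow sine wave with noise and two level shifts.
rng = np.random.default_rng(0)
signal = np.sin(np.linspace(0, 20, 1000)) + rng.normal(0, 0.02, 1000)
signal[300:] += 2.0   # upward jump at index 300
signal[700:] -= 3.0   # downward jump at index 700

features = jump_features(signal, window=5)
model = IsolationForest(n_estimators=200, contamination=0.02, random_state=0)
labels = model.fit_predict(features)       # -1 = anomaly, 1 = normal
anomalies = np.flatnonzero(labels == -1)   # indices of flagged samples
```

Because the model now sees how fast the value changes rather than the raw level, a jump stands out even when the level it lands on is common elsewhere in the series. The `contamination` value fixes the fraction of samples that get flagged, so it should be tuned to the expected fault rate.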

Also, for univariate continuous problems like this, there are more suitable anomaly detection methods than Isolation Forest, for example a simple transform such as Z-score or Median Absolute Deviation followed by a threshold. In scikit-learn, EllipticEnvelope is one alternative.
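A minimal sketch of the Median Absolute Deviation approach, applied to the consecutive differences, could look like the following. The function name `mad_outliers`, the threshold of 3.5 (a commonly used convention for the robust z-score), and the synthetic signal are assumptions for illustration.

```python
import numpy as np


def mad_outliers(values, threshold=3.5):
    """Flag samples whose robust z-score exceeds `threshold` (hypothetical helper)."""
    values = np.asarray(values, dtype=float)
    median = np.median(values)
    mad = np.median(np.abs(values - median))
    if mad == 0:
        # No spread to scale by; report nothing rather than divide by zero.
        return np.zeros(values.shape, dtype=bool)
    # 0.6745 rescales the MAD so the score is comparable to a standard
    # z-score when the data is normally distributed.
    robust_z = 0.6745 * (values - median) / mad
    return np.abs(robust_z) > threshold


# Synthetic data (assumption): noise around zero with one level shift.
rng = np.random.default_rng(1)
signal = rng.normal(0.0, 0.05, 500)
signal[250:] += 2.0   # jump at index 250

# Work on consecutive differences so that the jump, not the new level, is scored.
diffs = np.ediff1d(signal, to_begin=0.0)
flags = mad_outliers(diffs)
jump_indices = np.flatnonzero(flags)
```

Unlike a plain Z-score, the median and MAD are barely affected by the jumps themselves, so a few large faults do not inflate the scale they are measured against.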

Sources

This article follows the attribution requirements of Stack Overflow and is licensed under CC BY-SA 3.0.

Source: Stack Overflow

Solution Source
Solution 1 Jon Nordby