How to improve the performance of Isolation Forest on time series?
I'm using Isolation Forest on a time series. I'm trying to classify faults in sensor data, but Isolation Forest is not working the way I would like. The picture shows that many of the jumps are not detected.
I am looking for tips on how to improve the fault detection, for instance features that could be added or parameters that could be changed. I also read somewhere that using rolling windows could help, but I am not sure exactly how.
I'm also interested in general tips for using Isolation Forests.
Solution 1:[1]
The features you use must reflect the kind of anomalies you wish to detect. In your case, you define anomalies as large jumps in value within a short time span. To capture this, transform the input data into differences and use those as the input to the anomaly detection method. The easiest approach is to compute the difference between consecutive values, using something like numpy.ediff1d. For a more general solution, split the series into windows of a small number of data points, compute max-min inside each window, and use that as the feature.
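As a rough sketch of both feature ideas, the code below builds the consecutive difference with numpy.ediff1d and a rolling max-min range over a trailing window. The function names `jump_features` and `detect_jumps`, the default `window=5`, and the choice to use the absolute difference are assumptions made for illustration, not part of the original question; the window length in particular would need tuning to the sensor's sampling rate.

```python
import numpy as np


def jump_features(values, window=5):
    """Return one row per sample: [absolute first difference, rolling max-min]."""
    values = np.asarray(values, dtype=float)
    # Absolute difference to the previous sample; the first sample has no
    # predecessor, so its difference is set to 0.
    diffs = np.abs(np.ediff1d(values, to_begin=0.0))
    # Row i of `windows` covers values[i : i + window].
    windows = np.lib.stride_tricks.sliding_window_view(values, window)
    ranges = windows.max(axis=1) - windows.min(axis=1)
    # Align each range with the last sample of its window. The first
    # window - 1 samples have no full window, so reuse the first range.
    ranges = np.concatenate([np.full(window - 1, ranges[0]), ranges])
    return np.column_stack([diffs, ranges])


def detect_jumps(values, window=5, random_state=0):
    """Boolean mask of samples that Isolation Forest flags as anomalous."""
    # Imported here so that jump_features only needs NumPy.
    from sklearn.ensemble import IsolationForest

    model = IsolationForest(random_state=random_state)
    # fit_predict returns -1 for anomalies and 1 for normal samples.
    return model.fit_predict(jump_features(values, window)) == -1
```

Calling `detect_jumps(sensor_values)` on a 1-D array of readings would return a mask with the same length as the input. If the default `contamination="auto"` flags too many or too few points, passing an explicit `contamination` value to `IsolationForest` is one parameter worth experimenting with.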
Also, for univariate continuous problems like this, there are more suitable anomaly detection methods than Isolation Forest, such as thresholding a simple transform like the Z-score or the Median Absolute Deviation. In scikit-learn, EllipticEnvelope is one alternative.
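To illustrate the Median Absolute Deviation option, here is a minimal sketch of a robust Z-score threshold. The function name `mad_outliers` is hypothetical, and the scaling constant 0.6745 and cutoff 3.5 are the conventional "modified Z-score" choices rather than values taken from the answer, so treat them as starting points.

```python
import numpy as np


def mad_outliers(values, threshold=3.5):
    """Flag values whose MAD-based robust Z-score exceeds `threshold`."""
    values = np.asarray(values, dtype=float)
    median = np.median(values)
    # Median Absolute Deviation: median distance from the median.
    mad = np.median(np.abs(values - median))
    if mad == 0:
        # At least half the values equal the median, so the score is
        # undefined; report no outliers rather than dividing by zero.
        return np.zeros(values.shape, dtype=bool)
    # 0.6745 rescales the MAD so the score is comparable to a standard
    # Z-score when the data are roughly normal.
    robust_z = 0.6745 * (values - median) / mad
    return np.abs(robust_z) > threshold
```

For jump detection, this would typically be applied to the consecutive differences rather than the raw readings, for example `mad_outliers(np.ediff1d(sensor_values))`, where `sensor_values` stands in for your data.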
Sources
This article follows the attribution requirements of Stack Overflow and is licensed under CC BY-SA 3.0.
Source: Stack Overflow
| Solution | Source |
|---|---|
| Solution 1 | Jon Nordby |
