'Polynomial Regression of a Noisy Dataset
I was wondering if I could get some help on a problem.
I am creating a tool for a former lab of mine which uses data from a physics based machine (a lot of noise) that results as simple x, y coordinates. I want to identify local maximums of the dataset, however, since there is a bunch of noise in the set, you cannot just check the slope between the points in order to determine the peak.
In order to solve this, I was thinking of using polynomial regression to somewhat "smooth out" the data set, then determine local maximums from the resulting model.
I've run through this link http://scikit-learn.org/stable/auto_examples/linear_model/plot_polynomial_interpolation.html, however, it only tells you how create a model that is a close fit. It doesn't tell you if there is an integrated metric in which to measure which is the best model. Should I do this through Chi squared? Or is there some other metric that works better or is integrated into the scikit-learn kit?
Solution 1:[1]
Link procided esentially shows you how to build a Ridge Regression on top of polynomial features. Consequently this is not a "tight fit", as you can control it through regularization (alpha parameter) - prior over the parameters. Now, what do you mean by "best model" - there are infinitely many possible criterions for being a best regression, each tested through different criterion. You need to answer yourself - what is the measure that you are interested in. Should it be some kind of "golden ratio" between smoothness and close fitness? Or maybe you want a model of at most some smoothness, which minimizes some error measure (mean squared distance to the points?)? Yet another would be to test how well it captures the underlying process - through some kind of typical validation (like cross validation etc.) where you repeat building the model on the subset of the data and check error on the holdout part. There are many possible (and completely valid!) approaches - everything depends on the exact question you want to answer. "What is the best model" is not a good question, unfortunately.
Sources
This article follows the attribution requirements of Stack Overflow and is licensed under CC BY-SA 3.0.
Source: Stack Overflow
| Solution | Source |
|---|---|
| Solution 1 | lejlot |
