Finding the best linear section of data

I have some scientific data and want to find the best region in which to fit a straight line. Theoretically, the data should have a constant gradient, but other influences affect the data so that there are non-linear sections, as shown below.

[Figure: plot of the data, showing the non-linear sections]

So far I've tried taking the second derivative and locating the regions where it is zero, and fitting a moving window of 100 points and selecting the window with the minimum chi-square. However, neither approach has selected the region correctly. What method can I use to select the best region of the data to fit with a straight line?



Solution 1:[1]

This is not, perhaps, an answer but was too long for a comment.

Here are a couple of ideas I've used for similar tasks. I'm supposing we have data (x[i], y[i]) for i = 1..N.

The least squares best fit line, x -> a*x + b, based on the data with indices in S (a subset of {1,...,N}), is

a = r; b = my(S) - r*mx(S)

where

r = C(S)/Vx(S)
mx(S) = Sum{ i in S | x[i]}/|S|
my(S) = Sum{ i in S | y[i]}/|S|
Vx(S) = Sum{ i in S | square(x[i]-mx(S))}/|S|
Vy(S) = Sum{ i in S | square(y[i]-my(S))}/|S|
C(S) = Sum{ i in S | (x[i]-mx(S))*(y[i]-my(S))}/|S|

Moreover, the 'average chisq' (the mean squared residual of the fitted line) is

Sum{ i in S| square( y[i]-(a*x[i]+b))}/|S| = Vy(S)-C(S)*C(S)/Vx(S)
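As a quick sanity check, here is a minimal Python sketch of the formulae above. The function names fit_stats and line_from_stats and the synthetic test data are my own inventions for illustration. The sketch assumes the x values are not all equal, since otherwise Vx(S) is zero and the slope is undefined.

```python
import random

def fit_stats(xs, ys):
    """Return (|S|, mx, my, Vx, Vy, C) for the given points, dividing by |S|."""
    n = len(xs)
    mx = sum(xs) / n
    my = sum(ys) / n
    vx = sum((x - mx) ** 2 for x in xs) / n
    vy = sum((y - my) ** 2 for y in ys) / n
    c = sum((x - mx) * (y - my) for x, y in zip(xs, ys)) / n
    return n, mx, my, vx, vy, c

def line_from_stats(stats):
    """Return slope a, intercept b and the 'average chisq' from the moments."""
    n, mx, my, vx, vy, c = stats
    a = c / vx               # a = r = C(S)/Vx(S); assumes vx > 0
    b = my - a * mx
    avg_chisq = vy - c * c / vx
    return a, b, avg_chisq

# Made-up data: a line of slope 2 and intercept 1 plus a little noise.
random.seed(0)
xs = [0.2 * i for i in range(50)]
ys = [2.0 * x + 1.0 + random.gauss(0.0, 0.1) for x in xs]

a, b, avg_chisq = line_from_stats(fit_stats(xs, ys))

# The same quantity computed directly from the residuals of the fitted line.
direct = sum((y - (a * x + b)) ** 2 for x, y in zip(xs, ys)) / len(xs)
print(a, b, avg_chisq, direct)
```

The two values of the average chisq should agree to within rounding error, which is the point of the identity: the quality of the fit comes from the moments alone, without a second pass over the data.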

The fit for a union of two disjoint intervals, S and T say, can be calculated from the fits for S and T:

mx(S union T) = (|S|*mx(S) + |T|*mx(T))/(|S|+|T|)
Vx(S union T) =  (|S|/(|S|+|T|))*Vx(S)
                +(|T|/(|S|+|T|))*Vx(T)
                +(|S|*|T|/square(|S|+|T|))*square(mx(S)-mx(T))
my(S union T) = (|S|*my(S) + |T|*my(T))/(|S|+|T|)
Vy(S union T) =  (|S|/(|S|+|T|))*Vy(S)
                +(|T|/(|S|+|T|))*Vy(T)
                +(|S|*|T|/square(|S|+|T|))*square(my(S)-my(T))
C(S union T) =   (|S|/(|S|+|T|))*C(S)
                +(|T|/(|S|+|T|))*C(T)
                +(|S|*|T|/square(|S|+|T|))*(mx(S)-mx(T))*(my(S)-my(T))
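To illustrate the union formulae, here is a small Python sketch that merges the moments of two disjoint pieces and compares the result with the moments computed directly on the whole set. The names fit_stats and merge_stats and the sample numbers are made up for the example.

```python
def fit_stats(xs, ys):
    """Return (|S|, mx, my, Vx, Vy, C) for the given points, dividing by |S|."""
    n = len(xs)
    mx = sum(xs) / n
    my = sum(ys) / n
    vx = sum((x - mx) ** 2 for x in xs) / n
    vy = sum((y - my) ** 2 for y in ys) / n
    c = sum((x - mx) * (y - my) for x, y in zip(xs, ys)) / n
    return n, mx, my, vx, vy, c

def merge_stats(s, t):
    """Combine the moments of two disjoint index sets S and T in O(1)."""
    ns, mxs, mys, vxs, vys, cs = s
    nt, mxt, myt, vxt, vyt, ct = t
    n = ns + nt
    ws, wt = ns / n, nt / n          # |S|/(|S|+|T|) and |T|/(|S|+|T|)
    dx, dy = mxs - mxt, mys - myt    # differences of the means
    mx = ws * mxs + wt * mxt
    my = ws * mys + wt * myt
    vx = ws * vxs + wt * vxt + ws * wt * dx * dx
    vy = ws * vys + wt * vyt + ws * wt * dy * dy
    c = ws * cs + wt * ct + ws * wt * dx * dy
    return n, mx, my, vx, vy, c

# Made-up sample points, split into two pieces of unequal size.
xs = [0.0, 1.0, 2.0, 3.0, 4.0, 5.0, 6.0]
ys = [0.1, 0.9, 2.2, 2.8, 4.1, 5.3, 5.8]

merged = merge_stats(fit_stats(xs[:3], ys[:3]), fit_stats(xs[3:], ys[3:]))
direct = fit_stats(xs, ys)
print(merged)
print(direct)
```

The two printed tuples should match up to rounding error. Note that ws * wt is the same as |S|*|T|/square(|S|+|T|), so the code follows the formulae term by term.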

So, for example, if you follow Sembei's suggestion, combining adjacent (or other) fits can be done very efficiently.
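Sembei's suggestion is not reproduced here, so the following Python sketch is only one guess at how cheap merging might be used, not a statement of that method. It assumes the data is cut into fixed-size blocks, and it grows a run of adjacent blocks while the average chisq of the merged fit stays below a tolerance. The function names, the block size, the tolerance and the synthetic data are all hypothetical choices that would need tuning on real data.

```python
def fit_stats(xs, ys):
    """Return (|S|, mx, my, Vx, Vy, C) for the given points, dividing by |S|."""
    n = len(xs)
    mx = sum(xs) / n
    my = sum(ys) / n
    vx = sum((x - mx) ** 2 for x in xs) / n
    vy = sum((y - my) ** 2 for y in ys) / n
    c = sum((x - mx) * (y - my) for x, y in zip(xs, ys)) / n
    return n, mx, my, vx, vy, c

def merge_stats(s, t):
    """Combine the moments of two disjoint index sets in O(1)."""
    ns, mxs, mys, vxs, vys, cs = s
    nt, mxt, myt, vxt, vyt, ct = t
    n = ns + nt
    ws, wt = ns / n, nt / n
    dx, dy = mxs - mxt, mys - myt
    return (n, ws * mxs + wt * mxt, ws * mys + wt * myt,
            ws * vxs + wt * vxt + ws * wt * dx * dx,
            ws * vys + wt * vyt + ws * wt * dy * dy,
            ws * cs + wt * ct + ws * wt * dx * dy)

def avg_chisq(stats):
    """Mean squared residual of the best-fit line; assumes Vx > 0."""
    n, mx, my, vx, vy, c = stats
    return vy - c * c / vx

def longest_linear_run(xs, ys, block=10, tol=1e-4):
    """Return (start, stop) indices of the longest run of whole blocks whose
    merged fit has average chisq <= tol, or None if no block qualifies."""
    blocks = [fit_stats(xs[i:i + block], ys[i:i + block])
              for i in range(0, len(xs) - block + 1, block)]
    best = None
    for i in range(len(blocks)):
        run = blocks[i]
        for j in range(i, len(blocks)):
            if j > i:
                run = merge_stats(run, blocks[j])   # O(1) per extension
            if avg_chisq(run) > tol:
                break
            if best is None or j + 1 - i > best[1] - best[0]:
                best = (i, j + 1)
    if best is None:
        return None
    return best[0] * block, best[1] * block

# Synthetic data: curved for i < 30, exactly linear in between, curved after i = 80.
xs = [0.1 * i for i in range(100)]
ys = []
for i, x in enumerate(xs):
    bend = (0.1 * (30 - i)) ** 2 if i < 30 else (0.1 * (i - 80)) ** 2 if i > 80 else 0.0
    ys.append(2.0 * x + 1.0 + bend)

region = longest_linear_run(xs, ys)
print(region)
```

Each extension of a run costs a single merge rather than a refit, so trying every start block needs on the order of B*B merges for B blocks. The answer is only as fine as the block size, and with noisy data the tolerance would have to be set relative to the noise level.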

All of the formulae above can be derived with straightforward (though tedious) algebra.

Sources

This article follows the attribution requirements of Stack Overflow and is licensed under CC BY-SA 3.0.

Source: Stack Overflow

Solution Source
Solution 1 dmuir