Find the best result in data - scatter plot - python [closed]
I have a set of data, the scatter plot of data is something like this:
I've marked the correct answer with a red area; it is almost in the center between the two branches. (The scatter plot has a 'V' shape.) I need an algorithm that finds this area and collects all the scatter points contained in it, because there are other data sets like this one. Both the x and y data have been uploaded here: Data
Solution 1:[1]
Based on your question so far, it is difficult to know how to evaluate what is correct (i.e., why is this region correct? Is it based on the values or coordinates of the points, on the point density in the region, or on its position relative to the larger structure, e.g. the centre of the branches?).
That being said, there are a lot of machine learning algorithms available, e.g. in scikit-learn for Python. Using a supervised learning algorithm, you could train a model on some labelled data, and it could then (try to) find the correct answer for other data.
More of an answer is difficult to provide before you rephrase your question.
If all your data looks like this, one option might be to do a PCA (i.e., dimensionality reduction) on the data to separate the branches into two clusters. You would then get some data points that cannot be clearly identified as belonging to only one branch, and you could select those (see scikit-learn's PCA docs). Note that while this should be reasonably accurate, you would never get a perfect circle this way.
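As a rough illustration of the PCA idea, here is a minimal sketch on synthetic V-shaped data. Everything specific in it is an assumption: the synthetic points, the `threshold` value, and the choice of the first principal component as the direction that runs across the two branches (true for this synthetic set, not guaranteed for real data, which may need rescaling first). PCA is computed with NumPy's SVD to keep the example dependency-free; scikit-learn's PCA should give the same projection up to sign.

```python
import numpy as np

# Synthetic V-shaped scatter (assumption: a stand-in for the real data)
rng = np.random.default_rng(0)
x = rng.uniform(-1.0, 1.0, size=2000)
y = 0.3 * np.abs(x) + rng.normal(scale=0.01, size=x.size)
points = np.column_stack([x, y])

# PCA via SVD of the centred data: the rows of vt are the principal directions
centred = points - points.mean(axis=0)
_, _, vt = np.linalg.svd(centred, full_matrices=False)
scores = centred @ vt.T  # column k holds the coordinate along component k

# In this synthetic set the first component runs across the two branches,
# so a score close to zero means "between the branches".
threshold = 0.1  # hypothetical half-width of the ambiguous band
ambiguous = np.abs(scores[:, 0]) < threshold
x_mid, y_mid = x[ambiguous], y[ambiguous]

print(ambiguous.sum(), "points could not be assigned to a single branch")
```

The band selected this way is a strip rather than a circle, which matches the caveat above about never getting a perfect circle.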
If you only need it for this one dataset, for which you already know the "radius" and centre, you could define the centre (x0, y0) of your circle (or ellipse) together with its semi-major and semi-minor axes a and b, and then test each point against the ellipse's canonical form, ((x-x0)/a)^2 + ((y-y0)/b)^2 <= 1.
It might be simpler to use a square, though.
So it would look something like this (assuming 1-D numpy.ndarrays):
import numpy as np
# Select points inside an axis-aligned rectangle
condition = (xarr > xmin) & (xarr < xmax) & (yarr > ymin) & (yarr < ymax)
# Depending on what you want: the coordinates, or the values at those coordinates
xsq = xarr[condition]
ysq = yarr[condition]
squaredata = data[condition]
# Select points inside an ellipse centred at (x0, y0) with semi-axes a and b.
# x0, y0, a and b can be hard-coded if you only need this for one dataset.
def in_ellipse(x, y, x0, y0, a, b):
    # NumPy arithmetic is already element-wise, so np.vectorize is not needed
    return ((x - x0) / a) ** 2 + ((y - y0) / b) ** 2 <= 1.0
# Compute the mask once and reuse it for every array
mask = in_ellipse(xarr, yarr, 1.6, -1125, 0.1, 10)
ellipsedata = data[mask]
x_ellipse = xarr[mask]
y_ellipse = yarr[mask]
The values for x0, y0, a and b were just estimated by looking at the picture.
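To try the ellipse selection end to end, here is a small self-contained sketch. The uniformly scattered points are an assumption standing in for the real data, and the ellipse parameters are the same eyeballed values as above, so treat it as a way to check the masking logic rather than as a tuned solution.

```python
import numpy as np

def in_ellipse(x, y, x0, y0, a, b):
    """Boolean mask: True where (x, y) lies inside or on the ellipse."""
    return ((x - x0) / a) ** 2 + ((y - y0) / b) ** 2 <= 1.0

# Synthetic stand-in for the real data (assumption: not the asker's dataset)
rng = np.random.default_rng(0)
xarr = rng.uniform(1.0, 2.2, size=5000)
yarr = rng.uniform(-1200.0, -1050.0, size=5000)

# Same eyeballed centre and semi-axes as in the answer
mask = in_ellipse(xarr, yarr, 1.6, -1125.0, 0.1, 10.0)
x_ellipse, y_ellipse = xarr[mask], yarr[mask]

# Hand-checkable points: the centre is inside, a point 2*a to the right is not
check = in_ellipse(np.array([1.6, 1.8]), np.array([-1125.0, -1125.0]),
                   1.6, -1125.0, 0.1, 10.0)

print(mask.sum(), "of", mask.size, "points selected")
```

Because the two axes have very different scales here (0.1 in x versus 10 in y), dividing by a and b before squaring is what keeps the test meaningful; a plain Euclidean distance would be dominated by y.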
Sources
This article follows the attribution requirements of Stack Overflow and is licensed under CC BY-SA 3.0.
Source: Stack Overflow
| Solution | Source |
|---|---|
| Solution 1 | |
