Python interp function that returns first/leftmost match?
I am given selected percentile values (5th, 10th, 25th, 50th, and so on) and need to find which percentile a given value corresponds to. I have tried SciPy and NumPy but have run into a problem. It is not uncommon for multiple percentiles to have the same value (for example, a value of 0 all the way up to the 50th percentile). When I interpolate, it always returns the highest matching percentile, which introduces a skew into my bulk stats. A quick example is below: x holds the percentile values, y the corresponding percentiles, and 0.0 is a value I would be interpolating. The interpolation functions and methods seem fairly limited since I have repeating x values.
```python
import numpy as np
from scipy.interpolate import interp1d

x = [0.0, 0.0, 0.0, 0.0, 0.05, 0.2, 0.5]
y = [5, 10, 25, 50, 75, 90, 95]
interp = interp1d(x, y, kind='slinear', fill_value='extrapolate')
z2 = np.interp(0.0, x, y, left=0, right=100).round(1)
z = interp(0.0)
print(z)
print(z2)
```
In this case, both z and z2 return 50.0, when I expect/want 0.0 or 5.0 (depending on extrapolation). Is there any way to force these to return the minimum possible value, the middle possible value, or some other way to accomplish this?
Solution 1:[1]
Both np.interp() and scipy.interpolate.interp1d() require the x values to be strictly increasing (i.e. x[i+1] > x[i]) and may return nonsense if they aren't. If you want some specific behavior, you need to preprocess your data to get rid of any repeated x values. For example:
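```python
import numpy as np

# assuming x and y are already sorted by x
x_fixed, indices = np.unique(x, return_index=True)  # unique x values and the index where each first appears
# split y into one group per unique x and keep the smallest y of each group
y_fixed = [np.min(vals) for vals in np.split(y, indices[1:])]
```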
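The same preprocessing can be wrapped in a small helper so that the question's choice between the lowest, middle, or highest tied percentile becomes a parameter. This is a sketch, not part of the original answer: `collapse_duplicates` is a hypothetical name, it assumes x is already sorted, and treating "middle" as `np.median` of the tied y values is just one possible interpretation.

```python
import numpy as np


def collapse_duplicates(x, y, reduce=np.min):
    """Merge repeated x values into a single point each.

    Hypothetical helper: assumes x is already sorted ascending. All y values
    sharing an x are combined with `reduce` (np.min keeps the lowest
    percentile, np.median the middle one, np.max the highest).
    """
    x = np.asarray(x, dtype=float)
    y = np.asarray(y, dtype=float)
    # unique x values plus the index where each one first appears
    x_fixed, indices = np.unique(x, return_index=True)
    # one group of y values per unique x, each reduced to a single number
    y_fixed = np.array([reduce(group) for group in np.split(y, indices[1:])])
    return x_fixed, y_fixed


x = [0.0, 0.0, 0.0, 0.0, 0.05, 0.2, 0.5]
y = [5, 10, 25, 50, 75, 90, 95]

for name, reducer in [("min", np.min), ("median", np.median), ("max", np.max)]:
    xs, ys = collapse_duplicates(x, y, reduce=reducer)
    # xs is now strictly increasing, so np.interp is well defined
    print(name, np.interp(0.0, xs, ys, left=0, right=100))
```

With the question's data this prints 5.0 for min, 17.5 for median and 50.0 for max: after deduplication, 0.0 is exactly the first point of xs, so np.interp returns the reduced y value for that point rather than the `left` fill value.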
Sources
This article follows the attribution requirements of Stack Overflow and is licensed under CC BY-SA 3.0.
Source: Stack Overflow
| Solution | Source |
|---|---|
| Solution 1 | yut23 |
