Finding standard deviations along x and y in 2D numpy array
If I have a 2D numpy array composed of points (x, y) that give some value z(x, y) at each point, can I find the standard deviation along the x-axis and along the y-axis? I know that np.std(data) will simply find the standard deviation of the entire dataset, but that's not what I want. Also, adding axis=0 or axis=1 computes a separate standard deviation for every column or every row, so I get as many results as there are columns or rows. If I just want one standard deviation along the y-axis, and another along the x-axis, can I find these in a dataset like this? From my understanding, standard deviations along x and y normally make sense when you have points x with values y(x). But I need some sigma_x and sigma_y for a 2D Gaussian fit I'm trying to do. Is this possible?
Here is an oversimplified example, since my actual data is much larger.
import numpy as np
data = np.array([[1, 5, 0, 3], [3, 5, 1, 1], [41, 33, 9, 20], [11, 20, 4, 13]])
print(np.std(data)) #not what I want
>>> 11.78386
print(np.std(data, axis=0)) #this gives me as many results as there are rows/columns so it's not what I want
>>> [16.03 11.69 3.5 7.69]
I'm not sure what the output corresponding to what I want would look like, since I'm not even sure if it's possible in a 2D array with shape > nx2. But I want to know if it's possible to compute one standard deviation along the x-axis and one along the y-axis. I'm not even sure if this makes sense for a 2D array... But if it doesn't, I'm not sure what to input as my sigma_x and sigma_y for a 2D Gaussian fit.
Solution 1:[1]
Standard deviation doesn't care whether y = f(x) or (x, y) are coordinates. It just measures how spread out a set of values is. If you have n points (x, y) that make up an array of shape (n, 2), then std(axis=0) is what you want. It returns an array of shape (2,), where the first element is the x-axis std and the second is the y-axis std. Whether that is useful depends on what you want, and it ignores the correlation between x and y.
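As a minimal sketch of the above: the first part applies std(axis=0) to a made-up (n, 2) array of points and also computes the correlation that the per-axis values leave out. The second part is an assumption that goes beyond this answer: for a grid of z values like the one in the question, each cell can be treated as a point (column index, row index) weighted by its z value, which gives one sigma_x and one sigma_y. The helper name weighted_sigmas is hypothetical, and the approach assumes non-negative z with little background offset.

```python
import numpy as np

# Case 1: n points stored as an (n, 2) array of (x, y) coordinates (made-up values).
points = np.array([[0.0, 1.0], [1.0, 3.0], [2.0, 2.0], [3.0, 6.0]])
sigma_x, sigma_y = points.std(axis=0)               # one std per coordinate
corr_xy = np.corrcoef(points, rowvar=False)[0, 1]   # correlation that std alone ignores


# Case 2 (assumption, not part of the answer): a grid of values z[row, col].
def weighted_sigmas(z):
    """Return (sigma_x, sigma_y) of the grid indices, weighted by z."""
    z = np.asarray(z, dtype=float)
    rows, cols = np.indices(z.shape)   # y = row index, x = column index
    w = z / z.sum()                    # normalise the values into weights
    mean_x = (w * cols).sum()
    mean_y = (w * rows).sum()
    sx = np.sqrt((w * (cols - mean_x) ** 2).sum())
    sy = np.sqrt((w * (rows - mean_y) ** 2).sum())
    return sx, sy


# The array from the question, used here only as input data.
data = np.array([[1, 5, 0, 3], [3, 5, 1, 1], [41, 33, 9, 20], [11, 20, 4, 13]])
grid_sigma_x, grid_sigma_y = weighted_sigmas(data)
```

The weighted values are in units of grid indices, so they would need scaling by the pixel spacing if the axes have physical units. They are probably best used as starting guesses for a fit, not as final parameters.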
Solution 2:[2]
I think what you want is to split the x axis into small intervals and compute the standard deviation of the y coordinates of the points within each interval. You could compute std(y_i), where y_i are the y coordinates of the points whose x lies in the interval (x_min+i*delta_x, x_min+(i+1)*delta_x), choosing delta_x small, but large enough that enough points (x_j, y_j) lie within each interval.
import numpy as np
x = np.array([0, 0.11, 0.1, 0.01, 0.2, 0.22, 0.23])
y = np.array([1, 2, 3, 2, 2, 2.1, 2.2])
num_intervals = 3
#sort the arrays
sort_inds = np.argsort(x)
x = x[sort_inds]
y = y[sort_inds]
# create intervals
x_range = x.max() - x.min()
x_intervals = np.linspace(np.min(x)+x_range/num_intervals, x.max()-x_range/num_intervals, num_intervals)
print(x_intervals)
>> [0.07666667 0.115 0.15333333]
Next, we split the arrays y and x using these intervals:
# get indices of x where the elements of x_intervals
# should be inserted, in order to maintain the order
# for sufficiently large num_intervals it
# approximates the closest value in x to an element
# in x_intervals
split_indices = np.unique(np.searchsorted(x, x_intervals, side='left'))
ls_of_arrays_x = np.array_split(x, split_indices)
ls_of_arrays_y = np.array_split(y, split_indices)
print(ls_of_arrays_x)
print(ls_of_arrays_y)
>> [array([0. , 0.01]), array([0.1 , 0.11]), array([0.2 , 0.22, 0.23])]
>> [array([1., 2.]), array([3., 2.]), array([2. , 2.1, 2.2])]
Now compute the x coordinates and the corresponding y std:
y_stds = np.array([np.std(yi) for yi in ls_of_arrays_y])
x_mean = np.array([np.mean(xi) for xi in ls_of_arrays_x])
print(x_mean)
print(y_stds)
>> [0.005 0.105 0.21666667]
>> [0.5 0.5 0.08164966]
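The steps above can also be sketched as a single function. This is an alternative under stated assumptions, not the code from this answer: it uses equal-width bins over the range of x and np.digitize in place of searchsorted and array_split, and the name binned_std is made up for illustration.

```python
import numpy as np


def binned_std(x, y, num_bins):
    """Mean of x and std of y within num_bins equal-width bins over the range of x."""
    x = np.asarray(x, dtype=float)
    y = np.asarray(y, dtype=float)
    edges = np.linspace(x.min(), x.max(), num_bins + 1)
    # Use only the interior edges, so every point gets a bin index in
    # 0..num_bins-1 and the largest x falls into the last bin.
    bin_index = np.digitize(x, edges[1:-1])
    x_means, y_stds = [], []
    for b in range(num_bins):
        in_bin = bin_index == b
        if in_bin.any():                 # skip bins that contain no points
            x_means.append(x[in_bin].mean())
            y_stds.append(y[in_bin].std())
    return np.array(x_means), np.array(y_stds)


# Same example arrays as above; no sorting is needed with this approach.
x = np.array([0, 0.11, 0.1, 0.01, 0.2, 0.22, 0.23])
y = np.array([1, 2, 3, 2, 2, 2.1, 2.2])
x_means, y_stds = binned_std(x, y, num_bins=3)
```

For these arrays the grouping should match the three groups printed above. With very few points per bin the std values are noisy, so the number of bins is a trade-off between resolution along x and the reliability of each estimate.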
I hope it was what you were looking for.
Sources
This article follows the attribution requirements of Stack Overflow and is licensed under CC BY-SA 3.0.
Source: Stack Overflow
| Solution | Source |
|---|---|
| Solution 1 | blue_note |
| Solution 2 |
