How to perform a Levene's test using scipy

I've been trying to use scipy.stats.levene with no success.

I have a numpy matrix with shape (2128, 45100). Each row is a sample and belongs to one of 3 clusters.

I want to test if there is homoscedasticity between clusters.

I've tried filtering my matrix by cluster and sending the params like so:

from scipy.stats import levene

levene(matrixAudioData[np.ix_((cutTree == 0).ravel()),:][0],
       matrixAudioData[np.ix_((cutTree == 1).ravel()),:][0],
       matrixAudioData[np.ix_((cutTree == 2).ravel()),:][0])

ValueError: setting an array element with a sequence.

or even

levene(matrixAudioData)

ValueError: Must enter at least two input sample vectors.

This works:

levene([1,2,3],[2,3,4])

But what if each sample is not just one number?

Please note that each matrixAudioData[np.ix_((cutTree == 0).ravel()),:][0] that I'm using as a parameter has a shape like (1048, 45100), so it should be fine.

Can you guys point me in the right direction?

Thanks!



Solution 1:[1]

As you have noticed, levene([1,2,3],[2,3,4]) works because you are passing one-dimensional array_like objects to the function. Passing matrixAudioData[np.ix_((cutTree == 0).ravel()),:][0] does not, because that is a 2-D array and levene requires each sample to be a 1-D array.

For example, consider the following three samples:

col1, col2, col3 = list(range(1, 100)), list(range(50, 78)), list(range(115, 139))

Notice that the lists have different lengths: Levene's test can be performed on samples of different sizes. To call levene, pass each sample as a one-dimensional array_like object:

from scipy.stats import levene
statistic, p_value = levene(col1, col2, col3, center="mean")

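In this case, p_value=1.3326317740560537e-14. The null hypothesis of Levene's test is that all groups have equal variances. If the p-value is greater than the chosen significance level (commonly 0.05), the null hypothesis is not rejected and homogeneity of variance (HOV) can be assumed; otherwise, the variances differ significantly.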
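Putting the pieces together, here is a self-contained version of this example that also applies the 0.05 decision rule described above. The name alpha is only an illustrative variable for the significance level, not a levene parameter.

```python
from scipy.stats import levene

# Three samples of different lengths (Levene's test allows unequal sizes)
col1 = list(range(1, 100))
col2 = list(range(50, 78))
col3 = list(range(115, 139))

# center="mean" gives the original Levene test; SciPy's default,
# center="median", is the Brown-Forsythe variant
statistic, p_value = levene(col1, col2, col3, center="mean")

alpha = 0.05  # illustrative significance level
equal_variances = p_value > alpha  # False means: reject equal variances
print(statistic, p_value, equal_variances)
```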

So, in this case we can reject the null hypothesis that variance is the same across col1, col2 and col3.
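For the matrix in the question, where each sample has 45100 features, one option is to run Levene's test separately on each feature (column), passing one 1-D array per cluster. The sketch below assumes X is the (n_samples, n_features) matrix and labels holds each row's cluster id; levene_per_feature is a hypothetical helper and the data is synthetic, standing in for matrixAudioData and cutTree.

```python
import numpy as np
from scipy.stats import levene


def levene_per_feature(X, labels, center="median"):
    """Levene's test on every column of X across the groups in labels.

    X: array of shape (n_samples, n_features); labels: one group id per row.
    Returns two arrays (W statistics, p-values) with one entry per feature.
    """
    X = np.asarray(X)
    labels = np.asarray(labels).ravel()
    # Plain boolean indexing gives one 2-D block of rows per group
    groups = [X[labels == g] for g in np.unique(labels)]
    stats = np.empty(X.shape[1])
    pvals = np.empty(X.shape[1])
    for j in range(X.shape[1]):
        # Column j of each group is a 1-D sample, which is what levene expects
        stats[j], pvals[j] = levene(*(grp[:, j] for grp in groups), center=center)
    return stats, pvals


# Synthetic data: 3 clusters of 100 rows, 5 features
rng = np.random.default_rng(0)
labels = np.repeat([0, 1, 2], 100)
X = rng.normal(size=(300, 5))
X[labels == 2, 0] *= 5  # feature 0 has a much larger spread in cluster 2

stats, pvals = levene_per_feature(X, labels)
# Bonferroni correction because one test is run per feature
significant = pvals < 0.05 / X.shape[1]
print(significant)
```

With 45100 features, roughly 5% of the tests would fall below 0.05 by chance even if all variances were equal, which is why the sketch compares each p-value against 0.05 divided by the number of features.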

Solution 2:[2]

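Box's M test checks whether the groups' covariance matrices are equal, which is the multivariate generalization of the equal-variance hypothesis tested by Levene's test. Based on the Box's M test formula, here is a Python function for conducting Box's M test on two data matrices X0 and X1, stored as NumPy arrays with one variable per row and one observation per column (the np.cov() default). Their covariance matrices, computed inside the function with np.cov(), must be the same size, i.e. both groups need the same number of variables (rows). This has been tested against SPSS output.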
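For reference, these are the quantities that box_m computes, transcribed from its arithmetic: m groups (m = 2 here), k variables, n_i observations in group i with n = n_1 + n_2, S_i the covariance matrix of group i and S_p the pooled covariance matrix (Xp in the code). Under Box's F approximation, F is referred to an F distribution with df and df_2 degrees of freedom.

```latex
\begin{align*}
S_p &= \frac{1}{n - m}\sum_{i=1}^{m} (n_i - 1)\, S_i, &
M &= (n - m)\ln\lvert S_p\rvert - \sum_{i=1}^{m} (n_i - 1)\ln\lvert S_i\rvert \\
c &= \frac{2k^2 + 3k - 1}{6(k+1)(m-1)}
     \left(\sum_{i=1}^{m} \frac{1}{n_i - 1} - \frac{1}{n - m}\right), &
df &= \frac{k(k+1)(m-1)}{2} \\
c_2 &= \frac{(k-1)(k+2)}{6(m-1)}
       \left(\sum_{i=1}^{m} \frac{1}{(n_i - 1)^2} - \frac{1}{(n - m)^2}\right), &
df_2 &= \frac{df + 2}{\lvert c_2 - c^2\rvert}
\end{align*}

F =
\begin{cases}
  \dfrac{M}{a_+}, \quad a_+ = \dfrac{df}{1 - c - df/df_2}, & \text{if } c_2 > c^2 \\[2ex]
  \dfrac{df_2\, M}{df\,(a_- - M)}, \quad a_- = \dfrac{df_2}{1 - c + 2/df_2}, & \text{otherwise}
\end{cases}
```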

NumPy is a dependency, imported as np.

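    import numpy as np

    def box_m(X0, X1):
        """Box's M test for equal covariance matrices in two groups.

        X0, X1: arrays of shape (k, n_i), one variable per row and one
        observation per column (np.cov's default, rowvar=True).
        """
        X0, X1 = np.asarray(X0), np.asarray(X1)
        m = 2                                # number of groups
        # atleast_2d keeps k == 1 working (np.cov returns a scalar then)
        S0 = np.atleast_2d(np.cov(X0))
        S1 = np.atleast_2d(np.cov(X1))
        k = len(S0)                          # number of variables
        n_1 = X0.shape[1]                    # observations in group 1
        n_2 = X1.shape[1]                    # observations in group 2
        n = n_1 + n_2

        # Pooled within-group covariance matrix
        Xp = ((n_1 - 1) * S0 + (n_2 - 1) * S1) / (n - m)

        def logdet(S):
            # slogdet()[1] is log|det S|: equal to log(det S) for positive-definite
            # covariance matrices, without det() overflowing/underflowing for large k
            return np.linalg.slogdet(S)[1]

        M = (n - m) * logdet(Xp) - (n_1 - 1) * logdet(S0) - (n_2 - 1) * logdet(S1)

        c = ((2 * k**2 + 3 * k - 1) / (6 * (k + 1) * (m - 1))) \
            * (1 / (n_1 - 1) + 1 / (n_2 - 1) - 1 / (n - m))

        df = (k * (k + 1) * (m - 1)) / 2

        c2 = (((k - 1) * (k + 2)) / (6 * (m - 1))) \
            * (1 / (n_1 - 1)**2 + 1 / (n_2 - 1)**2 - 1 / (n - m)**2)

        df2 = (df + 2) / np.abs(c2 - c**2)

        if c2 > c**2:
            a_plus = df / (1 - c - (df / df2))
            F = M / a_plus
        else:
            a_minus = df2 / (1 - c + (2 / df2))
            F = (df2 * M) / (df * (a_minus - M))

        print('M = {}'.format(M))
        print('c = {}'.format(c))
        print('c2 = {}'.format(c2))
        print('-------------------')
        print('df = {}'.format(df))
        print('df2 = {}'.format(df2))
        print('-------------------')
        print('F = {}'.format(F))
        return M, c, c2, df, df2, F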
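The function reports F but not a p-value. Under Box's F approximation, F is compared with an F distribution with df and df2 degrees of freedom, so the p-value is that distribution's upper-tail probability. A minimal sketch using scipy.stats.f (SciPy is an extra dependency here, and box_m_p_value is a hypothetical helper name):

```python
from scipy.stats import f


def box_m_p_value(F, df, df2):
    """Upper-tail p-value for Box's M F statistic (hypothetical helper).

    F, df and df2 are the values printed by box_m; a small p-value
    (e.g. below 0.05) suggests the covariance matrices differ.
    """
    # sf(x) = 1 - cdf(x), evaluated accurately in the upper tail;
    # df and df2 may be non-integer, which scipy's F distribution accepts
    return f.sf(F, df, df2)
```

For example, box_m_p_value(F, df, df2) with the F, df and df2 values printed by box_m.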

Sources

This article follows the attribution requirements of Stack Overflow and is licensed under CC BY-SA 3.0.

Source: Stack Overflow

[1] Solution 1: EMT
[2] Solution 2: Andy Banks