Python Kruskal-Wallis test reliability?
I have a question about SciPy's Kruskal-Wallis test. I recently ran this test over many groups and got several p-values that were exactly the same. I also noticed that the test can apparently be run on strings. Here is an example of what I mean:
In [40]: scipy.stats.kruskal("x","y","z")
Out[40]: KruskalResult(statistic=2.0, pvalue=0.36787944117144245)
As you can see, this ran the Kruskal-Wallis test on three letters and returned both a test statistic and a p-value. How is this possible? Is this test reliable at all?
Solution 1:[1]
This makes sense to me, because the Kruskal-Wallis test statistic depends only on the ranks of the observations, not on their values, and strings have an order relation (lexicographic order), so their ranks are well defined. R gives the same p-value as Python for three groups that each contain a single value, as long as the three values are distinct:
> kruskal.test(x = c(0, 1, 2), g = 1:3)
Kruskal-Wallis rank sum test
data: c(0, 1, 2) and 1:3
Kruskal-Wallis chi-squared = 2, df = 2, p-value = 0.3679
> kruskal.test(x = c(0, 11, 22), g = 1:3)
Kruskal-Wallis rank sum test
data: c(0, 11, 22) and 1:3
Kruskal-Wallis chi-squared = 2, df = 2, p-value = 0.3679
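Both runs report a statistic of exactly 2. Using the usual tie-free form of the statistic, with N observations split into k groups of sizes $n_i$ and rank sums $R_i$, one can check that k groups holding one distinct observation each always give $H = k - 1$, whatever the actual values are:

$$H = \frac{12}{N(N+1)}\sum_{i=1}^{k}\frac{R_i^2}{n_i} - 3(N+1)$$

$$N = k,\; n_i = 1,\; \{R_1,\dots,R_k\} = \{1,\dots,k\} \;\Rightarrow\; H = \frac{12}{k(k+1)}\cdot\frac{k(k+1)(2k+1)}{6} - 3(k+1) = 2(2k+1) - 3(k+1) = k - 1$$

For k = 3 this gives H = 2 on k - 1 = 2 degrees of freedom, so $p = P(\chi^2_2 \ge 2) = e^{-2/2} = e^{-1} \approx 0.3679$, the value both SciPy and R report. Every comparison of that shape therefore returns the same p-value.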
But R only accepts numeric observations.
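To illustrate why strings are accepted at all, here is a minimal pure-Python sketch of the Kruskal-Wallis computation. It is not SciPy's implementation, and the helper names `average_ranks`, `chi2_sf` and `kruskal_h` are hypothetical. It ranks the pooled observations with Python's own ordering (lexicographic for strings), applies the standard tie correction, and converts H to a p-value using the chi-squared distribution on k - 1 degrees of freedom. Nothing in it needs the values to be numbers, only mutually comparable.

```python
import math
from collections import Counter


def average_ranks(values):
    """1-based ranks under Python's own ordering (lexicographic for str);
    tied values share the average of the ranks they occupy."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        # Extend j to the end of a run of equal values.
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        avg_rank = (i + j) / 2 + 1
        for k in range(i, j + 1):
            ranks[order[k]] = avg_rank
        i = j + 1
    return ranks


def chi2_sf(x, df):
    """P(chi-squared with integer df >= x), built up with the recurrence
    Q(a + 1, y) = Q(a, y) + y**a * exp(-y) / Gamma(a + 1), where y = x / 2."""
    y = x / 2
    if df % 2 == 0:
        a, q = 1.0, math.exp(-y)  # Q(1, y)
    else:
        a, q = 0.5, math.erfc(math.sqrt(y))  # Q(1/2, y)
    while a < df / 2:
        q += y ** a * math.exp(-y) / math.gamma(a + 1)
        a += 1
    return q


def kruskal_h(*groups):
    """Return (H, p) for a Kruskal-Wallis test on the given groups."""
    if len(groups) < 2 or any(len(g) == 0 for g in groups):
        raise ValueError("need at least two non-empty groups")
    pooled = [v for g in groups for v in g]
    n = len(pooled)
    ranks = average_ranks(pooled)
    # Sum of R_i**2 / n_i, where R_i is the rank sum of group i.
    total, start = 0.0, 0
    for g in groups:
        rank_sum = sum(ranks[start:start + len(g)])
        total += rank_sum ** 2 / len(g)
        start += len(g)
    h = 12.0 / (n * (n + 1)) * total - 3 * (n + 1)
    # Tie correction: divide by 1 - sum(t**3 - t) / (n**3 - n).
    correction = 1 - sum(t ** 3 - t for t in Counter(pooled).values()) / (n ** 3 - n)
    if correction == 0:
        raise ValueError("all observations are identical")
    h /= correction
    return h, chi2_sf(h, len(groups) - 1)


for groups in ([["x"], ["y"], ["z"]], [[0], [11], [22]]):
    h, p = kruskal_h(*groups)
    print(groups, "H =", h, "p =", round(p, 4))
```

Both calls give H = 2.0 and p ≈ 0.3679, matching the SciPy and R output above up to floating-point rounding. Mixing types that Python cannot compare (for example `str` and `int` in the same call) makes `sorted` raise a `TypeError`, which is the closest this sketch gets to R's numeric-only check.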
Sources
This article follows the attribution requirements of Stack Overflow and is licensed under CC BY-SA 3.0.
Source: Stack Overflow
| Solution | Source |
|---|---|
| Solution 1 | Stéphane Laurent |
