Non-Parametric Tests

Wilcoxon Rank Sum Test

The Wilcoxon Rank Sum test is a non-parametric hypothesis test where the null hypothesis is that there is no difference in the populations (i.e., they have equal medians).

This test assumes that the two samples are independent and that both $n_1$ and $n_2$ are at least $10$. It should not be used if either assumption is not met.

The test involves first ranking the data in both samples, taken together. Each data element is given a rank, $1$ through $n_1 + n_2$, from lowest to highest -- with ties resolved by ranking tied elements arbitrarily at first, and then replacing rankings of tied elements with the average rank of those tied elements.

So for example, ranking the data below $$\begin{array}{l|cccccccc} \textrm{Sample A} & 12 & 15 & 17 & 18 & 18 & 20 & 23 & 24\\\hline \textrm{Sample B} & 14 & 15 & 18 & 20 & 20 & 20 & 24 & 25\\ \end{array}$$ results in the following ranks $$\begin{array}{ccc} \textrm{value} & \textrm{initial rank} & \textrm{final rank}\\\hline 12 & 1 & 1\\ 14 & 2 & 2\\ 15 & 3 & 3.5\\ 15 & 4 & 3.5\\ 17 & 5 & 5\\ 18 & 6 & 7\\ 18 & 7 & 7\\ 18 & 8 & 7\\ 20 & 9 & 10.5\\ 20 & 10 & 10.5\\ 20 & 11 & 10.5\\ 20 & 12 & 10.5\\ 23 & 13 & 13\\ 24 & 14 & 14.5\\ 24 & 15 & 14.5\\ 25 & 16 & 16\\ \end{array}$$
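The ranking procedure above can be sketched in a few lines of Python. This is only a minimal illustration under our own naming: the helper `average_ranks` is hypothetical, not part of any library. It sorts the pooled data, finds each run of tied values, and assigns every member of the run the average of the ranks that run occupies.

```python
def average_ranks(values):
    """Return 1-based ranks for `values`; tied values share their average rank."""
    # Indices of the values, ordered from the lowest value to the highest.
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    start = 0
    while start < len(order):
        # Extend `end` to cover the whole run of values tied with order[start].
        end = start
        while end + 1 < len(order) and values[order[end + 1]] == values[order[start]]:
            end += 1
        # The run occupies ranks start+1 .. end+1; every member gets the average.
        avg = (start + 1 + end + 1) / 2
        for pos in range(start, end + 1):
            ranks[order[pos]] = avg
        start = end + 1
    return ranks


sample_a = [12, 15, 17, 18, 18, 20, 23, 24]
sample_b = [14, 15, 18, 20, 20, 20, 24, 25]

# Rank both samples together, then split the ranks back out by sample.
ranks = average_ranks(sample_a + sample_b)
ranks_a = ranks[:len(sample_a)]
ranks_b = ranks[len(sample_a):]

print("Sample A ranks:", ranks_a, "sum =", sum(ranks_a))
print("Sample B ranks:", ranks_b, "sum =", sum(ranks_b))
```

The two rank sums should always total $(n_1+n_2)(n_1+n_2+1)/2$, which is a handy check on the arithmetic.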

Suppose $n_1$ denotes the size of the smaller sample and $n_2$ denotes the size of the other sample. Now define the following: $$\mu_R = \frac{n_1(n_1+n_2+1)}{2} \quad \textrm{ and } \quad \sigma_R = \sqrt{\frac{n_1 n_2 (n_1 + n_2 + 1)}{12}}$$ If $R$ is the sum of the ranks associated with elements from the sample of size $n_1$, then $$z = \frac{R - \mu_R}{\sigma_R}$$ is a test statistic that, under the null hypothesis, approximately follows a standard normal distribution.
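As a rough sketch of the arithmetic, the following Python computes $z$ from a rank sum. The function names `rank_sum_z` and `two_sided_p_value` are our own, and the choice of a two-sided p-value is an assumption, since the alternative hypothesis is not specified above. The inputs come from the ranking example: the final ranks of the Sample A values sum to $R = 61.5$, with $n_1 = n_2 = 8$. Note that these sample sizes fall below the stated minimum of $10$, so the numbers only illustrate the calculation and should not be read as a valid test.

```python
import math


def rank_sum_z(R, n1, n2):
    """z statistic for a rank sum R taken from the sample of size n1."""
    mu_R = n1 * (n1 + n2 + 1) / 2
    sigma_R = math.sqrt(n1 * n2 * (n1 + n2 + 1) / 12)
    return (R - mu_R) / sigma_R


def two_sided_p_value(z):
    """Two-sided p-value under a standard normal (assumed alternative: 'different')."""
    # For a standard normal Z, P(|Z| >= |z|) equals erfc(|z| / sqrt(2)).
    return math.erfc(abs(z) / math.sqrt(2))


# R = 61.5 is the sum of the final ranks of the Sample A values in the example.
z = rank_sum_z(R=61.5, n1=8, n2=8)
p = two_sided_p_value(z)
print(f"z = {z:.3f}, two-sided p = {p:.3f}")
```

Here $\mu_R = 68$ and $\sigma_R \approx 9.52$, so $z$ is roughly $-0.68$, well inside the range expected when the populations do not differ.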

Kruskal-Wallis Test (i.e., H Test)

The Kruskal-Wallis Test can be used to test the claim (a null hypothesis) that there is no difference in the populations (i.e., they have equal medians) when there are 3 or more independent samples, provided they meet the additional assumption that the sample sizes are all at least 5.

To perform the test, we first rank all of the samples together, and then add the ranks associated with each sample.

Letting $R_i$ be the sum of the ranks for sample $i$, of size $n_i$, $N$ be the sum of all sample sizes $n_i$, and $k$ be the number of samples, the test statistic

$$H = \frac{12}{N(N+1)}\left(\sum_{i=1}^{k} \frac{R^2_i}{n_i} \right) - 3(N+1)$$ approximately follows a $\chi^2$ distribution with $k-1$ degrees of freedom when the null hypothesis is true.

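This is a right-tailed test: large values of $H$ arise when the rank sums differ more than chance alone would suggest, so the null hypothesis is rejected only when $H$ exceeds the critical value.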
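The calculation of $H$ can be sketched in Python as follows. Everything named here is an assumption made for illustration: the helpers `average_ranks` and `kruskal_wallis_h` are our own, the three samples are made-up data, and the formula is applied exactly as written above, with no adjustment for ties. The critical value $5.991$ is the standard $\chi^2$ table value for $2$ degrees of freedom at the $0.05$ significance level.

```python
def average_ranks(values):
    """Return 1-based ranks for `values`; tied values share their average rank."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    start = 0
    while start < len(order):
        end = start
        while end + 1 < len(order) and values[order[end + 1]] == values[order[start]]:
            end += 1
        avg = (start + 1 + end + 1) / 2  # average of ranks start+1 .. end+1
        for pos in range(start, end + 1):
            ranks[order[pos]] = avg
        start = end + 1
    return ranks


def kruskal_wallis_h(samples):
    """H statistic for a list of samples, using the formula without tie correction."""
    pooled = [x for sample in samples for x in sample]
    ranks = average_ranks(pooled)  # rank all samples together
    N = len(pooled)
    total = 0.0
    start = 0
    for sample in samples:
        n_i = len(sample)
        R_i = sum(ranks[start:start + n_i])  # rank sum for this sample
        total += R_i ** 2 / n_i
        start += n_i
    return 12 / (N * (N + 1)) * total - 3 * (N + 1)


# Hypothetical data: k = 3 samples, each of size 5, with no tied values.
samples = [
    [27, 31, 35, 38, 42],
    [29, 33, 40, 45, 47],
    [36, 44, 49, 51, 53],
]

H = kruskal_wallis_h(samples)
CRITICAL_VALUE = 5.991  # chi-square table value, k - 1 = 2 df, alpha = 0.05
reject_null = H > CRITICAL_VALUE  # right-tailed: reject only for large H
print(f"H = {H:.2f}, reject null at 0.05: {reject_null}")
```

For this made-up data the rank sums are $25$, $37$, and $58$, giving $H = 5.58$. Since that is below $5.991$, the null hypothesis would not be rejected at the $0.05$ level.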