
Spearman's Rank Sum Correlation Test

Spearman's Rank Correlation Test is a non-parametric test for an association (not necessarily linear) between two variables. It can be used when the assumptions/requirements of the (parametric) correlation test are not satisfied.

The only requirements of this non-parametric test are that the data are paired, that they are the result of a simple random sample, and that they can be ranked (if they are not ranks already).

Essentially, all this test does is find ranks $x_i$ and $y_i$ for each pair of $X_i$ and $Y_i$ values and then run Pearson's correlation test on these ranks.

Recall that

$$r = \frac{s_{xy}}{s_x s_y} = \frac{\sum_i (x_i - \bar{x})(y_i - \bar{y})}{\sqrt{\sum_i (x_i - \bar{x})^2 \, \sum_i (y_i - \bar{y})^2}}$$

We denote this value as $r_S$ when it is computed from ranks to avoid confusion.
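As a minimal sketch of "rank, then run Pearson on the ranks", the following Python computes $r_S$ from raw values. The function names (`ranks`, `pearson_r`, `spearman_r`) and the sample values are illustrative assumptions, and the ranking helper assumes there are no ties.

```python
from math import sqrt

def ranks(values):
    """Rank values from 1 (smallest) to n. Assumes no ties."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    result = [0] * len(values)
    for rank, index in enumerate(order, start=1):
        result[index] = rank
    return result

def pearson_r(x, y):
    """Pearson's r computed from sums of centered products."""
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    sxy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sxx = sum((a - mean_x) ** 2 for a in x)
    syy = sum((b - mean_y) ** 2 for b in y)
    return sxy / sqrt(sxx * syy)

def spearman_r(x, y):
    """r_S: Pearson's r applied to the ranks of x and y."""
    return pearson_r(ranks(x), ranks(y))

# Illustrative (made-up) values: y increases with x, but not linearly.
example_x = [10, 20, 30, 40]
example_y = [1, 8, 27, 64]
print(spearman_r(example_x, example_y))
```

Because the made-up `example_y` rises whenever `example_x` does, the two rank lists are identical and the ranks are perfectly correlated, even though the raw relationship is not linear.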

Procedurally, one ranks each sample separately. Then for each pair, one finds the difference of ranks $d_i = x_i - y_i$.

The test statistic $r_S$, when there are no rank ties, can be simplified to

$$r_S = 1 - \frac{6 \sum d_i^2}{n(n^2 - 1)}$$
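A short sketch of this shortcut in Python, assuming the inputs are already tie-free ranks (the function name `spearman_shortcut` and the sample ranks are hypothetical, chosen only for illustration):

```python
def spearman_shortcut(rank_x, rank_y):
    """r_S = 1 - 6 * sum(d_i^2) / (n * (n^2 - 1)); valid only without ties."""
    n = len(rank_x)
    sum_d_squared = sum((a - b) ** 2 for a, b in zip(rank_x, rank_y))
    return 1 - 6 * sum_d_squared / (n * (n ** 2 - 1))

# Illustrative ranks: the second list is the first in reverse order.
print(spearman_shortcut([1, 2, 3, 4], [4, 3, 2, 1]))
```

With completely reversed ranks, the squared differences sum to 20 and the formula gives $1 - 120/60 = -1$, the most negative value possible.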

To see this, first note that as there are no ties, the $x_i$'s and $y_i$'s both consist of the integers from $1$ to $n$, inclusive.

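Consequently, $\sum_i (x_i - \bar{x})^2 = \sum_i (y_i - \bar{y})^2$, so the square root in the denominator disappears and we can rewrite $r_S$ as

$$r_S = \frac{\sum_i (x_i - \bar{x})(y_i - \bar{y})}{\sum_i (x_i - \bar{x})^2}$$

Ultimately, the denominator is just a function of $n$:

$$\begin{aligned}
\sum_{i=1}^n (x_i - \bar{x})^2 &= \sum_{i=1}^n x_i^2 - 2\sum_{i=1}^n x_i \bar{x} + \sum_{i=1}^n \bar{x}^2 \\
&= \left[\sum_{i=1}^n x_i^2\right] - 2n\bar{x}\left[\frac{\sum_{i=1}^n x_i}{n}\right] + n\bar{x}^2 \\
&= \left[\sum_{i=1}^n i^2\right] - 2n\bar{x}^2 + n\bar{x}^2 \\
&= \left[\sum_{i=1}^n i^2\right] - n\bar{x}^2 \\
&= \frac{n(n+1)(2n+1)}{6} - n\left(\frac{n+1}{2}\right)^2 \\
&= n(n+1)\left(\frac{2n+1}{6} - \frac{n+1}{4}\right) \\
&= n(n+1)\left(\frac{8n+4}{24} - \frac{6n+6}{24}\right) \\
&= n(n+1)\left(\frac{2n-2}{24}\right) \\
&= \frac{n(n+1)(n-1)}{12} \\
&= \frac{n(n^2-1)}{12}
\end{aligned}$$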
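The closed form for the denominator can be spot-checked numerically. The sketch below (with a hypothetical helper name, `sum_squared_deviations`) uses exact fractions so that no rounding is involved; it only confirms the identity for the values of $n$ tried and is not a proof.

```python
from fractions import Fraction

def sum_squared_deviations(n):
    """Sum of (i - mean)^2 over the ranks i = 1..n, computed exactly."""
    mean = Fraction(n + 1, 2)  # the mean of the integers 1..n
    return sum((i - mean) ** 2 for i in range(1, n + 1))

# Compare against n(n^2 - 1)/12 for a few small sample sizes.
for n in range(2, 11):
    expected = Fraction(n * (n ** 2 - 1), 12)
    print(n, sum_squared_deviations(n), sum_squared_deviations(n) == expected)
```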

As for the numerator...

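$$\begin{aligned}
\sum_{i=1}^n (x_i - \bar{x})(y_i - \bar{y}) &= \sum_{i=1}^n x_i (y_i - \bar{y}) - \sum_{i=1}^n \bar{x}(y_i - \bar{y}) \\
&= \sum_{i=1}^n x_i y_i - \bar{y}\sum_{i=1}^n x_i - \bar{x}\sum_{i=1}^n y_i + n\bar{x}\bar{y} \\
&= \left[\sum_{i=1}^n x_i y_i\right] - n\bar{x}\bar{y} \\
&= \left[\sum_{i=1}^n x_i y_i\right] - n\left(\frac{n+1}{2}\right)^2 \\
&= \left[\sum_{i=1}^n x_i y_i\right] - \frac{n(n+1)(2n+1)}{6} + \frac{n(n^2-1)}{12} \\
&= \left[\sum_{i=1}^n x_i y_i\right] - \sum_{i=1}^n x_i^2 + \frac{n(n^2-1)}{12} \\
&= \frac{2\sum_{i=1}^n x_i y_i}{2} - \frac{\sum_{i=1}^n (x_i^2 + y_i^2)}{2} + \frac{n(n^2-1)}{12} \\
&= \frac{n(n^2-1)}{12} - \frac{\sum_{i=1}^n (x_i^2 - 2x_i y_i + y_i^2)}{2} \\
&= \frac{n(n^2-1)}{12} - \frac{\sum_{i=1}^n (x_i - y_i)^2}{2} \\
&= \frac{n(n^2-1)}{12} - \frac{\sum_{i=1}^n d_i^2}{2}
\end{aligned}$$

Here the step that replaces $n\left(\frac{n+1}{2}\right)^2$ uses the denominator calculation, which showed that $\frac{n(n+1)(2n+1)}{6} - n\left(\frac{n+1}{2}\right)^2 = \frac{n(n^2-1)}{12}$. The step that introduces $y_i^2$ uses $\sum x_i^2 = \sum y_i^2$, since both are the sum of the squares of the integers from $1$ to $n$.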
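The numerator identity can likewise be spot-checked by brute force. In this sketch (all function names are hypothetical) the $x$ ranks are fixed as $1, \dots, n$ and every possible ordering of the $y$ ranks is tried; fixing $x$ this way loses no generality because the pairs can always be listed in order of their $x$ rank. Exact fractions avoid rounding issues.

```python
from fractions import Fraction
from itertools import permutations

def centered_cross_sum(x, y):
    """Sum of (x_i - mean)(y_i - mean) for tie-free ranks x and y."""
    mean = Fraction(len(x) + 1, 2)  # both rank lists have the same mean
    return sum((a - mean) * (b - mean) for a, b in zip(x, y))

def closed_form(x, y):
    """n(n^2 - 1)/12 minus half the sum of squared rank differences."""
    n = len(x)
    sum_d_squared = sum((a - b) ** 2 for a, b in zip(x, y))
    return Fraction(n * (n ** 2 - 1), 12) - Fraction(sum_d_squared, 2)

def identity_holds(n):
    """Check the identity for every ordering of the y ranks."""
    x = tuple(range(1, n + 1))
    return all(centered_cross_sum(x, y) == closed_form(x, y)
               for y in permutations(x))

for n in range(2, 7):
    print(n, identity_holds(n))
```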

Finally, dividing both numerator and denominator by $n(n^2-1)/12$, we can simplify things to

$$r_S = \frac{\dfrac{n(n^2-1)}{12} - \dfrac{\sum_{i=1}^n d_i^2}{2}}{\dfrac{n(n^2-1)}{12}} = 1 - \frac{6\sum d_i^2}{n(n^2-1)}$$

Critical values can be found in the table below:

Example

Suppose one wishes to use a non-parametric test to test the claim that there is a correlation between one's age and the number of parties they attend in a two-month period, given the following data:

Age       16   24   18   17   23   27   32
Parties    3    2    5    4    0    6    1

First we rank the x's and y's separately:

Rank (age)        1    5    3    2    4    6    7
Age              16   24   18   17   23   27   32
Parties           3    2    5    4    0    6    1
Rank (parties)    4    3    6    5    1    7    2

Then, for each pair, we find the difference of the ranks and its square.

d      -3    2   -3   -3    3   -1    5
d^2     9    4    9    9    9    1   25
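The ranking and differencing steps for this data set can be reproduced with a few lines of Python. This is a sketch only: the helper `ranks` is a hypothetical name, it assumes no ties (true for this data), and the sign convention (age rank minus party rank) is an assumption that has no effect once the differences are squared.

```python
ages = [16, 24, 18, 17, 23, 27, 32]
parties = [3, 2, 5, 4, 0, 6, 1]

def ranks(values):
    """Rank values from 1 (smallest) to n. Assumes no ties."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    result = [0] * len(values)
    for rank, index in enumerate(order, start=1):
        result[index] = rank
    return result

age_ranks = ranks(ages)
party_ranks = ranks(parties)

# Difference of ranks for each pair, and its square.
d = [a - p for a, p in zip(age_ranks, party_ranks)]
d_squared = [value ** 2 for value in d]

print(age_ranks)       # [1, 5, 3, 2, 4, 6, 7]
print(party_ranks)     # [4, 3, 6, 5, 1, 7, 2]
print(d_squared)       # [9, 4, 9, 9, 9, 1, 25]
print(sum(d_squared))  # 66
```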

Now we can calculate the test statistic:

$$r_S = 1 - \frac{6\sum d_i^2}{n(n^2-1)} = 1 - \frac{(6)(66)}{(7)(49-1)} \approx -0.1786$$

Since this test statistic is less in absolute value than the corresponding critical value at $\alpha = 0.05$ given in the table above (i.e., C.V. $= 0.786$), we fail to reject the null hypothesis and conclude that there is not sufficient evidence of a correlation.
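The final calculation and decision can be sketched in Python as follows. The values $n = 7$, $\sum d_i^2 = 66$ and the critical value $0.786$ are taken from the example; the function name `spearman_decision` is hypothetical, and the rule "reject when $|r_S|$ exceeds the critical value" is the comparison described in the paragraph above.

```python
def spearman_decision(n, sum_d_squared, critical_value):
    """Return (r_S, reject_null) using the no-ties shortcut formula."""
    r_s = 1 - 6 * sum_d_squared / (n * (n ** 2 - 1))
    reject_null = abs(r_s) > critical_value
    return r_s, reject_null

# Values from the age/parties example; 0.786 is the critical value quoted in the text.
r_s, reject_null = spearman_decision(n=7, sum_d_squared=66, critical_value=0.786)
print(round(r_s, 4))  # -0.1786
print(reject_null)    # False
```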