To find the correlation coefficient $r$, and conduct a test of the significance of the correlation ...
R:
One can find the correlation coefficient $r$, as defined below, in R by using the cor() function.
As an example, suppose we are interested in measuring the correlation between the price of pizza and the price of a subway ticket in New York. We compile our observations into two vectors, as shown below:
> pizza = c(0.15,0.35,1.00,1.25,1.75,2.00)
> subway = c(0.15,0.35,1.00,1.35,1.50,2.00)
> r = cor(pizza,subway)
> r
[1] 0.9878109
Now, to see if the correlation coefficient is significant, we find the appropriate test statistic (after checking the appropriate assumptions, of course):
> t = r*sqrt((length(pizza)-2)/(1-r^2))
> t
[1] 12.69203
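Written out, the statistic computed above is the following, where $n$ (a symbol introduced here for convenience) is the number of paired observations:

$$t = r\sqrt{\frac{n-2}{1-r^2}}$$

Under the null hypothesis of zero correlation, and provided the assumptions of the test hold, this statistic follows a $t$-distribution with $n-2$ degrees of freedom. Here $n = 6$, so there are 4 degrees of freedom.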
Finally, we can compute a $p$-value for the test by using the pt() function.
> p.value = 2*(1-pt(t,length(pizza)-2))
> p.value
[1] 0.0002219544
Seeing a $p$-value substantially smaller than $\alpha = 0.05$, we reject the null hypothesis that the true correlation is zero. The correlation between the price of a slice of pizza and subway fares in New York is highly significant.
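One caveat about the $p$-value expression used above: 2*(1-pt(t,df)) only gives a sensible two-sided $p$-value when the test statistic is positive. A slightly more general sketch, assuming we want the same two-sided test to work for negative correlations too, takes the absolute value of the statistic and asks pt() for the upper tail directly. The names n, df, and t.stat are illustrative choices, not part of the original example.

```r
# Same data as in the example above
pizza  <- c(0.15, 0.35, 1.00, 1.25, 1.75, 2.00)
subway <- c(0.15, 0.35, 1.00, 1.35, 1.50, 2.00)

r  <- cor(pizza, subway)   # sample correlation coefficient
n  <- length(pizza)        # number of paired observations
df <- n - 2                # degrees of freedom for the t-test

# Test statistic for H0: true correlation is zero
t.stat <- r * sqrt(df / (1 - r^2))

# Two-sided p-value: twice the upper-tail area beyond |t|
p.value <- 2 * pt(abs(t.stat), df, lower.tail = FALSE)
p.value
```

For the pizza and subway data the statistic is positive, so this should agree with the value computed above.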
Of course, as always, R provides a quicker way to do the above parametric significance of correlation test:

> pizza = c(0.15,0.35,1.00,1.25,1.75,2.00)
> subway = c(0.15,0.35,1.00,1.35,1.50,2.00)
> cor.test(pizza,subway)

        Pearson's product-moment correlation

data:  pizza and subway
t = 12.692, df = 4, p-value = 0.000222
alternative hypothesis: true correlation is not equal to 0
95 percent confidence interval:
 0.8886647 0.9987251
sample estimates:
      cor
0.9878109
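If the individual numbers are needed for later calculations rather than just printed, the object returned by cor.test() can be stored and its components extracted by name. The following is a minimal sketch; the variable name result is an arbitrary choice.

```r
pizza  <- c(0.15, 0.35, 1.00, 1.25, 1.75, 2.00)
subway <- c(0.15, 0.35, 1.00, 1.35, 1.50, 2.00)

# Store the test result instead of printing it
result <- cor.test(pizza, subway)

result$estimate    # sample correlation coefficient r
result$statistic   # t test statistic
result$parameter   # degrees of freedom
result$p.value     # two-sided p-value
result$conf.int    # 95 percent confidence interval for the correlation
```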
Notably, this same function can perform a non-parametric Spearman's rank correlation test as well, in the case that the assumptions of Pearson's correlation test are not met.
(In Spearman's test, the $x$- and $y$-values are separately ranked first, and then Pearson's correlation coefficient is computed for the ranks. This coefficient is then used as a test statistic. We'll discuss this test in greater detail later.)
As an example, suppose a consumer group compares ratings of toaster ovens to price for a random sample of ovens, shown below.
$$\begin{array}{r|ccccccc}
\hbox{Model} & A & B & C & D & E & F & G\\\hline
\hbox{Rating (1-10)} & 3 & 4 & 6 & 5 & 7 & 10 & 9\\
\hbox{Price (\$)} & 25 & 49 & 30 & 59 & 55 & 35 & 70\\
\end{array}$$

We desire to know whether there is a correlation between ratings and prices. Noting that the ratings are ordinal and Pearson's correlation test has an assumption of ratio or interval level data, we use Spearman's rank correlation test instead:
> rating = c(3,4,6,5,7,10,9)
> price = c(25,49,30,59,55,35,70)
> cor.test(rating,price,method="spearman")

        Spearman's rank correlation rho

data:  rating and price
S = 34, p-value = 0.3956
alternative hypothesis: true rho is not equal to 0
sample estimates:
      rho
0.3928571
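The earlier description of Spearman's test (rank each variable, then compute Pearson's coefficient on the ranks) can be checked directly on these data. The sketch below is only an illustration; the names rank.rating, rank.price, rho.ranks, S, and rho.formula are introduced here. It also uses the shortcut formula $1 - 6S/(n(n^2-1))$, which applies because there are no tied values in either variable, where $S$ is the sum of squared rank differences.

```r
rating <- c(3, 4, 6, 5, 7, 10, 9)
price  <- c(25, 49, 30, 59, 55, 35, 70)

# Rank each variable separately
rank.rating <- rank(rating)
rank.price  <- rank(price)

# Pearson's correlation coefficient computed on the ranks
rho.ranks <- cor(rank.rating, rank.price)

# Sum of squared rank differences and the no-ties shortcut formula
n <- length(rating)
S <- sum((rank.rating - rank.price)^2)
rho.formula <- 1 - 6 * S / (n * (n^2 - 1))

rho.ranks
S
rho.formula
```

Both rho.ranks and rho.formula should match the rho reported by cor.test() above, and S should match the S = 34 in that output.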