To conduct a $t$-test of a claim that compares two population means, based on the means seen in two independent samples, we use the t.test() function, but with different arguments than those supplied for a one-sample test of a mean.
Consider the following example:
Suppose you are interested in comparing, by gender, scores on a certain test. The results of the test for two samples (one of women and one of men) are entered into R as vectors, as shown below. (Note that the maximum score is 150.) Suppose also that we know the populations involved are normally distributed.
> men = c(102,87,101,96,107,101,91,85,108,67,85,82)
> women = c(73,81,111,109,143,95,92,120,93,89,119,79,90,126,62,92,77,106,105,111)
We would like to assume that the variances of the scores for men and women are the same, so we first check that the sample data do not provide evidence to the contrary. We do this in the usual way, with an $F$-test.
> sd(men)
[1] 12.0705
> sd(women)
[1] 19.94802
> dfn = length(women) - 1
> dfd = length(men) - 1
> F = var(women)/var(men)
> p.value = 2*(1-pf(F,dfn,dfd))
> p.value
[1] 0.09132068
Alternatively (and much more simply), we could use R's built-in test for comparing two variances:
> var.test(men,women,alternative="two.sided")

    F test to compare two variances

data:  men and women
F = 0.36614, num df = 11, denom df = 19, p-value = 0.09132
alternative hypothesis: true ratio of variances is not equal to 1
95 percent confidence interval:
 0.1324438 1.1873394
sample estimates:
ratio of variances
          0.366143
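The manual calculation doubles the upper-tail probability, which gives the intended two-sided $p$-value here because the larger sample variance was placed in the numerator (so $F > 1$). As a minimal sketch, the helper f_test_p() below (a hypothetical name introduced for illustration, not an R built-in) doubles the smaller of the two tail probabilities instead, so the order of the samples should not matter:

```r
men   <- c(102,87,101,96,107,101,91,85,108,67,85,82)
women <- c(73,81,111,109,143,95,92,120,93,89,119,79,90,126,62,92,77,106,105,111)

# Hypothetical helper (illustration only): two-sided F-test p-value
# for H0: the two population variances are equal.
f_test_p <- function(x, y) {
  f     <- var(x) / var(y)        # ratio of sample variances
  df1   <- length(x) - 1          # numerator degrees of freedom
  df2   <- length(y) - 1          # denominator degrees of freedom
  lower <- pf(f, df1, df2)        # lower-tail probability
  2 * min(lower, 1 - lower)       # double the smaller tail
}

p_manual  <- f_test_p(men, women)
p_builtin <- var.test(men, women)$p.value   # built-in result for comparison
print(c(manual = p_manual, builtin = p_builtin))
```

Calling f_test_p(women, men) should give the same value, since swapping the samples only swaps which tail is the smaller one.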
Seeing a $p$-value greater than $\alpha = 0.05$, we have no significant evidence that the variances differ, and proceed with a pooled $t$-test:
> t.test(x=men, y=women, alternative="two.sided", conf.level=0.95, var.equal=TRUE)

    Two Sample t-test

data:  men and women
t = -0.93758, df = 30, p-value = 0.3559
alternative hypothesis: true difference in means is not equal to 0
95 percent confidence interval:
 -19.016393   7.049727
sample estimates:
mean of x mean of y
 92.66667  98.65000
Noting that the $p$-value of $0.3559$ is greater than the typical significance level of $\alpha = 0.05$, we fail to reject the null hypothesis. There is no significant evidence of a difference between the mean scores of men and those of women on this test.
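To see what var.equal=TRUE is doing, the pooled statistic can be reproduced by hand. The sketch below assumes the usual pooled form, $t = (\bar{x}_1 - \bar{x}_2)/\sqrt{s_p^2 (1/n_1 + 1/n_2)}$ with $s_p^2 = \frac{(n_1-1)s_1^2 + (n_2-1)s_2^2}{n_1+n_2-2}$ and $n_1 + n_2 - 2$ degrees of freedom; the variable names (sp2, t_stat, and so on) are chosen here for illustration only:

```r
men   <- c(102,87,101,96,107,101,91,85,108,67,85,82)
women <- c(73,81,111,109,143,95,92,120,93,89,119,79,90,126,62,92,77,106,105,111)

n1 <- length(men)
n2 <- length(women)

# Pooled variance: weighted average of the two sample variances
sp2 <- ((n1 - 1) * var(men) + (n2 - 1) * var(women)) / (n1 + n2 - 2)

# Pooled t statistic, degrees of freedom, and two-sided p-value
t_stat <- (mean(men) - mean(women)) / sqrt(sp2 * (1 / n1 + 1 / n2))
df     <- n1 + n2 - 2
p_val  <- 2 * pt(-abs(t_stat), df)

# Built-in pooled test for comparison
builtin <- t.test(men, women, var.equal = TRUE)
print(c(t = t_stat, df = df, p = p_val))

# Leaving var.equal at its default (FALSE) runs Welch's test instead,
# which does not assume equal variances.
welch <- t.test(men, women)
print(welch$method)
```

Had the $F$-test given evidence of unequal variances, the Welch version (the default when var.equal is omitted) would be the more appropriate choice.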
In the event that one is testing dependent (paired) data, the t.test() function can be used as well. Simply set the paired argument to TRUE.
As an example, suppose one wanted to test the efficacy of a drug designed to reduce resting heart rates, measured in beats per minute. The resting heart rates of 16 individuals are measured. They are then treated with the drug, and their resting heart rates are measured again. The results are entered into two vectors in R as shown below:
> rate.before = c(52,66,89,87,89,72,66,65,49,62,70,52,75,63,65,61)
> rate.after = c(51,66,71,73,70,68,60,51,40,57,65,53,64,56,60,59)
Noting that each "before" measurement is paired with the "after" measurement for the same individual, we conduct a paired $t$-test by executing the following:
> t.test(x=rate.after, y=rate.before, alternative="less", conf.level=0.95, paired=TRUE)

This produces the following results:

    Paired t-test

data:  rate.after and rate.before
t = -4.8011, df = 15, p-value = 0.0001167
alternative hypothesis: true difference in means is less than 0
95 percent confidence interval:
      -Inf -4.721833
sample estimates:
mean of the differences
                -7.4375
Noting the very small $p$-value, we have significant evidence that resting heart rates are lower, on average, after treatment.
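A paired $t$-test is a one-sample $t$-test applied to the within-pair differences. The sketch below illustrates this with the heart-rate data: it computes the differences (after minus before), runs a one-sample test of whether their mean is less than 0, and compares the result with the paired call. The names d, one_samp, and t_manual are introduced here for illustration only:

```r
rate.before <- c(52,66,89,87,89,72,66,65,49,62,70,52,75,63,65,61)
rate.after  <- c(51,66,71,73,70,68,60,51,40,57,65,53,64,56,60,59)

# Within-pair differences (after minus before); negative means a lower rate
d <- rate.after - rate.before

# Paired test, as in the text
paired <- t.test(rate.after, rate.before, alternative = "less", paired = TRUE)

# Equivalent one-sample test on the differences
one_samp <- t.test(d, mu = 0, alternative = "less")

# The same statistic by hand: mean difference over its standard error
t_manual <- mean(d) / (sd(d) / sqrt(length(d)))

print(c(paired   = unname(paired$statistic),
        one_samp = unname(one_samp$statistic),
        manual   = t_manual))
```

All three values should agree, which also makes clear why the order of x and y matters: swapping them flips the sign of every difference, so the alternative would need to change from "less" to "greater".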