Tech Tips: Two-Sample Means Test

To conduct a $t$-test of a claim comparing two population means, based on the means of two independent samples, we again use the t.test() function, but with different arguments than those supplied for a one-sample means test.

Consider the following example in R:

Suppose you are interested in comparing scores on a certain test by gender. The test results for two samples, one of men and one of women, are entered into R as vectors, as shown below. (The maximum possible score is 150.) Suppose also that both populations are known to be normally distributed.
```
> men = c(102,87,101,96,107,101,91,85,108,67,85,82)
> women = c(73,81,111,109,143,95,92,120,93,89,119,79,90,126,62,92,77,106,105,111)
```
We would like to assume that the variances of the scores for men and women are equal, so we first check that the sample data provide no evidence to the contrary. We do this in the usual way, with an $F$-test.
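In symbols, writing $s_w$ and $s_m$ for the sample standard deviations of the women's and men's scores, the statistic computed below puts the larger sample variance in the numerator, so $F > 1$ and the two-sided $p$-value is twice the upper-tail area:

$$F = \frac{s_w^2}{s_m^2} = \frac{19.94802^2}{12.0705^2} \approx 2.731, \qquad \text{df}_{\text{num}} = n_w - 1 = 19, \quad \text{df}_{\text{den}} = n_m - 1 = 11,$$

$$p = 2\,P\left(F_{19,\,11} > 2.731\right).$$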
```
> sd(men)
[1] 12.0705
> sd(women)
[1] 19.94802
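> dfn = length(women) - 1          # numerator df (women have the larger variance)
> dfd = length(men) - 1            # denominator df
> F = var(women)/var(men)          # larger variance on top, so F > 1
> p.value = 2*(1-pf(F,dfn,dfd))    # two-sided: double the upper-tail area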
> p.value
[1] 0.09132068
```
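The manual calculation above doubles the upper tail, which is valid only because the larger variance was placed in the numerator. As a sketch that does not depend on this ordering, one can double whichever tail is smaller; to our understanding this matches what var.test() does for a two-sided alternative. The helper name f.test.p is hypothetical, not part of base R:

```r
# Scores from the example above
men   <- c(102,87,101,96,107,101,91,85,108,67,85,82)
women <- c(73,81,111,109,143,95,92,120,93,89,119,79,90,126,62,92,77,106,105,111)

# Hypothetical helper: two-sided F-test p-value for H0: var(x) = var(y).
# Doubling the smaller tail area makes the result independent of which
# sample's variance is placed in the numerator.
f.test.p <- function(x, y) {
  f.stat <- var(x) / var(y)
  dfn <- length(x) - 1
  dfd <- length(y) - 1
  lower <- pf(f.stat, dfn, dfd)   # lower-tail area P(F <= f.stat)
  2 * min(lower, 1 - lower)
}

f.test.p(women, men)  # larger variance on top, as in the manual calculation
f.test.p(men, women)  # reversed order gives the same p-value
```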
Alternatively (and much more simply), we can use R's built-in var.test() function, which performs the same $F$-test:
```
> var.test(men,women,alternative="two.sided")

    F test to compare two variances

data:  men and women
F = 0.36614, num df = 11, denom df = 19, p-value = 0.09132
alternative hypothesis: true ratio of variances is not equal to 1
95 percent confidence interval:
 0.1324438 1.1873394
sample estimates:
ratio of variances
          0.366143
```
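Note that var.test() puts the variance of its first argument (men) in the numerator, so its $F = 0.36614$ is the reciprocal of the $F$ computed by hand above; the $p$-value is unchanged. Since this $p$-value of $0.09132$ exceeds $\alpha = 0.05$, we have no significant evidence that the variances differ, and we proceed with a pooled $t$-test by setting var.equal=TRUE: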
```
> t.test(x=men, y=women, alternative="two.sided", conf.level=0.95, var.equal=TRUE)

    Two Sample t-test

data:  men and women
t = -0.93758, df = 30, p-value = 0.3559
alternative hypothesis: true difference in means is not equal to 0
95 percent confidence interval:
 -19.016393   7.049727
sample estimates:
mean of x mean of y
 92.66667  98.65000
```
Since the $p$-value of $0.3559$ is greater than the typical significance level of $\alpha = 0.05$, we fail to reject the null hypothesis: there is no significant evidence of a difference between the mean scores of men and women on this test.
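To show what var.equal=TRUE does, the following sketch recomputes the pooled $t$ statistic, its $p$-value, and the 95% confidence interval by hand. The pooled variance is the degrees-of-freedom-weighted average of the two sample variances, and the test uses $n_1 + n_2 - 2$ degrees of freedom. The variable names are illustrative only:

```r
men   <- c(102,87,101,96,107,101,91,85,108,67,85,82)
women <- c(73,81,111,109,143,95,92,120,93,89,119,79,90,126,62,92,77,106,105,111)

n1 <- length(men)
n2 <- length(women)
df.pooled <- n1 + n2 - 2

# Pooled variance: df-weighted average of the two sample variances
sp2 <- ((n1 - 1) * var(men) + (n2 - 1) * var(women)) / df.pooled

# Standard error of the difference in sample means
se <- sqrt(sp2 * (1 / n1 + 1 / n2))

diff.means <- mean(men) - mean(women)
t.stat  <- diff.means / se                       # H0: mu_men - mu_women = 0
p.value <- 2 * pt(-abs(t.stat), df.pooled)       # two-sided p-value
ci <- diff.means + c(-1, 1) * qt(0.975, df.pooled) * se  # 95% CI

t.stat; df.pooled; p.value; ci
```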

When testing dependent (paired) data, the t.test() function can be used as well; simply set the paired argument to TRUE.

As an example, suppose one wanted to test the efficacy of a drug designed to reduce resting heart rate, measured in beats per minute. The resting heart rates of 16 individuals are measured. The individuals are then treated with the drug, and their resting heart rates are measured again. The results are entered into two vectors in R as shown below:
```
> rate.before = c(52,66,89,87,89,72,66,65,49,62,70,52,75,63,65,61)
> rate.after = c(51,66,71,73,70,68,60,51,40,57,65,53,64,56,60,59)
```
Since each "before" measurement is paired with the "after" measurement from the same individual, we conduct a paired $t$-test by executing the following:
```
> t.test(x=rate.after, y=rate.before,alternative="less",conf.level=0.95,paired=TRUE)
```
This produces the following results:
```
        Paired t-test

data:  rate.after and rate.before
t = -4.8011, df = 15, p-value = 0.0001167
alternative hypothesis: true difference in means is less than 0
95 percent confidence interval:
      -Inf -4.721833
sample estimates:
mean of the differences
                -7.4375
```
Since the $p$-value of $0.0001167$ is far below $\alpha = 0.05$, we reject the null hypothesis: there is significant evidence that mean resting heart rate is lower after treatment.
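A paired $t$-test is equivalent to a one-sample $t$-test on the within-subject differences. The sketch below, using the same data, recomputes the statistic, the lower-tail $p$-value, and the one-sided 95% upper bound by hand, then shows the equivalent one-sample call. The variable names d, se, t.stat, p.value, and upper are illustrative only:

```r
rate.before <- c(52,66,89,87,89,72,66,65,49,62,70,52,75,63,65,61)
rate.after  <- c(51,66,71,73,70,68,60,51,40,57,65,53,64,56,60,59)

# Within-subject differences (after - before), matching x = after, y = before
d <- rate.after - rate.before
n <- length(d)

se      <- sd(d) / sqrt(n)            # standard error of the mean difference
t.stat  <- mean(d) / se               # H0: mean difference = 0
p.value <- pt(t.stat, df = n - 1)     # lower tail, since alternative = "less"
upper   <- mean(d) + qt(0.95, n - 1) * se  # upper end of the one-sided 95% CI

t.stat; p.value; upper

# Equivalent one-sample test on the differences
t.test(d, alternative = "less", mu = 0)
```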