Hypothesis Tests for Proportions (Two Samples)

Suppose there was a particular characteristic of interest present in two separate populations, and you suspected that the proportions of members with this characteristic in these two populations were different. How could we test that claim?

To make our example concrete, let us suppose we want to know whether the proportion of men taller than 5.5 ft differs from the proportion of women taller than 5.5 ft. Taking a sample of 45 men and 60 women, we find that 27 of the men are taller than 5.5 ft, while only 18 of the women are.

Denoting the proportion of men in our sample that are taller than 5.5 ft by $\widehat{p}_m$ and the corresponding proportion for women in our sample by $\widehat{p}_f$, we have $$\widehat{p}_m = \frac{27}{45} = 0.60 \quad \textrm{ and } \quad \widehat{p}_f = \frac{18}{60} = 0.30$$ There is clearly a difference between the sample proportions, but is this difference statistically significant?

Sounds like we need to perform a hypothesis test!

Formulating our null and alternative hypotheses, we have

$$H_0 : p_m = p_f \quad \textrm{ and } \quad H_1 : p_m \neq p_f$$ where $p_m$ is the true proportion of men that are taller than 5.5 ft and $p_f$ is the true proportion of women that are taller than 5.5 ft.

Note that we could equivalently express these hypotheses as $$H_0 : p_m - p_f = 0 \quad \textrm{ and } \quad H_1 : p_m - p_f \neq 0$$ Writing the hypotheses in this way allows us to focus on a single distribution in our analysis: the distribution of differences of sample proportions $\widehat{p}_m - \widehat{p}_f$.

To begin, we need some understanding of the nature of this distribution. Is it normally distributed? What is the mean? What is the standard deviation?

Recall that, for a single sample, if $n\widehat{p} \ge 5$ and $n\widehat{q} \ge 5$, then the distribution of sample proportions is approximately normal. Checking this condition for both $\widehat{p}_m$ and $\widehat{p}_f$, we find $$\begin{array}{c} n_m \widehat{p}_m = (45)(0.60) = 27 \geq 5\\ n_m \widehat{q}_m = (45)(0.40) = 18 \geq 5\\ n_f \widehat{p}_f = (60)(0.30) = 18 \geq 5\\ n_f \widehat{q}_f = (60)(0.70) = 42 \geq 5 \end{array}$$

So both $\widehat{p}_m$ and $\widehat{p}_f$ are approximately normally distributed.
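The condition check above can be sketched in a few lines of Python. Since $n\widehat{p}$ is just the number of successes and $n\widehat{q}$ the number of failures, this minimal sketch works from the raw counts, which avoids floating-point round-off. The function name `normal_approx_ok` is hypothetical, chosen for illustration.

```python
def normal_approx_ok(successes, n, threshold=5):
    """Check n*p_hat >= threshold and n*q_hat >= threshold for one sample."""
    failures = n - successes  # n * q_hat equals the number of failures
    return successes >= threshold and failures >= threshold

# Counts from the example: (successes, sample size)
men = (27, 45)
women = (18, 60)

print(normal_approx_ok(*men), normal_approx_ok(*women))  # True True
```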

Conveniently, if $X$ and $Y$ are independent, normally distributed random variables, then $X-Y$ is also normally distributed, with mean and variance given by $\mu_{X-Y} = \mu_X - \mu_Y$ and $\sigma^2_{X-Y} = \sigma^2_X + \sigma^2_Y$. Since our two samples were drawn independently, this applies here.

Recalling that $\widehat{p}_m$ is approximately normal with mean $p_m$ and variance $\frac{p_m q_m}{n_m}$, and that $\widehat{p}_f$ is approximately normal with mean $p_f$ and variance $\frac{p_f q_f}{n_f}$, we find that the distribution of $\widehat{p}_m - \widehat{p}_f$ is approximately normal, with mean and standard deviation given by $$\mu_{\widehat{p}_m - \widehat{p}_f} = p_m - p_f \quad \textrm{ and } \quad \sigma_{\widehat{p}_m - \widehat{p}_f} = \sqrt{\frac{p_m q_m}{n_m} + \frac{p_f q_f}{n_f}}$$

Given that the distribution of $\widehat{p}_m - \widehat{p}_f$ is normal, and that we know its mean and standard deviation, we could find $z$-scores for any observed difference to measure how unusual it is.
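The mean and standard deviation formulas for $\widehat{p}_m - \widehat{p}_f$ can be checked with a small Monte Carlo simulation. This sketch assumes hypothetical true proportions $p_m = 0.60$ and $p_f = 0.30$ (set equal to the sample values purely for illustration; the true values are unknown), draws many pairs of independent samples of sizes 45 and 60, and compares the simulated mean and standard deviation of the differences with the formulas. The theoretical standard deviation here is about 0.094. The function name `simulate_differences` is hypothetical.

```python
import math
import random
import statistics

def simulate_differences(p_m, p_f, n_m, n_f, reps=20_000, seed=1):
    """Draw many pairs of independent samples and return p_hat_m - p_hat_f for each."""
    rng = random.Random(seed)  # fixed seed so the run is reproducible
    diffs = []
    for _ in range(reps):
        x_m = sum(rng.random() < p_m for _ in range(n_m))  # successes among men
        x_f = sum(rng.random() < p_f for _ in range(n_f))  # successes among women
        diffs.append(x_m / n_m - x_f / n_f)
    return diffs

# Hypothetical true proportions (assumed for illustration only)
p_m, p_f = 0.60, 0.30
n_m, n_f = 45, 60

diffs = simulate_differences(p_m, p_f, n_m, n_f)
theory_mean = p_m - p_f
theory_sd = math.sqrt(p_m * (1 - p_m) / n_m + p_f * (1 - p_f) / n_f)

print(f"simulated mean {statistics.fmean(diffs):.3f} vs theory {theory_mean:.3f}")
print(f"simulated sd   {statistics.stdev(diffs):.3f} vs theory {theory_sd:.3f}")
```

Note that the simulation draws the two samples independently; the variance-addition rule $\sigma^2_{X-Y} = \sigma^2_X + \sigma^2_Y$ relies on that independence.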

Unfortunately, the standard deviation is not known, as it depends on $p_m$ and $p_f$, which are unknown. This is not a big hurdle, however, as we can play the same game as we did with hypothesis tests for a proportion involving a single sample. We can approximate the standard deviation with the standard error: $$SE(\widehat{p}_m - \widehat{p}_f) = \sqrt{\frac{\widehat{p}_m \widehat{q}_m}{n_m} + \frac{\widehat{p}_f \widehat{q}_f}{n_f}}$$
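As a quick numerical illustration, here is a minimal Python sketch of this (unpooled) standard error for the example data. The function name `se_unpooled` is hypothetical.

```python
import math

def se_unpooled(x_1, n_1, x_2, n_2):
    """Standard error of p_hat_1 - p_hat_2 using each sample's own proportion."""
    p_hat_1, p_hat_2 = x_1 / n_1, x_2 / n_2
    return math.sqrt(p_hat_1 * (1 - p_hat_1) / n_1 + p_hat_2 * (1 - p_hat_2) / n_2)

print(round(se_unpooled(27, 45, 18, 60), 4))  # 0.094
```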

However, we can do even better than that in this situation. There are two proportions in the standard error formula above -- but look at our null hypothesis. It says that the proportions they approximate should be equal. Under the assumption of the null hypothesis then, both $\widehat{p}_m$ and $\widehat{p}_f$ are approximations of the same proportion. Consequently, we can combine them (or "pool" them) into a single best approximation of this common proportion, called the pooled proportion.

In general, whenever we combine data from different sources or different groups because we believe they really came from the same underlying population, it is called pooling.

Denoting the pooled proportion by $\overline{p}$, we have $$\begin{array}{rcl} \overline{p} &=& \displaystyle{\frac{n_m \widehat{p}_m + n_f \widehat{p}_f}{n_m + n_f}}\\\\ &=& \displaystyle{\frac{27 + 18}{45 + 60}}\\\\ &\doteq& 0.4286 \end{array}$$

Replacing $\widehat{p}_m$ and $\widehat{p}_f$, our separate approximations of the common value $p_m = p_f$, with the better pooled approximation $\overline{p}$, and defining $\overline{q} = 1 - \overline{p}$, our standard error becomes $$\begin{array}{rcl} SE(\widehat{p}_m - \widehat{p}_f) &=& \displaystyle{\sqrt{\frac{\overline{p}\overline{q}}{n_m} + \frac{\overline{p}\overline{q}}{n_f}}}\\ &=& \displaystyle{\sqrt{\overline{p}\overline{q} \left(\frac{1}{n_m} + \frac{1}{n_f} \right)}} \end{array}$$
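The pooled proportion and pooled standard error can be sketched in Python as follows. Because $n_m \widehat{p}_m + n_f \widehat{p}_f$ is simply the total number of successes across both samples, the sketch computes $\overline{p}$ directly from the counts. The function names `pooled_proportion` and `se_pooled` are hypothetical.

```python
import math

def pooled_proportion(x_1, n_1, x_2, n_2):
    """Total successes divided by total sample size (equals the weighted average of p_hats)."""
    return (x_1 + x_2) / (n_1 + n_2)

def se_pooled(x_1, n_1, x_2, n_2):
    """Standard error of p_hat_1 - p_hat_2 assuming p_1 = p_2 (pooled under H0)."""
    p_bar = pooled_proportion(x_1, n_1, x_2, n_2)
    q_bar = 1 - p_bar
    return math.sqrt(p_bar * q_bar * (1 / n_1 + 1 / n_2))

print(round(pooled_proportion(27, 45, 18, 60), 4))  # 0.4286
print(round(se_pooled(27, 45, 18, 60), 5))          # 0.09759
```

For these data the pooled standard error (about 0.0976) is slightly larger than the unpooled one (about 0.0940).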

Finally, using this standard error as an approximation to the standard deviation for the normal distribution of $\widehat{p}_m - \widehat{p}_f$ centered at $p_m - p_f$ (hypothesized to be zero), we can find a test statistic ($z$-score) for the difference of proportions seen in our two samples: $$z = \frac{(\widehat{p}_m - \widehat{p}_f) - 0}{\displaystyle{\sqrt{\overline{p}\overline{q} \left(\frac{1}{n_m} + \frac{1}{n_f} \right)}}}$$

Calculating the value of this test statistic for our example, we find $$z \doteq \frac{(0.60 - 0.30) - 0}{0.09759} \doteq 3.07$$ and the rest of the hypothesis test proceeds as usual. At the $\alpha = 0.05$ significance level, the critical values are $\pm1.96$; since $3.07 > 1.96$, the test statistic falls in the rejection region. Consequently, we reject the null hypothesis that the proportions are equal and conclude that the proportions of men and women taller than 5.5 ft are significantly different.
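The whole two-sided test can be collected into one short Python function. This is a minimal sketch using only the standard library (`statistics.NormalDist` for the critical value and p-value); the function name `two_proportion_z_test` is hypothetical and not part of any library. It also reports a two-sided p-value, which the text does not use but which leads to the same decision.

```python
import math
from statistics import NormalDist

def two_proportion_z_test(x_1, n_1, x_2, n_2, alpha=0.05):
    """Two-sided z-test of H0: p_1 = p_2 using the pooled standard error."""
    p_hat_1, p_hat_2 = x_1 / n_1, x_2 / n_2
    p_bar = (x_1 + x_2) / (n_1 + n_2)                       # pooled proportion under H0
    se = math.sqrt(p_bar * (1 - p_bar) * (1 / n_1 + 1 / n_2))
    z = ((p_hat_1 - p_hat_2) - 0) / se                      # hypothesized difference is 0
    z_crit = NormalDist().inv_cdf(1 - alpha / 2)            # two-sided critical value
    p_value = 2 * (1 - NormalDist().cdf(abs(z)))            # two-sided p-value
    return z, z_crit, p_value, abs(z) > z_crit

z, z_crit, p_value, reject = two_proportion_z_test(27, 45, 18, 60)
print(round(z, 2), round(z_crit, 2), reject)  # 3.07 1.96 True
```

Here the p-value comes out at roughly 0.002, well below $\alpha = 0.05$.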

Confidence Intervals for Differences in Proportions (Two Samples)

Continuing with the example above, we can build a confidence interval for the true difference $p_m - p_f$ in a similar way.

The best point estimate (and thus the center of the confidence interval) is, not surprisingly, $\widehat{p}_m - \widehat{p}_f$.

As for the margin of error, $\displaystyle{E = z_{\alpha/2} \, \sigma_{\widehat{p}_m - \widehat{p}_f}}$, we approximate $\sigma_{\widehat{p}_m - \widehat{p}_f}$ with the standard error: $$SE(\widehat{p}_m - \widehat{p}_f) = \sqrt{\frac{\widehat{p}_m \widehat{q}_m}{n_m} + \frac{\widehat{p}_f \widehat{q}_f}{n_f}}$$ Notice that we can't pool the sample proportions in this case, as there is no null hypothesis allowing us to assume $p_m$ and $p_f$ are equal.

Consequently, the confidence interval with a confidence level of $(1-\alpha)$ is given by $$(\widehat{p}_m - \widehat{p}_f) \pm z_{\alpha/2} \sqrt{\frac{\widehat{p}_m \widehat{q}_m}{n_m} + \frac{\widehat{p}_f \widehat{q}_f}{n_f}}$$
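For a concrete instance, this minimal Python sketch computes the interval for the example data, assuming a 95% confidence level (a choice made here for illustration; the text leaves $\alpha$ general). It uses the unpooled standard error, as the text requires, and `statistics.NormalDist().inv_cdf` to obtain $z_{\alpha/2}$. The function name `diff_proportion_ci` is hypothetical.

```python
import math
from statistics import NormalDist

def diff_proportion_ci(x_1, n_1, x_2, n_2, confidence=0.95):
    """Confidence interval for p_1 - p_2 using the unpooled standard error."""
    p_hat_1, p_hat_2 = x_1 / n_1, x_2 / n_2
    # Unpooled: no null hypothesis lets us assume p_1 = p_2 here
    se = math.sqrt(p_hat_1 * (1 - p_hat_1) / n_1 + p_hat_2 * (1 - p_hat_2) / n_2)
    alpha = 1 - confidence
    z = NormalDist().inv_cdf(1 - alpha / 2)  # z_{alpha/2}
    estimate = p_hat_1 - p_hat_2
    margin = z * se                          # margin of error E
    return estimate - margin, estimate + margin

low, high = diff_proportion_ci(27, 45, 18, 60)
print(round(low, 3), round(high, 3))  # 0.116 0.484
```

The resulting interval does not contain 0, which agrees with the earlier test rejecting $H_0$. Because the test uses the pooled standard error and the interval uses the unpooled one, the two procedures need not agree in borderline cases.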