# Chi Square Tests

When samples of size $n$ are drawn from a population that follows a standard normal distribution, and the sample variance $s^2$ is computed for each sample, the values of the form $\chi^2 = (n-1)s^2$ follow what is called a chi-square distribution. (For a normal population with variance $\sigma^2$, the corresponding quantity is $(n-1)s^2/\sigma^2$; in the standard normal case $\sigma^2 = 1$.)

The chi-square distribution is really a family of curves, one for each number of degrees of freedom. For a sample of size $n$, there are $n-1$ degrees of freedom.
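The relationship above can be checked with a small simulation. The sketch below is illustrative only: the sample size, number of trials, and seed are arbitrary choices not taken from the text. It relies on the standard facts that a chi-square distribution with $k$ degrees of freedom has mean $k$ and variance $2k$.

```python
import random
import statistics

random.seed(0)  # fixed seed so the run is repeatable

n = 5            # sample size (illustrative choice)
trials = 20_000  # number of simulated samples (illustrative choice)

values = []
for _ in range(trials):
    # Draw one sample of size n from a standard normal population
    sample = [random.gauss(0.0, 1.0) for _ in range(n)]
    s2 = statistics.variance(sample)  # sample variance (divisor n - 1)
    values.append((n - 1) * s2)

sim_mean = statistics.fmean(values)
sim_var = statistics.variance(values)

# Theory: chi-square with n - 1 degrees of freedom has mean n - 1
# and variance 2(n - 1)
print(f"mean of (n-1)s^2:     {sim_mean:.3f}  (theory: {n - 1})")
print(f"variance of (n-1)s^2: {sim_var:.3f}  (theory: {2 * (n - 1)})")
```

The simulated mean and variance should land close to, but not exactly on, the theoretical values, since they are estimates from random draws.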

There are several hypothesis tests associated with the $\chi^2$ distribution, as discussed below.

### Goodness-of-fit Test

In a goodness-of-fit test, one wishes to decide whether the proportions of a population falling in different categories match some given proportions.

For example, suppose a researcher wanted to know if births are uniformly distributed among the months (i.e., the proportion of births for each month should be 1/12), based on the following numbers of births seen in one year.

$$\begin{array}{lr|lr} \textrm{Jan} & 34 & \textrm{Jul} & 36\\ \textrm{Feb} & 31 & \textrm{Aug} & 38\\ \textrm{Mar} & 35 & \textrm{Sep} & 37\\ \textrm{Apr} & 32 & \textrm{Oct} & 36\\ \textrm{May} & 35 & \textrm{Nov} & 35\\ \textrm{Jun} & 35 & \textrm{Dec} & 35\\ \end{array}$$

The null hypothesis would be that the proportions in the population match the given proportions for every category, while the alternative hypothesis would be that at least one proportion differs from its given value.

The test statistic is given by $$\chi^2 = \sum \frac{(O-E)^2}{E}$$ where the sum is taken over all categories, $O$ is the observed frequency in some category, and $E$ is the expected frequency (under the assumption of the null hypothesis). So if a particular category is hypothesized to have proportion $p_i$ and $n$ is the sample size, the value of $E$ for that category would be $n p_i$.

If the assumption that all $E \ge 5$ is not met, this test should not be performed.

Provided this assumption is met, the test statistic approximately follows a chi-square distribution with degrees of freedom equal to: $$df = \textrm{number of categories} - 1$$

Fitting the proportions specified in the null hypothesis better than expected (a small value of $\chi^2$) is not a reason to reject the null hypothesis. Only large values of $\chi^2$ count as evidence against it, so this is a right-tailed test.

### Test for Independence

Suppose one wishes to decide if two categorical variables are independent.

For example, suppose one compares the type of passenger (crew, 1st class, 2nd class, or 3rd class) on the doomed voyage of the Titanic with whether they lived or died in its sinking. Given the counts in each category below, one might wonder whether some types of passengers were more likely to live than others, or whether these two variables were independent of one another.

$$\begin{array}{l|cccc|c} & \textrm{Crew} & \textrm{1st Class} & \textrm{2nd Class} & \textrm{3rd Class} & \textrm{Total} \\\hline \textrm{Lived} & 212 & 202 & 118 & 178 & 710\\ \textrm{Died} & 673 & 123 & 167 & 528 & 1491\\\hline \textrm{Total} & 885 & 325 & 285 & 706 & 2201\\ \end{array}$$

The null hypothesis here is that the variables involved are independent, while the alternative hypothesis is that the variables are instead related.

The test statistic is again given by $$\chi^2 = \sum \frac{(O-E)^2}{E}$$ where the sum is taken over all possible combinations of categories (i.e., one for each entry in the table). $O$ represents an observed frequency (a single entry in the table), while $E$ is the expected frequency for the related observation given the null hypothesis.

Note that if $n$ is the sample size (i.e., the grand total for the related table) then $$\begin{array}{rcl} E &=& n \cdot P(\textrm{cell})\\ &=& n \cdot P(\textrm{row}) \cdot P(\textrm{column}) \quad \textrm{ assuming the variables are independent!}\\ &\doteq& n \cdot \frac{\textrm{(row total)}}{n} \cdot \frac{\textrm{(col total)}}{n}\\ &=& \frac{\textrm{(row total)} \textrm{(col total)}}{\textrm{grand total}} \end{array}$$
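As a worked instance of this formula using the Titanic table above, the expected count of crew members who lived, assuming independence, would be $$E = \frac{(710)(885)}{2201} \approx 285.5$$ which is noticeably larger than the observed count of 212.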

If the assumption that all $E \ge 5$ is not met, this test should not be performed.

Provided this assumption is met, the test statistic approximately follows a chi-square distribution with degrees of freedom equal to the product: $$df = (\textrm{number of rows} - 1) \times (\textrm{number of columns} - 1)$$ As with the goodness-of-fit test, matching the expected frequencies better than anticipated does not give us a reason to reject the null hypothesis, so this is a right-tailed test.
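The full test for independence can be sketched for the Titanic table above. The helper name `chi_square_independence` is hypothetical, introduced only for this illustration. The significance level $\alpha = 0.05$ is an assumption, and 7.815 is the usual table value for 3 degrees of freedom at that level.

```python
# Observed counts from the Titanic table above.
# Rows: Lived, Died. Columns: Crew, 1st Class, 2nd Class, 3rd Class.
observed = [
    [212, 202, 118, 178],
    [673, 123, 167, 528],
]


def chi_square_independence(table):
    """Return (chi2, df, expected) for a two-way table of counts."""
    row_totals = [sum(row) for row in table]
    col_totals = [sum(col) for col in zip(*table)]
    n = sum(row_totals)  # grand total
    # E = (row total)(col total) / (grand total) for each cell
    expected = [[r * c / n for c in col_totals] for r in row_totals]
    chi2 = sum(
        (o - e) ** 2 / e
        for obs_row, exp_row in zip(table, expected)
        for o, e in zip(obs_row, exp_row)
    )
    df = (len(row_totals) - 1) * (len(col_totals) - 1)
    return chi2, df, expected


chi2, df, expected = chi_square_independence(observed)

# The test is only appropriate when every expected count is at least 5
assert all(e >= 5 for row in expected for e in row)

# Assumption: alpha = 0.05. The right-tail critical value for a
# chi-square distribution with 3 degrees of freedom is about 7.815.
critical = 7.815
reject = chi2 > critical  # right-tailed test

print(f"chi2 = {chi2:.2f}, df = {df}, reject H0: {reject}")
```

Under these assumptions the statistic is far larger than the critical value, which is consistent with passenger type and survival being related.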

### Test for Homogeneity of Proportions

Suppose one wishes to test whether two or more populations all have the same proportions across a set of categories.

For example, suppose one wanted to know if the proportions of Democrats, Republicans, and Independents were the same for both men and women, and had collected the following data to this end.

$$\begin{array}{l|ccc} & \textrm{Democrat} & \textrm{Republican} & \textrm{Independent}\\\hline \textrm{Male} & 36 & 45 & 24\\ \textrm{Female} & 48 & 33 & 16\\ \end{array}$$

The null hypothesis for this test is the statement that the proportions are the same between populations/categories.

Note that in the example above, we would expect the proportions of Republicans, Democrats, and Independents to be the same for men and women if and only if the variable of gender is independent of the variable of political party affiliation.

As the above observation suggests, a test for the homogeneity of proportions is carried out in exactly the same way as a test for independence; only the way the null hypothesis is stated differs.
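To make the equivalence concrete, the sketch below computes expected counts for the party-affiliation table from the pooled proportion of each party, which is the natural reading of "same proportions in every population". Each expected count works out to (row total)(column total)/(grand total), just as in the test for independence. The significance level $\alpha = 0.05$ is an assumption, and 5.991 is the usual table value for 2 degrees of freedom at that level.

```python
parties = ["Democrat", "Republican", "Independent"]

# Observed counts from the table above, one row per population
counts = {
    "Male": [36, 45, 24],
    "Female": [48, 33, 16],
}

n = sum(sum(row) for row in counts.values())  # grand total

# Pooled proportion of each party, ignoring gender. Under the null
# hypothesis every population shares these proportions.
pooled = [sum(col) / n for col in zip(*counts.values())]

chi2 = 0.0
for group, row in counts.items():
    group_total = sum(row)
    for o, p in zip(row, pooled):
        # Same value as (row total)(col total) / (grand total)
        e = group_total * p
        assert e >= 5  # the test requires every expected count >= 5
        chi2 += (o - e) ** 2 / e

df = (len(counts) - 1) * (len(parties) - 1)

# Assumption: alpha = 0.05. The right-tail critical value for a
# chi-square distribution with 2 degrees of freedom is about 5.991.
critical = 5.991
reject = chi2 > critical  # right-tailed test

print(f"chi2 = {chi2:.3f}, df = {df}, reject H0: {reject}")
```

Under these assumptions the statistic falls short of the critical value, so these data alone would not lead one to reject the claim that men and women have the same party proportions.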