In a goodness of fit test, one wishes to decide if the proportions of a population in different categories match some given proportions.

For example, suppose a researcher wanted to know whether births are uniformly distributed among the months (i.e., the proportion of births in each month is 1/12), based on the following numbers of births observed in one year.

$$\begin{array}{lr|lr}
\textrm{Jan} & 34 & \textrm{Jul} & 36\\
\textrm{Feb} & 31 & \textrm{Aug} & 38\\
\textrm{Mar} & 35 & \textrm{Sep} & 37\\
\textrm{Apr} & 32 & \textrm{Oct} & 36\\
\textrm{May} & 35 & \textrm{Nov} & 35\\
\textrm{Jun} & 35 & \textrm{Dec} & 35\\
\end{array}$$

The null hypothesis would be that the proportions in the population match the given proportions for each category, while the alternative hypothesis would be that they do not completely match these proportions.

The test statistic is given by $$\sum \frac{(O-E)^2}{E}$$ where the sum is taken over all categories, $O$ is the observed frequency in some category, and $E$ is the expected frequency (under the assumption of the null hypothesis). So if a particular category is hypothesized to have proportion $p_i$ and $n$ is the sample size, the value of $E$ for that category would be $n p_i$.
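As an illustration, the statistic can be computed for the monthly births listed earlier. This is a minimal sketch in plain Python; the variable names (`observed`, `hypothesized`, `expected`, `statistic`) are chosen here for illustration and are not part of the original discussion.

```python
# Observed monthly births from the table above (Jan through Dec).
observed = [34, 31, 35, 32, 35, 35, 36, 38, 37, 36, 35, 35]

n = sum(observed)             # sample size
k = len(observed)             # number of categories
hypothesized = [1 / k] * k    # null hypothesis: each month has proportion 1/12

# The expected frequency for each category is E = n * p_i.
expected = [n * p for p in hypothesized]

# Test statistic: sum over all categories of (O - E)^2 / E.
statistic = sum((o - e) ** 2 / e for o, e in zip(observed, expected))

print(f"n = {n}, E = {expected[0]:.4f}, statistic = {statistic:.4f}")
```

With $n = 419$ births, each month has an expected frequency of about $34.92$, and the statistic comes out to roughly $1.17$.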

As fitting the proportions specified in the null hypothesis better than expected is not a reason to reject the null hypothesis, this is a right-tailed test.

Note that the choice of this statistic is reasonable, as when the disagreement between the observed and expected frequencies grows, the test statistic also increases to a value less likely to be seen under the null hypothesis assumption that the expected frequencies are correct.

The big question is: "How is this test statistic distributed?"

To that end, let us consider the following simpler situation:

Instead of considering 12 different categories as in the example above, suppose there are only 2. For convenience (and to be suggestive), let us refer to these categories as "success" and "failure". Suppose we expect the proportion of successes to be $p_1$ and the proportion of failures to be $p_2$. Naturally, $p_1 + p_2 = 1$, as we are just describing a binomial situation.

Let $x_1$ count the number of successes in $n$ trials and let $x_2$ count the number of failures.

Recall that under the right conditions (i.e., an expected number of successes and failures both greater than or equal to $5$), we know that a binomial random variable counting successes seen in $n$ trials where the probability of success is $p$ and the probability of failure is $q=1-p$ can be approximated by a normal distribution with mean $np$ and standard deviation $\sqrt{npq}$.

Thus, $$z = \frac{x_1 - np_1}{\sqrt{n p_1 p_2}}$$ approximates a standard normal distribution.

But then $$\begin{array}{rcl} z^2 &=& \displaystyle{\frac{(x_1 - np_1)^2}{np_1(1-p_1)}}\\\\ &=& \displaystyle{\frac{(x_1 - np_1)^2(1-p_1) + (x_1 - np_1)^2p_1}{np_1(1-p_1)}}\\\\ &=& \displaystyle{\frac{(x_1 - np_1)^2}{np_1} + \frac{(x_1 - np_1)^2}{n(1-p_1)}} \end{array}$$

However, $$(x_1 - np_1)^2 = (n-x_2 - n + np_2)^2 = (x_2 - np_2)^2$$ so,

$$z^2 = \frac{(x_1 - np_1)^2}{np_1} + \frac{(x_2 - np_2)^2}{np_2}$$

Recalling $x_1$ and $x_2$ are the *observed* number of successes and failures, respectively, and $np_1$ and $np_2$ are likewise the *expected* number of successes and failures, we have:

$$z^2 = \sum \frac{(O-E)^2}{E}$$

Further recalling that the sum of squares of $m$ independent random variables, each following a standard normal distribution, follows a $\chi_m^2$ distribution with $m$ degrees of freedom, we see that the test statistic in this $k=2$ category case of a goodness of fit test approximately follows a $\chi_1^2$ distribution (i.e., a chi-square distribution with a single degree of freedom).
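The two-category argument can be checked numerically. The sketch below uses made-up values ($n = 100$, $p_1 = 0.3$, $x_1 = 36$) and two helper functions, `two_category_check` and `right_tail_one_df`, which are named here purely for illustration. The right-tail probability uses the fact that, with a single degree of freedom, $P(\chi_1^2 > x) = P(|Z| > \sqrt{x})$ for a standard normal $Z$, which Python's `math.erfc` can evaluate.

```python
import math

def two_category_check(x1, n, p1):
    """Return (z squared, sum of (O - E)^2 / E) for a success/failure sample."""
    p2 = 1 - p1
    x2 = n - x1
    z = (x1 - n * p1) / math.sqrt(n * p1 * p2)
    chi_sq = (x1 - n * p1) ** 2 / (n * p1) + (x2 - n * p2) ** 2 / (n * p2)
    return z ** 2, chi_sq

def right_tail_one_df(x):
    """P(chi-square with 1 df > x), i.e. P(|Z| > sqrt(x)) for standard normal Z."""
    return math.erfc(math.sqrt(x / 2))

# Hypothetical sample: 36 successes in 100 trials, hypothesized p1 = 0.3.
z_sq, chi_sq = two_category_check(x1=36, n=100, p1=0.3)
print(z_sq, chi_sq)               # the two values agree up to rounding
print(right_tail_one_df(chi_sq))  # right-tailed p-value for this sample
```

Both expected counts here ($30$ and $70$) are at least $5$, so the normal approximation behind the calculation is reasonable for this made-up sample.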

In a related way, one can argue that in general, for $k$ category counts $x_i$, $i = 1, 2, \ldots, k$, with corresponding expected values $np_i$, the following statistic measuring the "closeness" of the observations to their expectations: $$\frac{(x_1 - np_1)^2}{np_1} + \frac{(x_2 - np_2)^2}{np_2} + \cdots + \frac{(x_k - np_k)^2}{np_k} = \sum_{i=1}^k \frac{(O_i - E_i)^2}{E_i}$$ approximately follows a chi-square distribution with $(k-1)$ degrees of freedom. The degrees of freedom number $k-1$ rather than $k$ because the counts must sum to $n$, so once $k-1$ of them are known, the last is determined (just as $x_2 = n - x_1$ in the two category case).
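One way to make the general claim plausible is by simulation. A chi-square distribution with $k-1$ degrees of freedom has mean $k-1$, so if samples are repeatedly drawn under the null hypothesis, the average value of the statistic should land near $k-1$. The following is only a rough sketch: the proportions, sample size, number of repetitions, and the function name `simulate_mean_statistic` are all assumptions made for illustration, and matching the mean is a sanity check rather than a proof of the distributional claim.

```python
import random

def simulate_mean_statistic(probs, n, reps, seed=0):
    """Average the statistic sum((O - E)^2 / E) over many samples drawn under the null."""
    rng = random.Random(seed)
    k = len(probs)
    expected = [n * p for p in probs]
    total = 0.0
    for _ in range(reps):
        # Draw n observations, each landing in category i with probability probs[i].
        counts = [0] * k
        for category in rng.choices(range(k), weights=probs, k=n):
            counts[category] += 1
        total += sum((o - e) ** 2 / e for o, e in zip(counts, expected))
    return total / reps

# Hypothetical setup: k = 4 categories, so the mean should be close to k - 1 = 3.
mean_stat = simulate_mean_statistic([0.1, 0.2, 0.3, 0.4], n=200, reps=2000)
print(mean_stat)
```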

Importantly, such a conclusion is only valid under the earlier-mentioned "right conditions". In the two category (i.e., success and failure) case above, we noted the requirement that the expected numbers of successes and failures both be greater than or equal to $5$ so that the underlying binomial distribution would be approximately normal. In the general case, we require that the expected counts in all $k$ categories be greater than or equal to $5$.
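This requirement is easy to verify before running the test. The helper below, `expected_counts_ok`, is a hypothetical name introduced here; it simply checks that every expected count $np_i$ is at least $5$.

```python
def expected_counts_ok(n, probs, minimum=5):
    """Return True if every expected count n * p_i is at least `minimum`."""
    return all(n * p >= minimum for p in probs)

# Births example: n = 419 and each month has hypothesized proportion 1/12,
# so every expected count is about 34.9.
print(expected_counts_ok(419, [1 / 12] * 12))

# A much smaller hypothetical sample: with n = 40, each expected count is about 3.3.
print(expected_counts_ok(40, [1 / 12] * 12))
```

The first call prints `True` and the second prints `False`, so the chi-square approximation would not be justified for the smaller sample.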