# Confidence Intervals and Hypothesis Tests for Means

Suppose we are given a sample with a mean of $\overline{x} = 5$ and a standard deviation of $s = 2$, and are asked to estimate the population mean $\mu$.

Certainly, the best point estimate of $\mu$ is $\overline{x}$. However, if we want a confidence interval for $\mu$, we will need to consider the spread of the sampling distribution for the mean.

Recall that the Central Limit Theorem tells us that $$\mu_{\overline{x}} = \mu \quad \textrm{ and } \quad \sigma_{\overline{x}} = \frac{\sigma}{\sqrt{n}}$$ where $n$ is the sample size.

As such, under the assumption that the distribution of sample means is approximately normal, we can construct a confidence interval for $\mu$ with a confidence level of $(1 - \alpha)$ as $$\overline{x} \pm z_{\alpha/2} \frac{\sigma}{\sqrt{n}}$$ This raises two questions:

1. "How will we know the assumption that the distribution of sample means is approximately normal has been met?"
2. "What should we do if we don't know the value of $\sigma$?"

Supposing for the moment that $\sigma$ is known, recall that the Central Limit Theorem tells us the distribution of sample means becomes closer and closer to normal as the sample size increases.

Thus, if $n$ is sufficiently large (let's say $n \ge 30$), then we can rely on the fact that the distribution of sample means should be approximately normal.

If $n \lt 30$, then knowing that the original population is approximately normal should be enough to assure us that the distribution of sample means is approximately normal as well.
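When $\sigma$ is known and one of the conditions above holds, the interval $\overline{x} \pm z_{\alpha/2} \frac{\sigma}{\sqrt{n}}$ can be computed directly. The following is a minimal sketch using only Python's standard library; the function name `z_interval` and the values $\sigma = 2$ and $n = 36$ are assumptions chosen for illustration, not part of the original example.

```python
from math import sqrt
from statistics import NormalDist

def z_interval(xbar, sigma, n, confidence=0.95):
    """Confidence interval for mu when sigma is known (illustrative helper)."""
    alpha = 1 - confidence
    # Critical value z_{alpha/2}: the point with area alpha/2 to its right.
    z = NormalDist().inv_cdf(1 - alpha / 2)
    margin = z * sigma / sqrt(n)
    return xbar - margin, xbar + margin

# Assumed values: xbar = 5, sigma = 2 (treated as known), n = 36.
low, high = z_interval(xbar=5, sigma=2, n=36)
print(f"({low:.3f}, {high:.3f})")  # (4.347, 5.653)
```

Here $n = 36 \ge 30$, so the normality of the sample means is justified by the sample size alone.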

If $\sigma$ is unknown, however, we are forced to approximate its value with the sample standard deviation, $s$. This, in turn, changes the associated distribution from a standard normal distribution to a $t$-distribution.

Thus, if $\sigma$ is unknown, and either $n \ge 30$ or the original population is approximately normal, the confidence interval for the mean with a confidence level of $1-\alpha$ is $$\overline{x} \pm t_{\alpha/2} \frac{s}{\sqrt{n}}$$ where $t_{\alpha/2}$ is taken from a $t$-distribution with $n-1$ degrees of freedom.
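As a worked illustration, suppose the sample described at the outset ($\overline{x} = 5$, $s = 2$) came from an approximately normal population and had size $n = 25$; the sample size is an assumption made here only for the sake of the example. For a 95% confidence level, a $t$-table gives $t_{0.025} \approx 2.064$ for $24$ degrees of freedom, so $$E = t_{\alpha/2} \frac{s}{\sqrt{n}} \approx 2.064 \cdot \frac{2}{\sqrt{25}} \approx 0.826$$ and the interval is roughly $5 \pm 0.826$, or $(4.174, 5.826)$. Had we used $z_{0.025} \approx 1.96$ instead, the margin would have been about $0.784$, giving an interval narrower than the data justify.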

### How Large Should the Sample Size Be?

Just like with proportions, one can guarantee a certain margin of error by choosing a large enough sample size. To determine what that sample size might be, simply solve for $n$ in the formula for the margin of error $E$ (the quantity added to and subtracted from $\overline{x}$ in the interval), always rounding up to ensure $n$ is large enough.

So, in the case of $\displaystyle{E = z_{\alpha/2} \frac{\sigma}{\sqrt{n}}}$, we would have $$n = \left( \frac{z_{\alpha/2} \sigma}{E} \right)^2$$

Of course, calculating this $n$ requires knowledge of $\sigma$. We can use the sample standard deviation from a previous study if it is available. Also, as a rule of thumb, $$\sigma \approx \frac{\textrm{range}}{4}$$

Last, but not least, one should also consider the return rate when designing a study -- it can be very difficult to get a 100% return rate, so without a little "padding" one might not get the margin of error one desires, even when everything else was done perfectly.
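The calculation above, including the rounding up and the "padding" for an imperfect return rate, might be sketched as follows. The function names, the planning values $\sigma = 2$ and $E = 0.5$, and the 80% return rate are all hypothetical choices for illustration.

```python
from math import ceil
from statistics import NormalDist

def required_sample_size(sigma, margin, confidence=0.95):
    """Smallest n with z_{alpha/2} * sigma / sqrt(n) <= margin (illustrative helper)."""
    alpha = 1 - confidence
    z = NormalDist().inv_cdf(1 - alpha / 2)
    # Always round up so the margin of error is not exceeded.
    return ceil((z * sigma / margin) ** 2)

def padded_for_return_rate(n, return_rate):
    """Number of subjects to contact so that about n are expected to respond."""
    return ceil(n / return_rate)

# Assumed planning values: sigma = 2, desired margin E = 0.5, 95% confidence.
n = required_sample_size(sigma=2, margin=0.5)
# Assumed return rate of 80%.
n_to_contact = padded_for_return_rate(n, return_rate=0.8)
print(n, n_to_contact)
```

If only the range of the data is known, the rule of thumb above suggests passing `sigma=data_range / 4` as a rough planning value.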

### Hypothesis Tests

Many of the same considerations made in the construction of confidence intervals for means apply in hypothesis tests for means:

If $\sigma$ is known and either $n \ge 30$ or the distribution is approximately normal, then we can use the standard normal distribution and a test statistic of $$z = \frac{\overline{x} - \mu}{\displaystyle{\frac{\sigma}{\sqrt{n}}}}$$

If $\sigma$ is unknown and either $n \ge 30$ or the distribution is approximately normal, then we can use a $t$-distribution with $n-1$ degrees of freedom and a test statistic of $$t = \frac{\overline{x} - \mu}{\displaystyle{\frac{s}{\sqrt{n}}}}$$ In both test statistics, $\mu$ is the value of the population mean assumed under the null hypothesis.
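Both test statistics share the same form and differ only in which standard deviation is used and which distribution the result is compared against. The sketch below uses only the standard library; the function names and all data values (a hypothesized mean of $4.2$, sample sizes of $36$ and $25$) are assumptions for illustration. Since the standard library has no $t$-distribution, the $t$ critical value is simply read from a table.

```python
from math import sqrt
from statistics import NormalDist

def standardized_statistic(xbar, mu, spread, n):
    """Return (xbar - mu) / (spread / sqrt(n)).

    Pass sigma as `spread` for a z statistic, or s for a t statistic.
    """
    return (xbar - mu) / (spread / sqrt(n))

def two_sided_p_value_from_z(z):
    """Two-sided p-value under the standard normal distribution."""
    return 2 * (1 - NormalDist().cdf(abs(z)))

# Case 1 (assumed): sigma = 2 is known, n = 36, null hypothesis mu = 4.2.
z = standardized_statistic(xbar=5, mu=4.2, spread=2, n=36)
p = two_sided_p_value_from_z(z)

# Case 2 (assumed): sigma unknown, s = 2, n = 25, so df = 24.
t = standardized_statistic(xbar=5, mu=4.2, spread=2, n=25)
T_CRITICAL_DF24 = 2.064  # two-tailed, alpha = 0.05, read from a t-table
reject_t = abs(t) > T_CRITICAL_DF24

print(f"z = {z:.2f}, p = {p:.4f}")
print(f"t = {t:.2f}, reject H0 at alpha = 0.05: {reject_t}")
```

With these made-up numbers, the $z$ test rejects the null hypothesis at $\alpha = 0.05$ while the $t$ test does not, which reflects both the smaller sample and the heavier tails of the $t$-distribution.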

Recall that when checking a data set to see whether the underlying distribution is normal, one should remove any outliers (even when $n \ge 30$), check for skewness, and visually inspect the histogram to ensure it looks approximately normal.