Exercises - Central Limit Theorem

  1. What do the notations $\mu_{\overline{x}}$ and $\sigma_{\overline{x}}$ represent?

    $\mu_{\overline{x}}$ is the mean of the population of all sample means (for samples of some given size $n$), while $\sigma_{\overline{x}}$ is the standard deviation for this same population.

  2. A sample is chosen randomly from a population that can be described by a Normal model.

    1. Describe the shape, center, and spread of the distribution of sample means for some given sample size $n$.

    2. If we increase the sample size, what's the effect on the distribution of sample means?

    1. The distribution of sample means is normal, with a mean identical to that of the original population and a standard deviation equal to the population standard deviation divided by $\sqrt{n}$.
    2. The standard deviation of the distribution of sample means decreases (i.e., the distribution becomes narrower), while its shape stays normal and its center stays the same.
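To see the effect of sample size numerically, the following minimal Python sketch evaluates $\sigma/\sqrt{n}$ for a few sample sizes. The population standard deviation of 10 and the helper name `standard_error` are illustrative assumptions, not part of the exercise.

```python
from math import sqrt


def standard_error(sigma: float, n: int) -> float:
    """Standard deviation of the sample mean for samples of size n."""
    return sigma / sqrt(n)


# Hypothetical population standard deviation, chosen only for illustration.
sigma = 10.0

# Each time n is multiplied by 4, the spread of the sample means is halved.
for n in (1, 4, 16, 64):
    print(n, standard_error(sigma, n))
```

Because the spread shrinks like $1/\sqrt{n}$, halving it takes four times as many observations.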

  3. Compare the probability distribution for rolling a single 6-sided die to the probability distribution for the mean of two 6-sided dice (draw the histograms).

    The distribution for rolling a single 6-sided die is uniform, while the distribution for the mean of two 6-sided dice is unimodal and triangular (notably closer to normal than the uniform distribution), with the same mean of 3.5 and a smaller standard deviation.
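The two histograms can also be tabulated exactly. The sketch below is plain Python, and the helper name `mean_and_variance` is hypothetical. It enumerates all 36 equally likely rolls of two dice and prints a rough text histogram of their mean.

```python
from collections import Counter
from fractions import Fraction
from itertools import product

faces = range(1, 7)

# Exact distribution of one die: each face has probability 1/6.
single = {face: Fraction(1, 6) for face in faces}

# Exact distribution of the mean of two dice: count each possible mean
# over the 36 equally likely ordered rolls.
counts = Counter(Fraction(a + b, 2) for a, b in product(faces, repeat=2))
mean_of_two = {value: Fraction(c, 36) for value, c in counts.items()}


def mean_and_variance(dist):
    """Return (mean, variance) of a discrete distribution {value: probability}."""
    mu = sum(x * p for x, p in dist.items())
    var = sum((x - mu) ** 2 * p for x, p in dist.items())
    return mu, var


print(mean_and_variance(single))
print(mean_and_variance(mean_of_two))

# Text histogram: one '#' per roll (out of 36) that produces each mean.
for value in sorted(counts):
    print(f"{float(value):>4} {'#' * counts[value]}")
```

Both distributions come out with mean 7/2, while the variance drops from 35/12 for one die to 35/24 for the mean of two.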

  4. A survey found that the American family generates an average of 17.2 pounds of glass garbage each year. Assume the standard deviation of the distribution is 2.5 pounds.

    1. Find the probability that the mean of a sample of 55 families will be between 17 and 18 pounds.

    2. Why can the central limit theorem be applied?

    1. For the distribution of sample means, $\mu = 17.2$, while $\sigma = 2.5/\sqrt{55} = 0.3371$. We want $P(17 \lt \overline{x} \lt 18)$, so we find $z_{17} = (17-17.2)/0.3371 = -0.5933$ and $z_{18} = (18-17.2)/0.3371 = 2.3732$. The related probability $P(-0.5933 \lt z \lt 2.3732) \approx 0.7147$ is our answer.

    2. We are considering a distribution of sample means, and the sample size is large ($55 \gt 30$), so the Central Limit Theorem lets us approximate this distribution as normal even though the shape of the population distribution is not given.
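As a cross-check on part 1, the probability can be computed directly with `statistics.NormalDist` from Python's standard library (Python 3.8+). This is a sketch that assumes the normal approximation justified in part 2; the variable names are illustrative.

```python
from math import sqrt
from statistics import NormalDist

mu, sigma, n = 17.2, 2.5, 55

# Sampling distribution of the mean: same center, spread sigma / sqrt(n).
sample_means = NormalDist(mu, sigma / sqrt(n))

# z-scores of the two endpoints, for comparison with a table lookup.
z_17 = (17 - mu) / sample_means.stdev
z_18 = (18 - mu) / sample_means.stdev

# P(17 < sample mean < 18) as a difference of cumulative probabilities.
p_between = sample_means.cdf(18) - sample_means.cdf(17)

print(round(z_17, 4), round(z_18, 4), round(p_between, 4))
```

Working with the cumulative distribution function avoids rounding the $z$-scores before the lookup, so the result can differ from a table-based answer in the last digit.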

  5. The average teacher's salary in New Jersey is $\$52,174$. Suppose that the distribution is normal with standard deviation $\$7500$.

    1. What is the probability that a randomly selected teacher makes less than $\$50,000$ per year?

    2. If we sample 100 teachers' salaries, what is the probability that the sample mean is less than $\$50,000$ per year?

    3. Why is the probability in part (a) higher than the probability in part (b)?

    1. $\mu = 52174$ and $\sigma = 7500$. Finding $z_{50,000} = (50000 - 52174)/7500 = -0.2899$, we seek $P(x \lt 50000) = P(z \lt -0.2899) = 0.3860$

    2. In the distribution of sample means for samples of size $100$, we have $\mu = 52174$, while $\sigma = 7500/\sqrt{100} = 750$. So, we find $z_{50,000} = (50000 - 52174)/750 = -2.8987$, and calculate $P(\overline{x} \lt 50000)$ as $P(z \lt -2.8987) = 0.0019$.

    3. The distribution of sample means is narrower than the distribution for the population (its standard deviation is $\sigma/\sqrt{n}$ rather than $\sigma$), leaving less area (and hence probability) in the tails.
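The contrast between parts 1 and 2 comes down to a single parameter, the sample size. A small sketch, using a hypothetical helper `prob_below` built on the standard-library `statistics.NormalDist`, makes that explicit:

```python
from math import sqrt
from statistics import NormalDist


def prob_below(x, mu, sigma, n=1):
    """P(mean of n draws < x) for a normal population; n=1 is a single draw."""
    return NormalDist(mu, sigma / sqrt(n)).cdf(x)


mu, sigma = 52174, 7500

one_teacher = prob_below(50_000, mu, sigma)          # part 1: an individual salary
mean_of_100 = prob_below(50_000, mu, sigma, n=100)   # part 2: a mean of 100 salaries

print(round(one_teacher, 4), round(mean_of_100, 4))
```

Only the divisor $\sqrt{n}$ changes between the two calls, which is what shrinks the tail probability.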

  6. Assume SAT scores are normally distributed with mean 1518 and standard deviation 325.

    1. If one SAT score is randomly selected, find the probability that it is between 1440 and 1480.

    2. If 16 SAT scores are randomly selected, find the probability that they have a mean between 1440 and 1480.

    3. Why can the central limit theorem be used in part (b) even though the sample size does not exceed 30?

    1. $\mu = 1518$ and $\sigma = 325$. Finding $z_{1440} = (1440-1518)/325 = -0.2400$ and $z_{1480} = (1480-1518)/325 = -0.1169$, we calculate $P(1440 \lt x \lt 1480)$ as $P(-0.2400 \lt z \lt -0.1169) = 0.0483$.

    2. In the distribution of sample means for samples of size $16$, we have $\mu = 1518$, while $\sigma = 325/\sqrt{16} = 81.25$. Finding $z_{1440} = (1440-1518)/81.25 = -0.96$ and $z_{1480} = (1480 - 1518)/81.25 = -0.4677$, we calculate $P(1440 \lt \overline{x} \lt 1480)$ as $P(-0.96 \lt z \lt -0.4677) = 0.1515$.

    3. The Central Limit Theorem tells us that the distributions of the sample means tend towards a normal distribution as the sample size increases. In this case, the original population distribution was already normally distributed, so all of the distributions of sample means must already be normal.
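The claim in part 3 can be checked empirically. The following simulation sketch draws many samples of 16 scores from the stated normal model and compares the observed fraction of sample means between 1440 and 1480 with the normal-model calculation from part 2. The seed and the number of trials are arbitrary choices made for this illustration.

```python
import random
from math import sqrt
from statistics import NormalDist, fmean

mu, sigma, n = 1518, 325, 16
rng = random.Random(0)  # fixed seed so the run is repeatable

trials = 20_000
hits = 0
for _ in range(trials):
    # Mean of one simulated sample of n scores from the normal population.
    sample_mean = fmean(rng.gauss(mu, sigma) for _ in range(n))
    if 1440 < sample_mean < 1480:
        hits += 1

simulated = hits / trials

# Normal-model value for comparison: sample means ~ Normal(mu, sigma / sqrt(n)).
model = NormalDist(mu, sigma / sqrt(n))
exact = model.cdf(1480) - model.cdf(1440)

print(simulated, round(exact, 4))
```

With this many trials the simulated fraction should land within a percentage point or so of the calculated value, even though $n = 16$ is well under 30.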

  7. The lengths of pregnancies are normally distributed with a mean of 268 days and a standard deviation of 15 days.

    1. If one pregnant woman is randomly selected, find the probability that her length of pregnancy is less than 260 days.

    2. If 25 women are put on a special diet just before they become pregnant, find the probability that their lengths of pregnancy have a mean that is less than 260 days (assuming that the diet has no effect).

    3. If the 25 women do have a mean of less than 260 days, does it appear that the diet has an effect on the length of pregnancy, and should the medical supervisors be concerned?

    1. $\mu = 268$ and $\sigma = 15$. Finding $z_{260} = (260-268)/15 = -0.5333$, we calculate $P(x \lt 260)$ as $P(z \lt -0.5333) = 0.2969$.

    2. In the distribution of sample means for samples of size $25$, we have $\mu = 268$, while $\sigma = 15/\sqrt{25} = 3$. Finding $z_{260} = (260 - 268)/3 = -2.6667$, we calculate $P(\overline{x} \lt 260)$ as $P(z \lt -2.6667) = 0.0038$.

    3. Seeing a sample like this (i.e., with a mean of less than 260 days) is clearly a rare event ($0.0038$ is less than one percent). So if the one and only sample we found had this mean pregnancy length, it casts doubt on whether the mean for these women is still $268$ days (much like seeing the incredibly rare event of 99 out of 100 coin flips resulting in heads casts doubt on your belief that the coin is fair). The only thing that separates these women from the general population is their special diet -- so yes, it appears the diet had an effect on the length of their pregnancies. Medical supervisors should be concerned.

  8. Assume that a test has a mean score of 75 and a standard deviation of 10. Assume the distribution of scores is approximately normal.

    1. What is the probability that a person chosen at random will make 100 or above on the test?

    2. What score should be used to identify the top 2.5%?

    3. In a group of 100 people, how many would you expect to score below 60?

    4. What is the probability that the mean of a group of 100 will score below 70?

    1. $\mu = 75$ and $\sigma = 10$. Finding $z_{100} = (100-75)/10 = 2.5$, we calculate $P(x \ge 100)$ as $P(z \ge 2.5) = 0.0062$.

    2. Note that the top $2.5\%$ corresponds to $0.025$ in area right of some $z$-score. But then the area left of this $z$-score is $1-0.025 = 0.975$. Using a table or technology, we find this corresponds to $z = 1.960$. Recalling that a $z$ score is a number of standard deviations away from the mean (with positive $z$-scores associated with being to the right of the mean and negative ones being to the left of the mean), the cut-off test score we seek is $\mu + z\sigma = 75 + (1.960)(10) = 94.6$

    3. Note, this problem does NOT ask about an average score of the 100 people -- so we are NOT looking at the distribution of sample means. Instead, we simply find the probability that a score is below $60$ and then multiply by $100$. Note $\mu = 75$ and $\sigma = 10$, so finding $z_{60} = (60-75)/10 = -1.5$, we calculate $P(x \lt 60)$ as $P(z \lt -1.5) = 0.0668$. Finally, multiplying by $100$ we get the expected number in a group of $100$ people to do this poorly -- namely, about 7 people.

    4. This problem IS asking about the mean of a group of $100$, so we ARE talking about the distribution of sample means. Thus, for the distribution of sample means, $\mu = 75$, while $\sigma = 10/\sqrt{100} = 1$. Finding $z_{70} = (70 - 75)/1 = -5$, we calculate $P(\overline{x} \lt 70)$ as $P(z \lt -5) \approx 2.9 \times 10^{-7}$, which is very, very small!
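All four parts can be reproduced with one normal model for individual scores and a second one for the mean of 100 scores. The sketch below uses `statistics.NormalDist` from the Python standard library; the variable names are illustrative.

```python
from math import sqrt
from statistics import NormalDist

scores = NormalDist(mu=75, sigma=10)                     # individual test scores
group_means = NormalDist(mu=75, sigma=10 / sqrt(100))    # means of groups of 100

# Part 1: upper-tail probability for a single score.
p_100_or_above = 1 - scores.cdf(100)

# Part 2: the score with 97.5% of the area to its left (the top 2.5% cut-off).
cutoff_top = scores.inv_cdf(0.975)

# Part 3: expected count = group size times the individual probability.
expected_below_60 = 100 * scores.cdf(60)

# Part 4: this one uses the distribution of sample means, not of scores.
p_mean_below_70 = group_means.cdf(70)

print(p_100_or_above, cutoff_top, expected_below_60, p_mean_below_70)
```

Note that parts 1 through 3 all use `scores`, and only part 4 switches to `group_means`, mirroring the distinction drawn in the solutions.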

  9. Carbon monoxide (CO) emissions for a certain kind of car vary with mean 2.9 gm/mi and standard deviation 0.4 gm/mi. A company has 80 of these cars in its fleet, acquired from various (i.e., random) sources.

    1. What's the probability that a randomly selected car from the fleet has CO emissions in excess of 3.1 gm/mi?

    2. What's the probability that the average CO emissions for all 80 cars is in excess of 3.1?

    3. There is only a 1% chance that the fleet's mean CO level is greater than what value?

    Approximately:
    1. If the emissions are normally distributed, then $0.3085$ -- but we don't know how this population is distributed, so we can't say this for sure.
    2. $0.0000039$
    3. $3.0040$
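Since only final answers are listed for this exercise, here is a hedged sketch of how they can be obtained with `statistics.NormalDist`. Part 1 assumes the individual emissions are normally distributed, which the exercise does not state; parts 2 and 3 rely on the Central Limit Theorem with $n = 80$.

```python
from math import sqrt
from statistics import NormalDist

mu, sigma, n = 2.9, 0.4, 80

# Part 1: ASSUMES a normal model for individual cars (not given in the exercise).
single_car = NormalDist(mu, sigma)
p_car_above = 1 - single_car.cdf(3.1)

# Parts 2 and 3: fleet mean is approximately Normal(mu, sigma / sqrt(n)) by the CLT.
fleet_mean = NormalDist(mu, sigma / sqrt(n))
p_fleet_above = 1 - fleet_mean.cdf(3.1)

# Part 3: the value with 99% of the area below it, i.e. 1% above it.
one_percent_level = fleet_mean.inv_cdf(0.99)

print(p_car_above, p_fleet_above, one_percent_level)
```

The fleet-level probability is several orders of magnitude smaller than the single-car one because the fleet mean has standard deviation $0.4/\sqrt{80} \approx 0.0447$.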

  10. Although most of us buy milk by the quart or gallon, farmers measure daily production in pounds. Ayrshire cows average 47 pounds of milk a day, with a standard deviation of 6 pounds. For Jersey cows, the mean daily production is 43 pounds, with a standard deviation of 5 pounds. Assume that Normal models describe milk production for these breeds.

    1. If we select an Ayrshire at random, what's the probability that she averages more than 50 pounds of milk a day?

    2. A farmer has 20 Jerseys. What's the probability that the average production for this small herd exceeds 45 pounds of milk a day?

    3. A farmer has 20 Ayrshires. There's a $99\%$ chance each day that this small herd produces at least how many pounds of milk?

    1. $0.3085$
    2. $0.0368$
    3. $877.5$ pounds
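Part 3 asks about the herd's total production rather than its mean, so it can be approached in two equivalent ways. The sketch below assumes the 20 cows produce independently of one another (an assumption the exercise leaves implicit) and uses `statistics.NormalDist`; the variable names are illustrative.

```python
from math import sqrt
from statistics import NormalDist

mu, sigma, n = 47, 6, 20  # Ayrshire mean, standard deviation, herd size

# Route 1: find the 1st percentile of the herd's MEAN, then scale up by n.
mean_dist = NormalDist(mu, sigma / sqrt(n))
via_mean = n * mean_dist.inv_cdf(0.01)

# Route 2: model the herd TOTAL directly as Normal(n * mu, sigma * sqrt(n)),
# which holds for a sum of n independent normal variables.
total_dist = NormalDist(n * mu, sigma * sqrt(n))
via_total = total_dist.inv_cdf(0.01)

print(via_mean, via_total)
```

Both routes give the same cut-off, close to the listed answer; small differences in the last digit come from how far the $z$-score is rounded.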