R Project: The Central Limit Theorem

The Central Limit Theorem tells us that the distribution of sample means $\overline{x}$ of samples of size $n$, taken from any population with a finite variance,

  1. becomes more "normal" in shape as $n$ increases;
  2. has a mean that agrees with the population mean, $\mu$; and
  3. has a standard deviation equal to $\sigma/\sqrt{n}$, where $\sigma$ is the standard deviation of the population.
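
Items 2 and 3 hold for every $n$; only item 1 is a limiting statement. Write a sample as $X_1, \dots, X_n$, where each draw has mean $\mu$ and standard deviation $\sigma$, and assume the draws are independent (for instance, sampling with replacement). Linearity of expectation and the additivity of variance for independent variables then give:

$$
E[\overline{x}] = E\left[\frac{1}{n}\sum_{i=1}^{n} X_i\right] = \frac{1}{n}\sum_{i=1}^{n} E[X_i] = \frac{1}{n}\cdot n\mu = \mu
$$

$$
\operatorname{Var}(\overline{x}) = \frac{1}{n^2}\sum_{i=1}^{n}\operatorname{Var}(X_i) = \frac{1}{n^2}\cdot n\sigma^2 = \frac{\sigma^2}{n},
\qquad \text{so} \qquad \operatorname{SD}(\overline{x}) = \frac{\sigma}{\sqrt{n}}.
$$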

In this project, we will construct a population and then approximate the distributions of sample means for various sample sizes through repeated sampling, so that we can "see" this theorem in action through a sequence of histograms, as suggested by the graphic below.

First, we need a population with which to work. Ideally, it will be "far from normal" so that we can see the transition from a non-normal distribution when $n$ is small to a much more normal one when $n$ is large.

For convenience, one can use the following code to construct a population of 10,000 integers between 1 and 100 that follows a non-normal distribution. Because the weights are random, each run produces a different population.

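```r
# Shuffle the weights 1..5, scale by 16, and give each to a block of 20 values
rprobs = sample(1:5,5)
rprobs = rep(16*rprobs,each=20)
# Add uniform noise in [0, 20) so values within a block differ; normalize to sum to 1
rprobs = rprobs + 20*runif(100)
rprobs = rprobs / sum(rprobs)
# Draw 10000 values from 1:100 (with replacement) using those probabilities
pop = sample(1:100,size=10000,replace=TRUE,prob=rprobs)
```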
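
As a numerical companion to items 2 and 3 of the theorem, the sketch below compares the mean and standard deviation of many simulated sample means with $\mu$ and $\sigma/\sqrt{n}$. The helper name clt.check, the sample sizes, the 5000 repetitions, and the seed are illustrative assumptions, not part of the assignment. Samples are drawn with replacement, the setting in which $\sigma/\sqrt{n}$ holds exactly, and $\sigma$ is computed with divisor $N$ because R's sd() divides by $N-1$.

```r
# Hypothetical helper (not part of the assignment): compare simulated
# sample means with the predictions mu and sigma / sqrt(n).
clt.check <- function(pop, sample.sizes, reps = 5000) {
  mu <- mean(pop)
  sigma <- sqrt(mean((pop - mu)^2))  # population SD, divisor N
  t(sapply(sample.sizes, function(n) {
    # Draw reps samples of size n with replacement and average each one
    xbar <- replicate(reps, mean(sample(pop, size = n, replace = TRUE)))
    c(n = n, mu = mu, mean.xbar = mean(xbar),
      sigma.over.root.n = sigma / sqrt(n), sd.xbar = sd(xbar))
  }))
}

# Rebuild the population with the code above; set.seed only adds reproducibility
set.seed(2024)
rprobs <- sample(1:5, 5)
rprobs <- rep(16 * rprobs, each = 20)
rprobs <- rprobs + 20 * runif(100)
rprobs <- rprobs / sum(rprobs)
pop <- sample(1:100, size = 10000, replace = TRUE, prob = rprobs)

result <- clt.check(pop, sample.sizes = c(1, 4, 16, 64))
print(round(result, 3))
```

Each row of result corresponds to one sample size; the mean.xbar column should sit close to mu, and sd.xbar close to sigma.over.root.n.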

Next, write a function population.hist(pop) that displays a histogram of the population represented by the vector pop, whose values each lie between 1 and 100, inclusive.
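
A minimal sketch of population.hist(), assuming the values are integers so that one bar per value is appropriate. The title, axis label, colour, and the choice to return the histogram object invisibly are illustrative assumptions, and the sketch does not attempt any further requirements the assignment places on this function.

```r
# Minimal sketch of population.hist(): one bar per integer value 1..100
population.hist <- function(pop) {
  # Breaks at half-integers center each bar on an integer value
  h <- hist(pop, breaks = seq(0.5, 100.5, by = 1),
            main = "Population", xlab = "Value", col = "gray")
  invisible(h)
}

# Usage with a small made-up population of integers between 1 and 100
set.seed(1)
pop <- sample(1:100, size = 1000, replace = TRUE)
h <- population.hist(pop)
```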

Additionally:

Then, create a second function sample.means(pop,sample.size,n,title,show.overlay) that draws n samples of size sample.size from the population pop, computes their means, and displays a histogram of these means.
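
A minimal sketch of sample.means(). Note that in this signature n counts the samples drawn, while sample.size plays the role of $n$ in the theorem. Two assumptions are labeled here: samples are drawn with replacement, and show.overlay is taken to mean "draw the normal curve with mean $\mu$ and standard deviation $\sigma/\sqrt{n}$ over the histogram", since its exact meaning is not spelled out in this section. The stand-in population in the usage is made up for illustration.

```r
# Minimal sketch of sample.means(); show.overlay is assumed to add the
# normal curve the theorem predicts.
sample.means <- function(pop, sample.size, n, title, show.overlay = FALSE) {
  # Draw n samples of size sample.size (with replacement) and record each mean
  means <- replicate(n, mean(sample(pop, size = sample.size, replace = TRUE)))
  # Density scale (freq = FALSE) so a probability density can be overlaid
  hist(means, freq = FALSE, main = title, xlab = "Sample mean", col = "gray")
  if (show.overlay) {
    mu <- mean(pop)
    sigma <- sqrt(mean((pop - mu)^2))  # population SD, divisor N
    curve(dnorm(x, mean = mu, sd = sigma / sqrt(sample.size)),
          add = TRUE, lwd = 2)
  }
  invisible(means)
}

# Usage: a sequence of histograms for increasing sample sizes,
# using a made-up skewed stand-in population
set.seed(1)
pop <- sample(1:100, size = 10000, replace = TRUE,
              prob = rep(c(5, 1, 4, 2, 3), each = 20))
old.par <- par(mfrow = c(2, 2))
for (k in c(1, 4, 16, 64)) {
  sample.means(pop, sample.size = k, n = 2000,
               title = paste("Sample size", k), show.overlay = TRUE)
}
par(old.par)
```

Plotting on the density scale rather than counts is what lets the overlaid curve and the bars share one vertical axis.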

Additionally:

The following observations may help in doing the above: