R Functions Related to Simulation

In statistics we often consider drawing elements from some larger population. R provides powerful tools for simulating this. Two of the most important functions for the purpose are sample() and replicate().


sample()

Description
sample takes a sample of the specified size from the elements of x, either with or without replacement.

Usage
sample(x, size = n, replace = FALSE, prob = NULL)

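where

x is a vector of elements to choose from,
size is the number of elements to draw,
replace indicates whether sampling is done with replacement (FALSE by default), and
prob is an optional vector of probability weights for the elements of x (NULL, the default, makes every element equally likely).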

Examples

Suppose a bag is filled with 3 red marbles and 7 blue marbles. To simulate a drawing of 4 marbles, without replacement, from the bag, we could do the following:

> sample(c(rep("red",3), rep("blue",7)), size=4, replace=FALSE)
[1] "red"  "blue" "blue" "red" 

To simulate the sum of two rolled dice, we could do the following:

> sum(sample(1:6, size=2, replace=TRUE))
[1] 7

To simulate 10 coin flips, we could do the following:

> sample(c("H","T"), 10, replace = TRUE)
 [1] "T" "T" "T" "H" "T" "H" "T" "T" "H" "H"

To simulate a random permutation of the letters ABCDE, we can make the sample size equal to the size of the vector we are sampling and sample without replacement:

> sample(c("A","B","C","D","E"), size = 5, replace = FALSE)
[1] "A" "E" "C" "B" "D"
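
Because size defaults to the length of x and replace defaults to FALSE, the same kind of permutation can be requested with fewer arguments. The sketch below also calls set.seed() so that a draw can be reproduced. The seed value 42 and the variable names letters5, perm1, and perm2 are arbitrary choices for illustration and are not part of the example above.

```r
letters5 <- c("A", "B", "C", "D", "E")

set.seed(42)                 # arbitrary seed, chosen only for illustration
perm1 <- sample(letters5)    # size defaults to length(x), replace to FALSE

set.seed(42)                 # resetting the seed reproduces the same draw
perm2 <- sample(letters5)

identical(perm1, perm2)      # TRUE: same seed, same permutation
sort(perm1)                  # each letter appears exactly once
```

Without a call to set.seed(), each run of sample() will generally give a different permutation, which is why the outputs shown in these examples will not match yours exactly.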

Suppose the probability of a boy being born is $0.513$, while the probability of a girl is $0.487$. We could simulate 10 births with

> births = sample(c("boy","girl"), 10, replace=TRUE, prob=c(0.513,0.487))
> births
 [1] "girl" "boy"  "girl" "girl" "girl" "girl" "boy"  "boy"  "boy"  "girl"

Now suppose we want to see what happens in $100,000$ births. Printing the resulting vector will not be helpful, but combining the == operator with the sum() function is, as the following demonstrates on the 10 births above (recall that TRUE, when treated as a numerical value, is equal to $1$, while FALSE is equal to $0$).

> births
 [1] "girl" "boy"  "girl" "girl" "girl" "girl" "boy"  "boy"  "boy"  "girl"
 
> births == "boy"
 [1] FALSE  TRUE FALSE FALSE FALSE FALSE  TRUE  TRUE  TRUE FALSE
 
> sum(births == "boy")
[1] 4

Applying this strategy to a sample of size $100,000$, we have:

> manybirths = sample(c("boy","girl"), 100000, replace=TRUE, prob=c(0.513,0.487))

> sum(manybirths == "boy")
[1] 51205

> sum(manybirths == "girl")
[1] 48795

Alternatively, we could apply the table() function to the same vector, which reports both counts at once:

> table(manybirths)
manybirths
  boy  girl 
51205 48795 
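
With a sample this large, proportions are often easier to read than raw counts. The following sketch computes them in two ways: mean() applied to the logical vector, and prop.table() applied to the result of table(). The seed value and the name prop_boy are assumptions added for illustration, and because the sample is drawn afresh, the numbers will differ slightly from the counts shown above.

```r
set.seed(1)  # arbitrary seed for reproducibility (an assumption, not in the original)
manybirths <- sample(c("boy", "girl"), 100000, replace = TRUE,
                     prob = c(0.513, 0.487))

# TRUE counts as 1 and FALSE as 0, so the mean is the proportion of boys
prop_boy <- mean(manybirths == "boy")
prop_boy

# prop.table() converts the counts from table() into proportions
prop.table(table(manybirths))
```

The simulated proportion of boys should land close to the specified probability of $0.513$, though it will rarely match it exactly.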

replicate()

Description
replicate() repeatedly evaluates an expression, usually one that involves something being done "randomly", such as a call to sample().

Usage
replicate(n, expr)

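where

n is the number of times to evaluate the expression, and
expr is the expression to evaluate repeatedly.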

Examples

To simulate the sum of two rolled dice 20 times, we could do the following:

> replicate(n=20, sum(sample(1:6, size=2, replace=TRUE)))
 [1]  8  9  6  9  8  7 11  5  7  5 11  7  7 12  8 12  6 10  6 10
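
With many more replications, the same pattern can be used to approximate a probability. The sketch below estimates the chance that two dice sum to 7, which is exactly $6/36 \approx 0.167$. The number of replications (10,000), the seed, and the names sums and est are arbitrary choices made for illustration.

```r
set.seed(2024)  # arbitrary seed for reproducibility (an assumption)

# Repeat the two-dice experiment 10,000 times and store each sum
sums <- replicate(n = 10000, sum(sample(1:6, size = 2, replace = TRUE)))

# Proportion of replications in which the sum was 7
est <- mean(sums == 7)
est

# Distribution of all observed sums
table(sums)
```

Increasing n generally brings the estimate closer to the exact value, at the cost of a longer run time.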