# R Functions Related to Simulation

In statistics we often consider drawing elements from some larger population. R provides powerful tools to simulate this. Two very important functions to this end are sample() and replicate().

## sample()

Description
sample takes a sample of the specified size from the elements of x, either with or without replacement.

Usage
sample(x, size = n, replace = FALSE, prob = NULL)

where

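• $x$ is a vector of one or more elements from which to choose. (If $x$ is a single number of at least 1, sample() instead chooses from 1:x.)
• $n$ is the number of items to choose, a non-negative integer.
• $replace$ (optional) indicates whether sampling should be done with replacement. The default, FALSE, samples without replacement.
• $prob$ (optional) is a vector of probability weights, one for each element of the vector being sampled. The default, NULL, makes every element equally likely.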

Examples

Suppose a bag is filled with 3 red marbles and 7 blue marbles. To simulate drawing 4 marbles from the bag without replacement, we could do the following:

> sample(c(rep("red",3), rep("blue",7)), size=4, replace=FALSE)
[1] "red"  "blue" "blue" "red"
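
Because sample() draws at random, output like the above will generally differ each time the command is run. If repeatable results are wanted, one option is to call set.seed() before sampling. The following is a minimal sketch; the seed value 42 and the variable names are arbitrary choices made for illustration.

```r
# Fix the random number generator's starting state (42 is an arbitrary seed).
set.seed(42)
bag <- c(rep("red", 3), rep("blue", 7))
first_draw <- sample(bag, size = 4, replace = FALSE)

# Resetting the same seed reproduces exactly the same draw.
set.seed(42)
second_draw <- sample(bag, size = 4, replace = FALSE)

# Compare the two draws element by element.
identical(first_draw, second_draw)
```

Without the second call to set.seed(), the two draws would usually differ.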


To simulate the sum of two rolled dice, we could do the following:

> sum(sample(1:6, size=2, replace=TRUE))
[1] 7


To simulate 10 coin flips, we could do the following:

> sample(c("H","T"), 10, replace = TRUE)
[1] "T" "T" "T" "H" "T" "H" "T" "T" "H" "H"


To simulate a random permutation of the letters ABCDE, we can make the sample size equal to the size of the vector we are sampling and sample without replacement:

> sample(c("A","B","C","D","E"), size = 5, replace = FALSE)
[1] "A" "E" "C" "B" "D"
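
When size is omitted, sample() draws as many items as the vector contains, so the permutation above can also be written more briefly. A small sketch, with variable names chosen only for illustration:

```r
letters5 <- c("A", "B", "C", "D", "E")

# With size omitted, sample() draws length(letters5) items without replacement,
# which is a random permutation of the vector.
shuffled <- sample(letters5)
shuffled

# Every original letter appears exactly once in the shuffled result.
setequal(shuffled, letters5)
```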


Suppose the probability of a boy being born is $0.513$, while the probability of a girl is $0.487$. We could simulate 10 births with

> births = sample(c("boy","girl"), 10, replace=TRUE, prob=c(0.513,0.487))
> births
[1] "girl" "boy"  "girl" "girl" "girl" "girl" "boy"  "boy"  "boy"  "girl"

Now suppose we want to see what happens in $100,000$ births. Printing the resulting vector would not be helpful, but combining the == operator with the sum() function is. Recall that TRUE, when treated as a numerical value, equals $1$, while FALSE equals $0$, so summing a logical vector counts its TRUE entries. With the 10 births above:
> births
[1] "girl" "boy"  "girl" "girl" "girl" "girl" "boy"  "boy"  "boy"  "girl"

> births == "boy"
[1] FALSE  TRUE FALSE FALSE FALSE FALSE  TRUE  TRUE  TRUE FALSE

> sum(births == "boy")
[1] 4

Applying this strategy to a sample of size $100,000$, we have
> manybirths = sample(c("boy","girl"), 100000, replace=TRUE, prob=c(0.513,0.487))

> sum(manybirths == "boy")
[1] 51205

> sum(manybirths == "girl")
[1] 48795
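
The same TRUE/FALSE arithmetic also gives proportions directly: since the mean of a logical vector is the fraction of its entries that are TRUE, mean() can stand in for sum() divided by the sample size. A sketch under assumed names (sim_births and the seed are illustrative, not part of the example above):

```r
set.seed(1)  # arbitrary seed, used only so the run is repeatable
sim_births <- sample(c("boy", "girl"), 100000, replace = TRUE,
                     prob = c(0.513, 0.487))

# Proportion of boys: the mean of a logical vector is the fraction of TRUE values.
prop_boys <- mean(sim_births == "boy")
prop_boys

# The same number computed the long way, as a count divided by the sample size.
sum(sim_births == "boy") / length(sim_births)
```

With a sample this large, the proportion should land close to the specified probability of $0.513$.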

Alternatively, we could appeal to the table() function, which counts every distinct value in a single call. Using the same manybirths vector:
> table(manybirths)
manybirths
  boy  girl
51205 48795
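
If relative frequencies are wanted rather than raw counts, one option is to pass the table to prop.table(), which divides each count by the total. A brief sketch, where the names birth_sample, counts, and props and the seed are assumptions for illustration:

```r
set.seed(2)  # arbitrary seed, used only so the run is repeatable
birth_sample <- sample(c("boy", "girl"), 100000, replace = TRUE,
                       prob = c(0.513, 0.487))

# Count how many times each value occurs.
counts <- table(birth_sample)

# Divide each count by the total to get relative frequencies that sum to 1.
props <- prop.table(counts)
props
```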


## replicate()

Description
replicate is a function that repeatedly evaluates an expression, usually one that involves something being done "randomly", such as drawing a sample with sample(). The expression is evaluated afresh on every repetition, and the results are collected together.

Usage
replicate(n, expr)

where

• $n$ is the number of times to evaluate the expression
• $expr$ is the expression to be evaluated
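
To see why re-evaluating the expression matters, it can help to contrast replicate() with rep(), which evaluates its argument once and then copies the result. A minimal sketch, with the seed and variable names chosen only for illustration:

```r
set.seed(3)  # arbitrary seed, used only so the run is repeatable

# rep() evaluates the dice roll once and copies that single result 5 times.
copied <- rep(sum(sample(1:6, size = 2, replace = TRUE)), 5)

# replicate() re-evaluates the dice roll on each of the 5 repetitions.
redrawn <- replicate(5, sum(sample(1:6, size = 2, replace = TRUE)))

copied
redrawn
```

Here copied always holds five identical values, while redrawn holds five separate rolls.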

Examples

To simulate the sum of two rolled dice 20 times, we could do the following:

> replicate(n=20, sum(sample(1:6, size=2, replace=TRUE)))
[1]  8  9  6  9  8  7 11  5  7  5 11  7  7 12  8 12  6 10  6 10
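
With many more repetitions, the same idea can be used to estimate a probability. The sketch below estimates the chance that two dice sum to 7, whose exact value is $6/36$; the repetition count of 10000, the seed, and the variable names are assumptions made for illustration.

```r
set.seed(4)  # arbitrary seed, used only so the run is repeatable

# Simulate the sum of two dice 10000 times.
rolls <- replicate(n = 10000, sum(sample(1:6, size = 2, replace = TRUE)))

# rolls == 7 is TRUE for each roll summing to 7; the mean of a logical
# vector is the fraction of TRUE values, so this estimates the probability.
est <- mean(rolls == 7)
est

# Counts for every sum that occurred.
table(rolls)
```

The estimate should be near $1/6 \approx 0.167$, and it typically gets closer as the number of repetitions grows.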