To generate a vector of $n$ random values following a uniform distribution, one can use the runif() function.
runif()
Usage
runif(n)
runif(n, min, max)
Example
To generate 10 random values between $0$ and $1$, and then 5 random values between $2$ and $7$, one could use the following:
> runif(10)
[1] 0.7072679 0.2488529 0.7572154 0.3351405 0.2017931 0.5901582
[7] 0.4627859 0.8125028 0.3704643 0.7976154
> runif(5,min=2,max=7)
[1] 2.308490 3.929546 2.929231 2.294929 4.137228
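Because runif() returns different values on every call, it can help to fix the seed of the random number generator with set.seed() when results need to be reproducible. The following is a minimal sketch; the seed value 42 and the variable names x and y are arbitrary choices for illustration, not part of the text above.

```r
# Fix the seed so the draws are reproducible (42 is an arbitrary choice).
set.seed(42)
x = runif(5, min = 2, max = 7)

# Re-seeding with the same value reproduces exactly the same draws.
set.seed(42)
y = runif(5, min = 2, max = 7)
identical(x, y)         # TRUE

# Every generated value lies between min and max.
all(x >= 2 & x <= 7)    # TRUE
```

When min and max are omitted, as in runif(10), the defaults min = 0 and max = 1 are used.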
Often in statistics we consider drawing elements from some larger population. R provides a powerful tool to this end in the form of the sample() function.
sample()
Description
sample takes a sample of the specified size from the elements of x, either with or without replacement.
Usage
sample(x, size = n, replace = FALSE, prob = NULL)
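where
x is a vector of elements from which to choose,
size is the number of elements to draw,
replace indicates whether sampling is done with replacement (TRUE) or without replacement (FALSE, the default),
prob is an optional vector of probability weights for the elements of x (NULL, the default, makes every element equally likely).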
Examples
Suppose a bag is filled with 3 red marbles and 7 blue marbles. To simulate a drawing of 4 marbles, without replacement, from the bag, we could do the following:
> sample(c(rep("red",3), rep("blue",7)), size=4, replace=FALSE)
[1] "red" "blue" "blue" "red"
To simulate the sum of two rolled dice, we could do the following:
> sum(sample(1:6, size=2, replace=TRUE))
[1] 7
To simulate 10 coin flips, we could do the following:
> sample(c("H","T"), 10, replace = TRUE)
[1] "T" "T" "T" "H" "T" "H" "T" "T" "H" "H"
To simulate a random permutation of the letters ABCDE, we can make the sample size equal to the size of the vector we are sampling and sample without replacement:
> sample(c("A","B","C","D","E"), size = 5, replace = FALSE)
[1] "A" "E" "C" "B" "D"
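The same kind of permutation can be written more briefly, since size defaults to the length of x and replace defaults to FALSE. The sketch below uses the hypothetical variable names abcde and perm, and checks that the result is a rearrangement of the original letters.

```r
abcde = c("A", "B", "C", "D", "E")

# With size and replace omitted, sample() returns a random permutation of x.
perm = sample(abcde)

length(perm)               # 5
setequal(perm, abcde)      # TRUE: the same letters, in some order
any(duplicated(perm))      # FALSE: no letter is drawn twice
```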
Suppose the probability of a boy being born is $0.513$, while the probability of a girl is $0.487$. We could simulate 10 births with
> births = sample(c("boy","girl"), 10, replace=TRUE, prob=c(0.513,0.487))
> births
[1] "girl" "boy" "girl" "girl" "girl" "girl" "boy" "boy" "boy" "girl"
Now suppose we want to see what happens in $100{,}000$ births. Showing the resulting vector will not be helpful, but a creative use of the == operator and the sum() function can be, as the following demonstrates (recall that TRUE, when considered as a numerical value, is equal to $1$, while FALSE is equal to $0$).
> births
[1] "girl" "boy" "girl" "girl" "girl" "girl" "boy" "boy" "boy" "girl"
> births == "boy"
[1] FALSE TRUE FALSE FALSE FALSE FALSE TRUE TRUE TRUE FALSE
> sum(births == "boy")
[1] 4
Applying this strategy to a sample of size $100{,}000$, we have
> manybirths = sample(c("boy","girl"), 100000, replace=TRUE, prob=c(0.513,0.487))
> sum(manybirths == "boy")
[1] 51205
> sum(manybirths == "girl")
[1] 48795
Alternatively, we could appeal to the table() function, which simplifies the process:
> manybirths = sample(c("boy","girl"), 100000, replace=TRUE, prob=c(0.513,0.487))
> table(manybirths)
manybirths
  boy  girl
51205 48795
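Since TRUE counts as $1$ and FALSE as $0$, the mean() of a logical vector gives the proportion of TRUE values, which is often more useful than a raw count. The following sketch (the names prop_boys and props are illustrative) estimates the proportion of boys, which should land close to $0.513$, though the exact value changes from run to run.

```r
manybirths = sample(c("boy", "girl"), 100000, replace = TRUE,
                    prob = c(0.513, 0.487))

# Proportion of boys: the mean of a logical vector is the fraction of TRUEs.
prop_boys = mean(manybirths == "boy")
prop_boys

# Proportions for every category at once, built from the table of counts.
props = prop.table(table(manybirths))
props
```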
Other times, one needs to simulate multiple occurrences of the same random phenomenon -- perhaps one built around runif(), or sample(), or one of the other random distribution functions that we will learn about later. In these cases, the replicate() function will likely be what one needs.
replicate()
Description
replicate is a function that allows us to repeatedly evaluate an expression (which usually involves something being done "randomly", like the selection of a sample).
Usage
replicate(n, expr)
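where
n is the number of times the expression is to be evaluated,
expr is the expression to evaluate repeatedly.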
Examples
To simulate 20 times the sum of two rolled dice, we could do the following:
> replicate(n=20, sum(sample(1:6, size=2, replace=TRUE)))
[1] 8 9 6 9 8 7 11 5 7 5 11 7 7 12 8 12 6 10 6 10
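Combining replicate() with the == and mean() ideas used for the births example gives a simple way to estimate a probability by simulation. As a sketch, the code below estimates the chance that two dice sum to $7$, which is $6/36 = 1/6 \approx 0.167$; the number of replications ($10{,}000$) and the names rolls and est are arbitrary choices for illustration.

```r
# Simulate the sum of two dice 10,000 times.
rolls = replicate(n = 10000, sum(sample(1:6, size = 2, replace = TRUE)))

# Estimated probability that the sum is 7; should be near 1/6.
est = mean(rolls == 7)
est

# Counts of each possible sum, from 2 through 12.
table(rolls)
```

Larger values of n generally bring the estimate closer to the true probability, at the cost of a longer run time.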