To find the hypergeometric probability of seeing exactly x white balls when drawing k balls without replacement from an urn containing m white balls and n black balls, i.e., to compute
$$P(x) = \frac{{}_{m}C_{x} \cdot {}_{n}C_{k-x}}{{}_{m+n}C_{k}}$$
R: use the function
dhyper(x, m, n, k)
As an example, suppose a pool of 50 potential jurors (a typical size for composing a jury of 12) contains 15 females and 35 males. If the 12 jurors are chosen at random from the pool, the probability that the jury will be made up of 4 females and 8 males can be found with the following:
> dhyper(4, 15, 35, 12)
[1] 0.2646333
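As a cross-check of the formula above, here is a minimal Python sketch (Python is used only for illustration; it is not part of the R or Excel workflow described here). The helper name `hyper_pmf` is hypothetical; it takes the same argument order as `dhyper(x, m, n, k)` and evaluates the ratio of binomial coefficients directly with `math.comb`.

```python
from math import comb


def hyper_pmf(x: int, m: int, n: int, k: int) -> float:
    """P(X = x): exactly x white balls in k draws without replacement
    from an urn holding m white and n black balls (dhyper's argument order)."""
    if x < 0 or x > k:
        return 0.0  # impossible counts have probability 0
    # math.comb returns 0 when x > m or k - x > n, so those cases are handled too.
    return comb(m, x) * comb(n, k - x) / comb(m + n, k)


# Jury example: 4 females among 12 jurors chosen from 15 females and 35 males.
print(round(hyper_pmf(4, 15, 35, 12), 7))  # 0.2646333
```

Summing `hyper_pmf` over every possible x (here 0 through 12) gives 1, which is a quick sanity check on any implementation.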
Excel: use the function
HYPGEOM.DIST(x, k, m, N, FALSE)
Note an important difference from the corresponding R function above: here N is the total number of balls (i.e., N = m + n), not the number of black balls.
The last argument, when FALSE, indicates that the probability returned should not be cumulative (i.e., it returns only P(x), not P(0) + P(1) + ⋯ + P(x)).
Suppose one wishes to find the cumulative hypergeometric probability of seeing x or fewer white balls when drawing k balls from an urn containing m white balls and n black balls, i.e.,
$$P(X \le x) = P(0) + P(1) + P(2) + \cdots + P(x) = \sum_{i=0}^{x} \frac{{}_{m}C_{i} \cdot {}_{n}C_{k-i}}{{}_{m+n}C_{k}}$$
R: use the function
phyper(x, m, n, k)
As an example, in a New York State Lotto game, a bettor selects 6 numbers from 1 to 59 (without repetition), and a winning 6-number combination is later randomly selected. Here the 6 winning numbers play the role of the white balls and the other 53 numbers the black balls. To find the probability that someone who buys a single ticket matches more than 2 of the winning numbers, one could use the following:
> 1 - phyper(2, 6, 53, 6)
[1] 0.0108641
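The cumulative probability is just a running sum of the individual probabilities. A minimal Python sketch, again with hypothetical helper names (`hyper_pmf`, `hyper_cdf`) that mirror the argument order of `dhyper` and `phyper`, reproduces the Lotto figure both as 1 − P(X ≤ 2) and as the direct sum P(3) + P(4) + P(5) + P(6):

```python
from math import comb


def hyper_pmf(x: int, m: int, n: int, k: int) -> float:
    """P(X = x), using the same (x, m, n, k) argument order as dhyper."""
    if x < 0 or x > k:
        return 0.0
    return comb(m, x) * comb(n, k - x) / comb(m + n, k)


def hyper_cdf(x: int, m: int, n: int, k: int) -> float:
    """P(X <= x) = P(0) + P(1) + ... + P(x), mirroring phyper(x, m, n, k)."""
    return sum(hyper_pmf(i, m, n, k) for i in range(0, x + 1))


# Lotto: 6 winning numbers (white), 53 non-winning numbers (black), 6 picked.
upper_via_cdf = 1 - hyper_cdf(2, 6, 53, 6)
upper_direct = sum(hyper_pmf(i, 6, 53, 6) for i in range(3, 7))  # P(3) + ... + P(6)
print(round(upper_via_cdf, 7), round(upper_direct, 7))  # 0.0108641 0.0108641
```

Computing the upper tail directly avoids subtracting from 1, which can lose precision when the tail probability is very small; in R, `phyper(2, 6, 53, 6, lower.tail = FALSE)` returns P(X > 2) in a single call.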
Excel: use the function
HYPGEOM.DIST(x, k, m, N, TRUE)
Here again, importantly, N is the total number of balls, which differs from the n (the number of black balls) used in the R counterpart above.
The last argument, when TRUE, indicates that the probability returned should be cumulative; that is, it gives the sum P(0) + P(1) + ⋯ + P(x).
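Because Excel's argument order differs from R's, a small stand-in that mimics `HYPGEOM.DIST(x, k, m, N, cumulative)` can make the conversion explicit. The Python helper `hypgeom_dist_excel` below is hypothetical: it is a sketch that assumes valid inputs and skips Excel's own argument checks, and it is not Excel itself. Its only real job is converting Excel's population total N back into the black-ball count n = N − m.

```python
from math import comb


def hypgeom_dist_excel(x: int, k: int, m: int, N: int, cumulative: bool) -> float:
    """Hypothetical stand-in mimicking Excel's HYPGEOM.DIST(x, k, m, N, cumulative).

    x: white balls drawn, k: balls drawn, m: white balls in the urn,
    N: total balls in the urn (m + n). No input validation is done here.
    """
    n = N - m  # convert Excel's population total back to R's black-ball count

    def pmf(i: int) -> float:
        if i < 0 or i > k:
            return 0.0
        return comb(m, i) * comb(n, k - i) / comb(N, k)

    if cumulative:
        return sum(pmf(i) for i in range(0, x + 1))  # TRUE: P(0) + ... + P(x)
    return pmf(x)  # FALSE: P(x) only


# The R examples above, rewritten in Excel's argument order (note N = 50 and N = 59).
print(round(hypgeom_dist_excel(4, 12, 15, 50, False), 7))   # 0.2646333
print(round(1 - hypgeom_dist_excel(2, 6, 6, 59, True), 7))  # 0.0108641
```

In a worksheet, the same two results would come from `=HYPGEOM.DIST(4, 12, 15, 50, FALSE)` and `=1 - HYPGEOM.DIST(2, 6, 6, 59, TRUE)`.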
To simulate numbers randomly chosen from a hypergeometric distribution, such as the count of white balls seen when drawing k balls without replacement from an urn containing m white balls and n black balls:
R: use the function
rhyper(nn, m, n, k)
Note that nn is the number of random values to generate.
As an example, suppose 20% of a batch of 30 integrated circuit chips are defective (i.e., 6 defective and 24 non-defective chips). To simulate the number of defective chips found in each of 10 random samples of size 8, one could use the following:
> rhyper(10, 6, 24, 8)
[1] 2 1 1 0 1 2 1 2 0 4
Because the values are random, the output will differ from run to run.
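The idea behind `rhyper` can be mimicked literally: build the urn, draw k balls without replacement, and count the white ones. The Python function `rhyper_sketch` below is a hypothetical illustration of that definition, not how R generates these values internally (R's `rhyper` uses a dedicated, more efficient algorithm).

```python
import random
from typing import List, Optional


def rhyper_sketch(nn: int, m: int, n: int, k: int,
                  rng: Optional[random.Random] = None) -> List[int]:
    """Simulate nn hypergeometric counts by literally drawing k balls without
    replacement from an urn with m white (1) and n black (0) balls."""
    if rng is None:
        rng = random.Random()
    urn = [1] * m + [0] * n
    # random.sample draws without replacement; summing the 1s counts white balls.
    return [sum(rng.sample(urn, k)) for _ in range(nn)]


# Chip example: 6 defective (white) and 24 good (black) chips, samples of size 8.
rng = random.Random(2024)  # fixed seed only so the run is reproducible
print(rhyper_sketch(10, 6, 24, 8, rng))
```

Averaged over many samples, the counts should settle near the hypergeometric mean k·m/(m + n) = 8 · 6/30 = 1.6.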
Excel: There is no built-in hypergeometric analog to BINOM.INV(), so random numbers following a hypergeometric distribution can't be generated in the same way.
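For the binomial case, Excel's BINOM.INV() returns the smallest value whose cumulative probability is at least a given criterion, so feeding it RAND() performs inverse-transform sampling. The same technique works for the hypergeometric distribution if the cumulative search is written out by hand. A minimal Python sketch, with hypothetical helper names and `random.random()` standing in for RAND():

```python
import random
from math import comb


def hyper_pmf(x: int, m: int, n: int, k: int) -> float:
    """P(X = x), same (x, m, n, k) argument order as dhyper."""
    if x < 0 or x > k:
        return 0.0
    return comb(m, x) * comb(n, k - x) / comb(m + n, k)


def hyper_inv(u: float, m: int, n: int, k: int) -> int:
    """Smallest feasible x with P(X <= x) >= u (inverse-CDF search)."""
    lo, hi = max(0, k - n), min(k, m)  # feasible range of white-ball counts
    cumulative = 0.0
    for x in range(lo, hi + 1):
        cumulative += hyper_pmf(x, m, n, k)
        if cumulative >= u:
            return x
    return hi  # guards against round-off leaving the total just below u


# Feed uniform numbers (the role RAND() plays in Excel) through the inverse CDF.
rng = random.Random(7)
print([hyper_inv(rng.random(), 6, 24, 8) for _ in range(10)])
```

Inside a worksheet, one could approximate this with a column of cumulative `HYPGEOM.DIST(..., TRUE)` values and a lookup on `RAND()`, though that is a manual construction rather than a built-in function.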