The Hypergeometric Distribution

The context of a hypergeometric distribution is similar to that of a binomial distribution in that you are interested in only two outcomes, but the independence prerequisite for a binomial experiment is not satisfied.

In particular, a hypergeometric distribution involves a typically small population containing only two types of objects (e.g., males and females, red marbles and black marbles, people who support President Trump and people who don't).

One then draws, without replacement, a random sample of size $n$ that represents a relatively large portion (i.e., more than 5%) of this population.

Note that this is where the independence prerequisite for a binomial experiment fails. We are drawing without replacement, so the probability of drawing a desired type of object (a "success") changes as more objects are removed from the population. Further, because the sample is not small relative to the size of the population, the changes in these probabilities are not negligible.
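To make the loss of independence concrete, here is a minimal sketch in Python. The function name `success_probabilities_after_successes` and the population sizes are illustrative assumptions, not part of the discussion above. It computes the probability that the next draw is a success given that every earlier draw was also a success, first for a small population and then for a large one with the same initial proportion of successes.

```python
from fractions import Fraction


def success_probabilities_after_successes(a, b, draws):
    """Return P(next draw is a success | all earlier draws were successes).

    a     -- number of objects of the desired type in the population
    b     -- number of objects of the other type in the population
    draws -- how many consecutive draws (without replacement) to examine
    """
    probs = []
    for k in range(draws):
        # After k successes have been removed, a - k successes remain
        # among a + b - k objects in total.
        probs.append(Fraction(a - k, a + b - k))
    return probs


# Small population: a sample of 3 is 15% of the 20 objects.
small = success_probabilities_after_successes(5, 15, 3)

# Large population with the same starting proportion of successes.
large = success_probabilities_after_successes(500, 1500, 3)

print([str(p) for p in small])
print([round(float(p), 4) for p in large])
```

In the small population the success probability drifts noticeably from draw to draw, while in the large population it stays very close to its starting value of 1/4.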

The probability of seeing exactly $x$ objects of one desired type is then given by

$$P(X=x) = \frac{{}_aC_x \cdot {}_bC_{n-x}}{{}_{a+b}C_{n}}$$

assuming there are $a$ objects of the desired type and $b$ objects of the undesired type in the population.
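The formula translates almost directly into code. The following is a sketch using Python's standard `math.comb`; the function name `hypergeom_pmf` and the marble counts in the example are assumptions chosen for illustration. Exact fractions are used so that no rounding obscures the result.

```python
from fractions import Fraction
from math import comb


def hypergeom_pmf(x, a, b, n):
    """P(X = x) when drawing n objects without replacement from a
    population holding a objects of the desired type and b of the other."""
    if n < 0 or n > a + b:
        raise ValueError("sample size must be between 0 and a + b")
    if x < 0 or x > n:
        # Impossible counts get probability zero.
        return Fraction(0)
    # comb(k, r) is 0 whenever r > k, so x > a or n - x > b
    # also yields probability zero automatically.
    return Fraction(comb(a, x) * comb(b, n - x), comb(a + b, n))


# Hypothetical example: 4 red and 6 black marbles, draw 3, want exactly 2 red.
p = hypergeom_pmf(2, a=4, b=6, n=3)
print(p)

# The probabilities over all possible values of x should sum to 1.
total = sum(hypergeom_pmf(x, 4, 6, 3) for x in range(0, 4))
print(total)
```

For this example the numerator is ${}_4C_2 \cdot {}_6C_1 = 36$ and the denominator is ${}_{10}C_3 = 120$, so the probability is $36/120 = 3/10$.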

To see this, note first that there are ${}_{a+b}C_{n}$ total ways to choose a sample of size $n$ from a group of $a+b$ objects, so this is our denominator.

Then, for the numerator, we must count how many of those samples contain exactly $x$ objects of one type (where there are $a$ objects of this type in the population and $b$ objects of the other type).

To build such a sample, first pick $x$ of the $a$ objects of the desired type -- which can be done in ${}_{a}C_{x}$ ways.

Then, make sure the $(n-x)$ objects in the rest of the sample are chosen from the $b$ objects of the other type -- which can be done in ${}_{b}C_{n-x}$ ways.

By the fundamental counting principle, the total number of samples thus produced (and hence, the numerator for our probability) is given by the product of these two combinations:

$${}_aC_x \cdot {}_bC_{n-x}$$

This completes our argument.
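As a sanity check on the counting argument, one can enumerate every possible sample from a small population and count those with exactly $x$ objects of the desired type. The sketch below does this by brute force; the function name `count_samples_by_enumeration` and the population sizes are assumptions chosen only to keep the enumeration small.

```python
from itertools import combinations
from math import comb


def count_samples_by_enumeration(a, b, n, x):
    """Count the size-n samples containing exactly x desired objects
    by listing every sample explicitly.

    Objects are labelled 0 .. a+b-1; labels below a are the desired type.
    """
    count = 0
    for sample in combinations(range(a + b), n):
        successes = sum(1 for label in sample if label < a)
        if successes == x:
            count += 1
    return count


a, b, n, x = 4, 6, 3, 2

# Numerator: samples with exactly x objects of the desired type.
enumerated = count_samples_by_enumeration(a, b, n, x)
formula = comb(a, x) * comb(b, n - x)
print(enumerated, formula)

# Denominator: all samples of size n.
all_samples = sum(1 for _ in combinations(range(a + b), n))
print(all_samples, comb(a + b, n))
```

Both pairs of printed numbers agree, which matches the product of combinations in the numerator and the single combination in the denominator.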