- The sum of values shown when two dice are rolled
- The fraction of area covered by crab grass in a randomly selected lawn
- The number of heads seen when flipping a coin 3 times
- The number of spades seen in 5 cards drawn from a standard deck of playing cards
- The length of a randomly selected fish in a lake
- The net profit (or loss) resulting from an investment playing the lottery
- The income associated with issuing an insurance policy
- The number of microwaves sold each day at a local appliance store

As an example -- and hopefully it won't spoil what is to come -- sometimes in statistics we will want to talk about the average value (i.e., the *mean*) of something investigated. Finding averages should not be new to anyone -- they consist of adding up things and then dividing by how many things one has. However, how does one average different prizes won at the fair, or types of fish?

Being more accurate in our wording, random variables are methods for turning (possibly non-numerical) outcomes of a random experiment into numbers. Indeed, we define a **random variable** to be a *function* $X$ that assigns to each outcome $x$ in the sample space $S$ one and only one number.

Let us consider an example:

We've seen before the sample space for rolling two fair dice:

$$\begin{array}{c|c|c|c|c|c|c} & 1 & 2 & 3 & 4 & 5 & 6\\\hline 1 & (1,1) & (1,2) & (1,3) & (1,4) & (1,5) & (1,6)\\\hline 2 & (2,1) & (2,2) & (2,3) & (2,4) & (2,5) & (2,6)\\\hline 3 & (3,1) & (3,2) & (3,3) & (3,4) & (3,5) & (3,6)\\\hline 4 & (4,1) & (4,2) & (4,3) & (4,4) & (4,5) & (4,6)\\\hline 5 & (5,1) & (5,2) & (5,3) & (5,4) & (5,5) & (5,6)\\\hline 6 & (6,1) & (6,2) & (6,3) & (6,4) & (6,5) & (6,6)\\\hline \end{array}$$ Note that each element in the sample space is an ordered pair -- not a number. However, one can turn each ordered pair into a number by summing the two coordinates. The random variable in this case is then the function $X$ that does the summing: i.e., $X(i,j) = i+j$.

In this way, $X$ is associated with a new sample space, call it $\mathscr{D} = \{2,3,4,\ldots,12\}$, as under $X$ the above table turns into: $$\begin{array}{c|c|c|c|c|c|c} & 1 & 2 & 3 & 4 & 5 & 6\\\hline 1 & 2 & 3 & 4 & 5 & 6 & 7\\\hline 2 & 3 & 4 & 5 & 6 & 7 & 8\\\hline 3 & 4 & 5 & 6 & 7 & 8 & 9\\\hline 4 & 5 & 6 & 7 & 8 & 9 & 10\\\hline 5 & 6 & 7 & 8 & 9 & 10 & 11\\\hline 6 & 7 & 8 & 9 & 10 & 11 & 12\\\hline \end{array}$$ ...and a new probability set function, $P_X$ (where, for example, $P_X(7 \textrm{ or } 11) = 8/36$).
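Since all 36 rolls are equally likely, a computation like $P_X(7 \textrm{ or } 11) = 8/36$ can be checked by brute-force enumeration. Here is a minimal sketch in Python (the variable names are our own); it applies $X(i,j) = i+j$ to every roll and counts how often the result is a 7 or an 11:

```python
from fractions import Fraction
from itertools import product

# Enumerate the 36 equally likely rolls (i, j) and apply X(i, j) = i + j
rolls = list(product(range(1, 7), repeat=2))
sums = [i + j for (i, j) in rolls]

# P_X(7 or 11): the fraction of rolls whose sum is 7 or 11
p_7_or_11 = Fraction(sum(1 for s in sums if s in (7, 11)), len(rolls))
print(p_7_or_11)  # 2/9, i.e., 8/36 reduced
```

Using exact `Fraction` arithmetic (rather than floats) keeps the probabilities in the same form as the hand computation.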

It is important to note that summing the two coordinates is not the only way to create a number in this context. Another very flexible way to do this is to think of the number associated with each roll as a "net pay-out" (i.e., profit minus cost) for that roll when playing some game at a carnival. As an example, suppose you were rolling the two dice in the context of a game that costs $\$7$ to play and awards $\$100$ for a roll of "box cars" (i.e., two 6's), $\$10$ for each 5 rolled, and nothing for the rest. In this case, the function $X$ is defined by

$$X(i,j) = \left\{ \begin{array}{ll} 93 & \textrm{ if } i = 6 \textrm{ and } j = 6\\ 3 & \textrm{ if } i = 5 \textrm{ or } j = 5, \textrm{ but not both}\\ 13 & \textrm{ if both } i = 5 \textrm{ and } j = 5\\ -7 & \textrm{ otherwise } \end{array} \right.$$ with a new sample space of $\{93,3,13,-7\}$, as suggested by the corresponding table: $$\begin{array}{c|c|c|c|c|c|c} & 1 & 2 & 3 & 4 & 5 & 6\\\hline 1 & -7 & -7 & -7 & -7 & 3 & -7\\\hline 2 & -7 & -7 & -7 & -7 & 3 & -7\\\hline 3 & -7 & -7 & -7 & -7 & 3 & -7\\\hline 4 & -7 & -7 & -7 & -7 & 3 & -7\\\hline 5 & 3 & 3 & 3 & 3 & 13 & 3\\\hline 6 & -7 & -7 & -7 & -7 & 3 & 93\\\hline \end{array}$$ ...and a new probability set function $P_X$ (where, for example, $P_X(3) = 10/36$). Note that by adjusting the pay-out values, we can assign pretty much whatever numbers we wish -- which is consistent with the lack of any restrictions on the function in the definition of a random variable other than it assign some number to every outcome in the sample space.
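The pay-out table above can likewise be reproduced by enumeration. The sketch below (our own Python, not part of any standard treatment) encodes the piecewise definition of $X$ directly and recovers $P_X(3) = 10/36$:

```python
from fractions import Fraction
from itertools import product

def X(i, j):
    # Net pay-out: $100 for box cars, $10 per 5 rolled, minus the $7 cost
    if i == 6 and j == 6:
        return 93
    if i == 5 and j == 5:
        return 13
    if i == 5 or j == 5:
        return 3
    return -7

rolls = list(product(range(1, 7), repeat=2))
payouts = [X(i, j) for (i, j) in rolls]
p_3 = Fraction(payouts.count(3), len(rolls))
print(p_3)  # 5/18, i.e., 10/36 reduced
```

Note that the branch for two 5's must be tested before the branch for a single 5, since `i == 5 or j == 5` is also true when both dice show a 5.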

In the examples above, the new sample spaces that resulted each had a finite number of outcomes (11 and 4, respectively). In other contexts (i.e., those not involving the rolling of two dice) one might have more or fewer. Indeed, one can even consider scenarios where there are an infinite number of outcomes -- consider the number of times one must flip a coin before seeing a head. It's incredibly unlikely for this number to correspond to more than a handful of flips -- but we can't say with certainty that it will take fewer than any given number of tosses of the coin. In both of these cases, however, we say that the set of outcomes is **countable**, which is just a fancy way of saying we could put all of the outcomes in a (possibly infinite) list.
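The flip-until-a-head scenario is easy to explore by simulation. The following sketch (an illustration of ours, assuming a fair coin) counts flips until the first head; any positive count can occur, which is why no finite list of outcomes short of $\{1, 2, 3, \ldots\}$ suffices:

```python
import random

random.seed(0)  # fixed seed so the run is reproducible

def flips_until_head():
    # Flip a fair coin until the first head appears; return the flip count
    count = 0
    while True:
        count += 1
        if random.random() < 0.5:  # heads
            return count

trials = [flips_until_head() for _ in range(10_000)]
# Large counts are possible but increasingly unlikely; no fixed upper bound exists
print(max(trials))
```

In 10,000 trials the maximum observed count is typically modest, echoing the remark that large counts are incredibly unlikely -- yet possible.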

There are uncountable sets too. The set of real numbers is one that will be important to us later. (*Wait -- did we just suggest the real numbers can't be put into a list? Yes we did! Further, you can prove this amazing fact -- look up "Cantor's Diagonal Argument" if you are curious.*)

Just as we were able to calculate the probabilities of rolling different sums with two dice (recall our earlier conclusion that $P_X(7 \textrm{ or } 11) = 8/36$), we seek to calculate the probabilities associated with different values corresponding to whatever random variable $X$ we might need to investigate. But how do we do that? As it turns out, we need to do different things depending on whether the sample space associated with $X$ is countable or uncountable.

As such, we classify random variables based on this countability (or uncountability). When the sample space is countable, we say the random variable is a **discrete random variable**. When the sample space is uncountable, we say the random variable is a **continuous random variable**.

Let's consider the case of finding $P_X$ first when $X$ is a discrete random variable.

Recall the new sample space $\mathscr{D}$ that resulted from considering the random variable $X$ equal to the sum of the values showing on two rolled dice: $$\begin{array}{c|c|c|c|c|c|c} & 1 & 2 & 3 & 4 & 5 & 6\\\hline 1 & 2 & 3 & 4 & 5 & 6 & 7\\\hline 2 & 3 & 4 & 5 & 6 & 7 & 8\\\hline 3 & 4 & 5 & 6 & 7 & 8 & 9\\\hline 4 & 5 & 6 & 7 & 8 & 9 & 10\\\hline 5 & 6 & 7 & 8 & 9 & 10 & 11\\\hline 6 & 7 & 8 & 9 & 10 & 11 & 12\\\hline \end{array}$$

To find $P_X(7 \textrm{ or } 11)$, it is helpful to partition the sample space into a mutually exclusive and exhaustive collection of sets by value, and then consider the probabilities that each such set occurs (which correspond to outputs of the related probability set function $P$). That is to say, we make a table like the one below:

$$\begin{array}{|c|c|c|c|c|c|c|c|c|c|c|c|}\hline \textrm{Value} & 2 & 3 & 4 & 5 & 6 & 7 & 8 & 9 & 10 & 11 & 12\\\hline \textrm{Probability} & \frac{1}{36} & \frac{2}{36} & \frac{3}{36} & \frac{4}{36} & \frac{5}{36} & \frac{6}{36} & \frac{5}{36} & \frac{4}{36} & \frac{3}{36} & \frac{2}{36} & \frac{1}{36}\\\hline \end{array}$$ To make talking about the probabilities associated with different values easier, we view this table as a new function, denoted $p_X$, whose inputs are the values and whose outputs are the probabilities those values occur.
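A table like this is naturally represented as a dictionary mapping values to probabilities. Here is a sketch of ours that builds the table above by tallying the 36 rolls:

```python
from collections import Counter
from fractions import Fraction
from itertools import product

# Tally how many of the 36 equally likely rolls produce each sum,
# then divide each count by 36 to get the probabilities
counts = Counter(i + j for (i, j) in product(range(1, 7), repeat=2))
p_X = {value: Fraction(n, 36) for value, n in sorted(counts.items())}
print(p_X[7])  # 1/6, i.e., 6/36 reduced
```

The keys of `p_X` are exactly the values $2, 3, \ldots, 12$ and the associated probabilities match the table row by row.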

This leads to an important definition. Let our original sample space be $S$ and our new sample space for $X$ be $\mathscr{D} = \{d_1,d_2,d_3,\ldots\}$. Further, suppose $P$ is the probability set function corresponding to the original random experiment and whose domain consists of subsets of $S$. Now, for each $d_i$, let the event $C_i$ be the subset of all $c$ in $S$ where $X(c) = d_i$. Finally, we define the **probability mass function, $p_X$** with domain $\mathscr{D}$ so that $p_X(d_i) = P(C_i)$.

One should be aware, given how $p_X$ is defined in terms of a probability set function $P$, not just any table can represent a probability mass function. Specifically, for the new sample space $\mathscr{D}$ in question, the following two properties must hold: $$ 0 \le p_X(d_i) \le 1 \textrm{ for all } d_i \in \mathscr{D} \quad \quad \textrm{and} \quad \quad \sum_{d_i \in \mathscr{D}} p_X(d_i) = 1$$
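These two properties are straightforward to verify mechanically. The helper below (a hypothetical function of our own) checks both conditions for a candidate table, using the convenient closed form $p_X(v) = \frac{6 - |7 - v|}{36}$ for the two-dice sums as a passing example:

```python
from fractions import Fraction

def is_valid_pmf(p_X):
    # Every probability must lie in [0, 1], and together they must sum to exactly 1
    return (all(0 <= p <= 1 for p in p_X.values())
            and sum(p_X.values()) == 1)

# The two-dice pmf: p_X(v) = (6 - |7 - v|)/36 for v = 2, ..., 12
dice_pmf = {v: Fraction(6 - abs(7 - v), 36) for v in range(2, 13)}
print(is_valid_pmf(dice_pmf))  # True

# A table that fails: the probabilities sum to 5/6, not 1
print(is_valid_pmf({0: Fraction(1, 2), 1: Fraction(1, 3)}))  # False
```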

Again turning our attention to how we previously calculated $P_X(7 \textrm{ or } 11)$, note that the set of rolls corresponding to a $7$ and the set of rolls corresponding to an $11$ are mutually exclusive. Consequently, we can find $P_X(7 \textrm{ or } 11)$ by finding a sum: $$P_X(7 \textrm{ or } 11) = p_X(7) + p_X(11) = \frac{6}{36} + \frac{2}{36} = \frac{8}{36}$$

More generally, we can find the probability of any event $D$ in our new sample space $\mathscr{D} = \{d_1,d_2,d_3,\ldots\}$ by summing similar probabilities coming from the probability mass function: $$P_X(D) = \sum_{d_i \in D} p_X(d_i)$$
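This general rule translates into a one-line sum over the event. A sketch (our own naming), again using the closed form $p_X(v) = \frac{6 - |7 - v|}{36}$ for the two-dice pmf:

```python
from fractions import Fraction

def P(event, p_X):
    # P_X(D) = sum of p_X(d_i) over the values d_i in the event D
    return sum(p_X[d] for d in event)

p_X = {v: Fraction(6 - abs(7 - v), 36) for v in range(2, 13)}
print(P({7, 11}, p_X))  # 2/9, i.e., 8/36 reduced
```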

Finally, a few remarks about how one can simplify the notation used above are in order:

- When the context makes clear the random variable $X$ in question, we frequently drop it from the notation. So $P_X(E)$ becomes simply $P(E)$.
- For clarity, if $x$ is a value in the new sample space associated with random variable $X$, we use the expression $P(X=x)$ to mean $p_X(x)$.
- Again, when the context also makes $X$ clear, we may further abbreviate $P(X=x)$ to simply $P(x)$.
- In a similar manner, with respect to some random variable $X$, we may abbreviate the application of its probability set function to an event $E$ in the new sample space that can be described by $E_{described}$ with simply $P(E_{described})$.
As an example, suppose we roll two dice and $X$ is the total of the values shown on the dice. To denote the probability of the event of seeing an even total, we write $$P(\textrm{even total}) \quad \quad \textrm{ instead of } \quad \quad P(\{2,4,6,8,10,12\})$$

For completeness, note $$\begin{array}{rcl} P(\textrm{even total}) &=& P(2) + P(4) + P(6) + P(8) + P(10) + P(12)\\ &=& \frac{1}{36} + \frac{3}{36} + \frac{5}{36} + \frac{5}{36} + \frac{3}{36} + \frac{1}{36}\\ &=& \frac{18}{36}\\ &=& \frac{1}{2} \end{array}$$

As a second example of using the description of the event to abbreviate the corresponding probability, consider the same scenario of rolling two dice and examining their sum. The probability of rolling a sum less than 5 can be written and found as follows:

$$\begin{array}{rcl} P(X < 5) &=& P(2) + P(3) + P(4)\\ &=& \frac{1}{36} + \frac{2}{36} + \frac{3}{36}\\ &=& \frac{6}{36}\\ &=& \frac{1}{6} \end{array}$$
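Both of these event probabilities can be checked with the same summation idea. The sketch below (our own) filters the two-dice pmf by the event's description:

```python
from fractions import Fraction

# The two-dice pmf: p_X(v) = (6 - |7 - v|)/36 for v = 2, ..., 12
p_X = {v: Fraction(6 - abs(7 - v), 36) for v in range(2, 13)}

# P(even total): sum the probabilities of the even values
p_even = sum(p for v, p in p_X.items() if v % 2 == 0)
print(p_even)  # 1/2

# P(X < 5): sum the probabilities of the values 2, 3, and 4
p_less_than_5 = sum(p for v, p in p_X.items() if v < 5)
print(p_less_than_5)  # 1/6
```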

Let's try to solidify all of these ideas (notational and otherwise) with one final example:

Suppose we are interested in the random experiment where one flips a coin three times.

The sample space for this random experiment consists of 8 equally likely possibilities. Now let the random variable $X$ be the number of heads seen. Pairing each outcome with its value under $X$ gives the new sample space:

$$\begin{array}{c|c|c|c|c|c|c|c|c} \textrm{Outcome} & HHH & HHT & HTH & HTT & THH & THT & TTH & TTT\\\hline X & 3 & 2 & 2 & 1 & 2 & 1 & 1 & 0 \end{array}$$

The probability mass function is given by $$\begin{array}{|c|c|c|c|c|}\hline x & 0 & 1 & 2 & 3 \\\hline P(x) & \frac{1}{8} & \frac{3}{8} & \frac{3}{8} & \frac{1}{8}\\\hline \end{array}$$

We can do a partial check on our calculations by recognizing we must end up with a legitimate probability mass function, where the following two properties are satisfied, relative to its domain $\mathscr{D}$: $$0 \le P(x) \le 1 \textrm{ for every } x \in \mathscr{D} \quad \textrm{ and } \quad \sum_{x \in \mathscr{D}} P(x) = 1$$ That is to say, the probability of any particular number of heads can't be negative or greater than 100%, and the sum of the probabilities should equal one (as 100% of the time, there is an outcome).

We are now armed to answer probability questions like "What's the probability that one sees more than one head in 3 flips of a coin?"

Answer: $$P(X > 1) = P(2) + P(3) = \frac{3}{8} + \frac{1}{8} = \frac{1}{2}$$
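The whole three-flip example -- the pmf and the final answer -- can be verified by enumeration. A minimal sketch in Python (the names are our own):

```python
from fractions import Fraction
from itertools import product

# The 8 equally likely outcomes of three coin flips
outcomes = list(product("HT", repeat=3))
heads = [flips.count("H") for flips in outcomes]

# Build the probability mass function by tallying each value of X
p_X = {x: Fraction(heads.count(x), len(outcomes)) for x in range(4)}

# P(X > 1) = P(2) + P(3)
p_more_than_one = sum(p for x, p in p_X.items() if x > 1)
print(p_more_than_one)  # 1/2
```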

One last observation -- instead of using a table, one can also specify the probability mass function associated with a random variable using a formula -- as in the case of the binomial distribution, which gives the probability of observing $x$ successes in $n$ Bernoulli trials, each with success probability $p$ and failure probability $q = 1 - p$ (more on what these words mean later...):

$$P(x) = {}_nC_x p^x q^{n-x} \quad \textrm{where } x = 0,1,2,\ldots, n$$ The fact that all of the probabilities so produced are between zero and one, with a sum of exactly one, is less obvious here -- but still present. Indeed, if we consider the case when $n=3$ and $p=1/2$, we produce precisely the same probability mass function as that just seen when counting numbers of heads seen in 3 flips of a coin!
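That closing claim is easy to confirm. The sketch below (our own; `math.comb` computes the binomial coefficient ${}_nC_x$) evaluates the formula exactly and, at $n=3$ and $p=1/2$, reproduces the coin-flip pmf $\frac{1}{8}, \frac{3}{8}, \frac{3}{8}, \frac{1}{8}$:

```python
from fractions import Fraction
from math import comb

def binomial_pmf(n, p):
    # P(x) = C(n, x) * p^x * q^(n-x) for x = 0, 1, ..., n, where q = 1 - p
    q = 1 - p
    return {x: comb(n, x) * p**x * q**(n - x) for x in range(n + 1)}

pmf = binomial_pmf(3, Fraction(1, 2))
print(pmf[0], pmf[1], pmf[2], pmf[3])  # 1/8 3/8 3/8 1/8
print(sum(pmf.values()))  # 1
```

Passing `p` as a `Fraction` keeps every probability exact, so the sum comes out to exactly 1 rather than a floating-point approximation.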