Simulations with R and Excel

A Game With Unusual Dice

You and your friend are bored and decide to play a game of dice. The game of craps comes to mind, but is quickly discarded since you are both familiar with the probabilities involved. Then your friend pulls a set of three colored dice (one red, one yellow, and one green) from his pocket. These dice are unusual in that they are not numbered in the normal manner. Instead, the numbers on their six sides are given in the table below:

$$\begin{array}{|c|c|c|c|c|c|c|}\hline \textrm{Red} & 3 & 3 & 3 & 3 & 3 & 6\\\hline \textrm{Yellow} & 2 & 2 & 2 & 5 & 5 & 5\\\hline \textrm{Green} & 1 & 4 & 4 & 4 & 4 & 4\\\hline \end{array}$$

Your friend tells you that they came from an old board game of his -- and he doesn't recall why they are numbered this way. He also doesn't remember the rules of the original game, but suggests the following rules instead: you each pick a die and roll it, and the larger number wins that roll.

You are suspicious of the apparent simplicity of your friend's game, and wonder if one die rolls higher numbers than the others on average. Your friend senses your distrust and assures you that no die is any better than any other. Doing a quick calculation in your head, you acknowledge that the average of the numbers on each die is 3.5. To calm any lingering fears you have about the matter, your friend offers to let you always choose your die first, and he will pick from the remaining two dice.
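The quick calculation can be written out as follows, using the faces listed in the table; each die's six faces sum to 21:

$$\begin{array}{rcl} \textrm{Red:} & & \dfrac{3+3+3+3+3+6}{6} = \dfrac{21}{6} = 3.5\\[2ex] \textrm{Yellow:} & & \dfrac{2+2+2+5+5+5}{6} = \dfrac{21}{6} = 3.5\\[2ex] \textrm{Green:} & & \dfrac{1+4+4+4+4+4}{6} = \dfrac{21}{6} = 3.5 \end{array}$$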

Let $c_1$ and $c_2$ each be one of the colors: red, yellow, or green. Let $P(c_1,c_2)$ be the probability of your winning the game if you roll the die with color $c_1$ and your friend rolls the die with color $c_2$. For example, $P(\textrm{red},\textrm{green})$ is the probability of your winning the game if you roll red and your friend rolls green.
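One way to make this definition concrete, assuming each die is fair so that all 36 pairs of faces are equally likely: write $a_1,\ldots,a_6$ for the faces of the die with color $c_1$ and $b_1,\ldots,b_6$ for the faces of the die with color $c_2$ (this notation is introduced here only for illustration). Since no two of the dice share a number, a single roll of two different colors cannot tie, and

$$P(c_1,c_2) = \frac{1}{36}\,\#\{(i,j) : a_i > b_j,\ 1 \le i,j \le 6\}$$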

  1. Write three R functions red(k), yellow(k), and green(k) that simulate the sum of rolling $k$ dice of the associated color.

  2. Write an R function approximated.probability.that.c1.wins(n,k,c1,c2) that simulates $n$ rolls of $k$ dice of color $c_1$ versus $k$ dice of color $c_2$ and returns the proportion of those rolls won by $c_1$. When this function is invoked, the arguments c1 and c2 are intended to be two of the functions red, yellow, and green described previously.

  3. Use the function just designed to approximate $P(c_1,c_2)$ for each possible pair of different colors, $c_1$ and $c_2$.

  4. According to your simulations, which is the "best die" for your friend to choose if you choose to roll red, green, or yellow, respectively?

  5. Does there appear to be a "best die" for you to choose to roll first? Calculate the actual probabilities that were only approximated by your simulations to confirm your answer.

  6. Is this a fair game? Explain.

  7. Suppose instead that you each rolled two dice of the same color, with the larger total winning the roll. Assuming that you still pick your color first, and your friend chooses his color from the remaining two -- is this new game fair? Back up your conclusion with a similar set of simulations in R, and actual calculations of the probabilities involved. In the case that the game is not fair, how should the player with the advantage choose which die to roll?
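As a starting point for items 1 and 2, the functions might be sketched as below. This is only one possible approach, not a reference solution: the helper vectors red.faces, yellow.faces, and green.faces are names invented here, and counting a tie as "c1 does not win" is an assumption (ties cannot happen for one or two dice of different colors, but can for some larger $k$).

```r
# Faces of each die, taken from the table in the problem statement.
# (These vector names are hypothetical helpers, not part of the exercise.)
red.faces    <- c(3, 3, 3, 3, 3, 6)
yellow.faces <- c(2, 2, 2, 5, 5, 5)
green.faces  <- c(1, 4, 4, 4, 4, 4)

# Each function returns the sum of k independent rolls of one die.
red    <- function(k) sum(sample(red.faces,    k, replace = TRUE))
yellow <- function(k) sum(sample(yellow.faces, k, replace = TRUE))
green  <- function(k) sum(sample(green.faces,  k, replace = TRUE))

# Estimate the probability that k dice of color c1 beat k dice of color c2.
# c1 and c2 are functions such as red, yellow, or green.
# Assumption: a tie is counted as "c1 does not win".
approximated.probability.that.c1.wins <- function(n, k, c1, c2) {
  wins <- replicate(n, c1(k) > c2(k))  # logical vector of length n
  mean(wins)                           # proportion of rolls won by c1
}

set.seed(1)  # arbitrary seed, used only to make the run reproducible
approximated.probability.that.c1.wins(10000, 1, red, green)
```

Passing the dice as functions keeps the simulation code independent of the colors, so the same call works for any pair and for the two-dice game in item 7 by setting k to 2.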

Simulating Cancer Occurrences

A close relative of yours is fighting prostate cancer. You believe their cancer may be connected to their living close to a paper and pulp mill. Working for Athena Health, a company that specializes in digital medical records, you have access to the medical histories of millions of people. Violating both company policy and probably some federal laws, you decide to examine some of these files to test your theory. For each of the 34 nearest pulp and paper mills in Georgia and the surrounding states, you download the medical histories of 14 randomly selected and currently deceased people that lived all of their lives within a 5-mile radius of the mill. You then find the proportion of each sample that was ever diagnosed with prostate cancer. You have heard that nationally, the probability of a randomly selected person developing prostate cancer in their lifetime is 13.97%, and believe your suspicions will be validated if you see any of your samples suggest the proportion is significantly higher than this -- say more than 26%.

  1. Use Excel to simulate the proportions of these 34 samples under an assumption that there is actually no connection between living close to a paper and pulp mill and developing prostate cancer.

  2. Had the results of your simulations been the real data examined, what would the conclusions drawn from that data have been? Is this surprising?

  3. Explain how a simulation like the one you conducted could suggest that there was a connection between paper and pulp mills and developing cancer, when you made the explicit assumption that there was no connection between these two things.

  4. Decide on a change that you believe will reduce the chance of this type of "false positive" result, and then on a second Excel sheet, enact that change. Did your change work as expected? Explain.
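The exercise asks for Excel, but the idea behind item 1 can also be sketched in R, the language used earlier in this assignment. The sketch below assumes the 34 samples are independent and that each of the 14 people in a sample is diagnosed with probability 0.1397; the variable names are invented for illustration. In Excel, a comparable cell formula might be =BINOM.INV(14, 0.1397, RAND())/14, though that is only one way the sheet could be set up.

```r
n.mills   <- 34      # number of samples, one per mill
n.people  <- 14      # people per sample
p.cancer  <- 0.1397  # national lifetime probability quoted in the text
threshold <- 0.26    # proportion treated as "significantly higher"

set.seed(2)  # arbitrary seed, used only to make the run reproducible

# Number diagnosed in each sample, assuming the mills have no effect.
diagnosed   <- rbinom(n.mills, size = n.people, prob = p.cancer)
proportions <- diagnosed / n.people

# How many of the 34 simulated samples exceed the threshold?
sum(proportions > threshold)

# Chance that a single sample exceeds the threshold under the same assumption.
# More than 26% of 14 people means at least 4 diagnoses (0.26 * 14 = 3.64).
p.one <- 1 - pbinom(3, size = n.people, prob = p.cancer)

# Chance that at least one of the 34 independent samples exceeds it.
1 - (1 - p.one)^n.mills
```

Rerunning the simulation with different seeds, or changing n.people, gives a quick way to explore the kind of change item 4 asks about.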