A close relative of yours is fighting prostate cancer. You believe their cancer may be connected to their living close to a paper and pulp mill. Working for Athena Health, a company that specializes in digital medical records, you have access to the medical histories of millions of people. Violating company policy and probably some federal laws, you decide to examine some of these files to test your theory. For each of the 34 nearest paper and pulp mills in Georgia and the surrounding states, you download the medical histories of 14 randomly selected, now-deceased people who lived their entire lives within a 5-mile radius of the mill. You then find the proportion of each sample that was ever diagnosed with prostate cancer. You have heard that nationally, the probability of a randomly selected person developing prostate cancer in their lifetime is 13.97%, and you believe your suspicions will be validated if any of your samples suggests the proportion is significantly higher than this, say more than 26%.
Use Excel to simulate the proportions of these 34 samples under the assumption that there is actually no connection between living close to a paper and pulp mill and developing prostate cancer.
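For anyone who wants to cross-check a spreadsheet, the same simulation can be sketched in Python. This is only an illustration, not the required Excel solution: the constant names, the helper `simulate_sample_proportions`, and the seed are all assumptions made for this sketch. Each `rng.random() < NATIONAL_RATE` comparison plays the role of a cell formula such as `=IF(RAND()<0.1397,1,0)`.

```python
import random

NATIONAL_RATE = 0.1397   # lifetime probability quoted in the scenario
N_MILLS = 34             # one sample per mill
SAMPLE_SIZE = 14         # medical histories per sample
THRESHOLD = 0.26         # "significantly higher" cutoff from the scenario


def simulate_sample_proportions(rng):
    """Return one simulated proportion per mill, assuming no mill effect."""
    proportions = []
    for _ in range(N_MILLS):
        # Each person is diagnosed independently with the national probability.
        diagnosed = sum(rng.random() < NATIONAL_RATE for _ in range(SAMPLE_SIZE))
        proportions.append(diagnosed / SAMPLE_SIZE)
    return proportions


rng = random.Random(2024)  # arbitrary seed (an assumption), for reproducibility only
proportions = simulate_sample_proportions(rng)
flagged = [p for p in proportions if p > THRESHOLD]

print(f"largest sample proportion: {max(proportions):.3f}")
print(f"samples above {THRESHOLD:.0%}: {len(flagged)} of {N_MILLS}")
```

Changing the seed, like recalculating the Excel sheet, produces a fresh set of 34 proportions.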
Had the results of your simulations been the real data examined, what would the conclusions drawn from that data have been? Is this surprising?
Explain how a simulation like the one you conducted could suggest that there was a connection between paper and pulp mills and developing cancer, when you made the explicit assumption that there was no connection between these two things.
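One way to reason about this question is to work out, under the no-connection assumption, how likely a single sample of 14 is to exceed the 26% cutoff, and then how likely it is that at least one of 34 such samples does. The sketch below assumes the number of diagnoses in each sample is binomial and that the 34 samples are independent; the function name `prob_sample_exceeds` and the constants are introduced here purely for illustration.

```python
from math import comb

NATIONAL_RATE = 0.1397   # lifetime probability quoted in the scenario
N_MILLS = 34             # number of samples examined
SAMPLE_SIZE = 14         # people per sample
THRESHOLD = 0.26         # cutoff from the scenario


def prob_sample_exceeds(n, p, threshold):
    """P(sample proportion > threshold) when the count is Binomial(n, p)."""
    return sum(
        comb(n, k) * p**k * (1 - p) ** (n - k)
        for k in range(n + 1)
        if k / n > threshold
    )


per_sample = prob_sample_exceeds(SAMPLE_SIZE, NATIONAL_RATE, THRESHOLD)
# Chance that at least one of the independent samples crosses the cutoff.
at_least_one = 1 - (1 - per_sample) ** N_MILLS

print(f"P(a given sample exceeds the cutoff) = {per_sample:.3f}")
print(f"P(at least one of {N_MILLS} exceeds it)   = {at_least_one:.3f}")
```

Comparing the two printed probabilities shows how examining many small samples against the same cutoff changes the chance of seeing at least one "significant" result.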
Decide on a change that you believe will reduce the chance of this type of "false positive" result, and then on a second Excel sheet, enact that change. Did your change work as expected? Explain.