The first step in experimental design is to know the difference between an experiment and an observational study.
In an observational study one measures or collects data, estimates population parameters, and makes observations and inferences, but at no time does the researcher interfere with subjects or variables in any way. Since observational studies seek to say something about the underlying populations, it is important that the samples involved be as representative of the population as possible.
Observational studies can be categorized by when the observations are made:
In a retrospective (or case-control) study, historical or past data is examined.
In a prospective (or longitudinal or cohort) study, data is collected in the future from groups sharing common factors (like persons diagnosed with cancer). These groups are called cohorts.
In a cross-sectional study, data is observed, measured, and collected at one point in time.
In an experiment, one imposes various treatments on randomly assigned groups of objects in the interest of observing the response. These groups are called treatment groups, while the objects that make them up are sometimes called units or subjects.
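As a rough illustration of random assignment, the following Python sketch shuffles a list of units and deals them out into treatment groups of near-equal size. The function name completely_randomized, the unit labels, the treatment names, and the seed are all assumptions made for this example; this is one simple way to randomize, not the only one.

```python
import random

def completely_randomized(units, treatments, seed=None):
    """Shuffle the units, then deal them out to the treatments in turn."""
    rng = random.Random(seed)          # seed only makes the example repeatable
    shuffled = list(units)
    rng.shuffle(shuffled)
    groups = {t: [] for t in treatments}
    for i, unit in enumerate(shuffled):
        groups[treatments[i % len(treatments)]].append(unit)
    return groups

# Hypothetical example: 12 numbered units, 3 treatments.
groups = completely_randomized(range(1, 13), ["A", "B", "C"], seed=0)
for treatment, members in groups.items():
    print(treatment, sorted(members))
```

Because the units are shuffled before being dealt out, no characteristic of a unit can influence which treatment it receives.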
Because the validity of an experiment is directly affected by its construction and execution, attention paid to the design of the experiment is extremely important.
A treatment is something that researchers administer to experimental units. For example: a corn field is divided into four parts, each of which is 'treated' with a different amount of fertilizer and water to see which produces the most corn; a NASA engineer uses different types of altimeters to detect when a rocket is at a certain height above the earth; a doctor treats patients who have a certain type of injury with different amounts of medication to see which is most effective.
The various controlled independent variables that are set by the experimenter are called factors. In the previous examples, the factors were fertilizer, water, altimeter type, and medication.
Different treatments correspond to different levels of the related factor (or factors). For example, suppose in the last example subjects were given 5mg, 10mg, or 15mg of medication, depending on their assigned treatment group. These amounts would constitute the levels associated with the factor medication.
In the corn example above, there would be different levels associated with the amount of fertilizer used (e.g., 250g, 500g, and 1000g per corn plant) and different levels associated with how often the corn plants are watered (e.g., twice a week, or daily). Considering all possible combinations of these levels results in 3 × 2 = 6 treatment groups, one for each pairing of a fertilizer amount with a watering schedule.
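The six treatment groups can be listed by pairing every fertilizer level with every watering level. A minimal Python sketch, using the example levels from the text (the variable names are chosen here for illustration):

```python
from itertools import product

fertilizer_levels = ["250g", "500g", "1000g"]   # per corn plant
watering_levels = ["twice a week", "daily"]

# Every pairing of one fertilizer level with one watering level.
treatments = list(product(fertilizer_levels, watering_levels))

for number, (fertilizer, watering) in enumerate(treatments, start=1):
    print(f"Treatment {number}: {fertilizer} fertilizer, watered {watering}")
```

In general, the number of treatment groups in such a design is the product of the number of levels of each factor.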
Levels can also be categorical in nature. In the aforementioned example where the factor was the altimeter, the levels would be the different types of altimeter tested.
At the heart of experimental design is taking the time and effort to organize the experiment properly, so that the right type of data, and enough of it, is available to answer the questions of interest as clearly and efficiently as possible.
The specific questions that the experiment is intended to answer must be clearly identified before carrying out the experiment. One should also attempt to identify any known or expected sources of variability in the experimental units, since one of the main aims of a designed experiment is to reduce the effect of these sources of variability on the answers to the questions of interest.
Once these known sources of variability have been identified, the typical strategy (as elaborated upon below) of an experimenter can be summarized as "control what you know, randomize the rest!"
Suppose an experimenter is interested in the amount of medication that produces the best results for patients with a given disease. He contacts two doctors, the first being a geriatric specialist, the second being a general physician. He instructs the first doctor to prescribe 50mg of medication for his patients that show signs of the disease, and the second doctor to prescribe 100mg for similar patients. He discovers the patients prescribed medicine by the second doctor responded much better than those prescribed medicine by the first.
The problem with this experiment is that the experimenter has neglected to control for the effect of the differences in age among his test subjects. It may be the case that the first doctor's patients, being much older, simply were not able to fight the disease as effectively due to their age. Given the design of this experiment, one simply can't tell whether dosage or age was the causal factor. When this happens, we say that the dosage and the patients' ages are confounding factors. To the extent that age really does influence the results of the experiment, we have introduced experimental bias.
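A small simulation can make the danger concrete. In the Python sketch below, the response model is purely hypothetical: response falls with age and does not depend on dosage at all. The age ranges, group sizes, and noise level are likewise invented for illustration. Even so, the group given 100mg appears to do far better, simply because its patients are younger.

```python
import random
import statistics

rng = random.Random(1)  # fixed seed so the sketch is repeatable

def simulated_response(age):
    # Hypothetical model: response falls with age and does NOT depend on dose.
    return 100 - age + rng.gauss(0, 5)

# First doctor (geriatric specialist): older patients, all given 50mg.
group_50mg = [simulated_response(rng.uniform(70, 90)) for _ in range(50)]
# Second doctor (general physician): younger patients, all given 100mg.
group_100mg = [simulated_response(rng.uniform(20, 60)) for _ in range(50)]

mean_50 = statistics.fmean(group_50mg)
mean_100 = statistics.fmean(group_100mg)
print(f"mean response, 50mg group:  {mean_50:.1f}")
print(f"mean response, 100mg group: {mean_100:.1f}")
```

An analysis that compared only the two group means would credit the higher dosage with an improvement that, in this simulated setting, is due entirely to age.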
One important source of experimental bias that is most apparent in medical experiments is the placebo effect. Since many patients are confident that a treatment will positively affect them, they react to a control treatment that actually has no physical effect at all, such as a sugar pill. For this reason, it is important to include a control group (also called a placebo group) in medical experiments to evaluate the difference between the placebo effect and the actual effect of the treatment.
The simple existence of placebo groups is sometimes not sufficient for avoiding bias in experiments. If members of the placebo group have any knowledge (or suspicion) that they are not being given an actual treatment, then the effect of the treatment cannot be accurately assessed. For this reason, double-blind experiments are generally preferable. In this case, neither the experimenters nor the subjects are aware of the subjects' group status. This eliminates the possibility that the experimenters will treat the placebo group differently from the treatment group, further reducing experimental bias.
If an experimenter is aware of specific differences among groups of subjects or objects within an experimental group, he or she may prefer a randomized block design to a completely randomized design. In a block design, experimental subjects are first divided into homogeneous blocks before they are randomly assigned to a treatment group. If, for instance, an experimenter had reason to believe that age might be a significant factor in the effect of a given medication, he might choose to first divide the experimental subjects into age groups, such as under 30 years old, 30-60 years old, and over 60 years old. Then, within each age level, individuals would be assigned to treatment groups using a completely randomized design.
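The age-based blocking just described might be sketched in Python as follows. The age boundaries come from the example in the text; the function names, the subject list, and the two treatment labels are assumptions made for illustration.

```python
import random
from collections import defaultdict

def age_block(age):
    """Map an age to one of the three blocks used in the example."""
    if age < 30:
        return "under 30"
    if age <= 60:
        return "30-60"
    return "over 60"

def randomized_block_assignment(subjects, treatments, seed=None):
    """subjects: list of (subject_id, age) pairs.
    Returns {subject_id: (block, treatment)}."""
    rng = random.Random(seed)
    blocks = defaultdict(list)
    for subject_id, age in subjects:
        blocks[age_block(age)].append(subject_id)
    assignment = {}
    for block, members in blocks.items():
        rng.shuffle(members)  # completely randomized design within the block
        for i, subject_id in enumerate(members):
            assignment[subject_id] = (block, treatments[i % len(treatments)])
    return assignment

# Hypothetical subjects: ids 0-11 with made-up ages.
ages = [22, 25, 28, 29, 35, 41, 50, 58, 63, 67, 72, 80]
subjects = list(enumerate(ages))
assignment = randomized_block_assignment(subjects, ["treatment", "placebo"], seed=0)
for subject_id, (block, treatment) in sorted(assignment.items()):
    print(subject_id, block, treatment)
```

Because randomization happens separately inside each block, every age group ends up split evenly between the treatments, so age cannot be confounded with treatment.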
As another example, suppose a researcher is carrying out a study of the effectiveness of four different skin creams for the treatment of a certain skin disease. He has eighty subjects and plans to divide them into 4 treatment groups of twenty subjects each. Using a randomized block design, the subjects are assessed and put in blocks of four according to how severe their skin condition is; the four most severe cases are the first block, the next four most severe cases are the second block, and so on to the twentieth block. The four members of each block are then randomly assigned, one to each of the four treatment groups.
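The skin cream design can be sketched along the same lines: rank the subjects by severity, cut the ranking into consecutive blocks of four, and shuffle the four creams within each block. In the Python sketch below, the severity scores are randomly generated stand-ins, and the function name matched_blocks and the cream labels are hypothetical.

```python
import random

def matched_blocks(severity_by_subject, treatments, seed=None):
    """Rank subjects by severity, form consecutive blocks of len(treatments),
    and randomly assign one subject in each block to each treatment."""
    rng = random.Random(seed)
    k = len(treatments)
    ranked = sorted(severity_by_subject, key=severity_by_subject.get, reverse=True)
    if len(ranked) % k != 0:
        raise ValueError("number of subjects must be a multiple of the number of treatments")
    assignment = {}
    for start in range(0, len(ranked), k):
        block = ranked[start:start + k]   # the next k most severe cases
        shuffled = list(treatments)
        rng.shuffle(shuffled)             # random order of treatments in this block
        for subject, treatment in zip(block, shuffled):
            assignment[subject] = treatment
    return assignment

# Hypothetical data: 80 subjects with made-up severity scores.
score_rng = random.Random(42)
severity = {f"S{i:02d}": score_rng.random() for i in range(1, 81)}
creams = ["cream A", "cream B", "cream C", "cream D"]
assignment = matched_blocks(severity, creams, seed=7)
```

Each cream ends up with twenty subjects, and each treatment group contains exactly one subject from every severity block.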
Although randomization helps to ensure that treatment groups are as similar as possible, the results of a single experiment, applied to a small number of objects or subjects, should not be accepted without question. Randomly selecting two individuals from a group of four and applying a treatment with "great success" generally will not impress the public or convince anyone of the effectiveness of the treatment. To improve the significance of an experimental result, replication, the repetition of an experiment on a large group of subjects, is required. If a treatment is truly effective, the long-term averaging effect of replication will reflect its experimental worth. If it is not effective, then the responses of the few members of the experimental population who may have reacted to the treatment will be outweighed by the large number of subjects who were unaffected by it. Replication reduces variability in experimental results, increasing their significance and the confidence level with which a researcher can draw conclusions about an experimental factor.
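The averaging effect of replication can be seen in a small simulation. The Python sketch below assumes, purely for illustration, that each subject's response is drawn from a normal distribution with mean 0 and standard deviation 1. It repeats an experiment many times at each of several group sizes and measures how much the average response varies from one repetition to the next. The function name and the chosen group sizes are hypothetical.

```python
import random
import statistics

def spread_of_mean(n_subjects, n_experiments=500, seed=0):
    """Standard deviation, across repeated experiments, of the mean response
    of n_subjects simulated subjects."""
    rng = random.Random(seed)
    means = [
        statistics.fmean(rng.gauss(0.0, 1.0) for _ in range(n_subjects))
        for _ in range(n_experiments)
    ]
    return statistics.stdev(means)

for n in (4, 25, 100, 400):
    print(f"{n:4d} subjects: spread of the mean = {spread_of_mean(n):.3f}")
```

Under this assumed model the spread shrinks roughly in proportion to one over the square root of the number of subjects, which is why results based on many subjects are far more stable than results based on a handful.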