II. Experimental, Quasi-Experimental, and Nonexperimental Research Designs
A. Randomized Experimental Designs
Randomized experimental designs allow researchers to assume that the only systematic difference between the control and treatment groups is the presence of the intervention; this permits a clear assessment of causes and effects (Campbell & Stanley, 1966; Cook & Campbell, 1979; Sechrest & Rosenblatt, 1987). The classical experimental design involves three major pairs of components: (1) independent and dependent variables, (2) treatment and control groups, and (3) pretesting and posttesting.
Experiments essentially examine the effect of an independent variable on a dependent variable. The independent variable usually takes the form of a treatment stimulus, which is either present or not. For instance, an experiment could examine the effect of an in-prison education program (the independent variable) on recidivism (the dependent variable) when offenders are released from prison. Another important element of an experiment is the presence of treatment and control groups. The use of a control group allows the researcher to determine what would have happened if the treatment stimulus or intervention had not been applied to the treatment group (often referred to as the counterfactual). The treatment group (sometimes called the experimental group) receives the stimulus or intervention to be tested, and the control group does not. It is critical for the treatment and control groups to be equivalent; this means that there are no systematic differences between the two groups that could affect the outcome of the experiment. During the pretest period, the treatment and control groups are both measured on the dependent variable. After the stimulus or intervention is administered to the treatment group, the dependent variable is measured again, in the posttest period. Differences noted between the pretest and posttest periods on the dependent variable are then attributed to the influence of the treatment.
Randomization is the preferred method for achieving comparability between the treatment and control groups. After subjects are recruited by whatever means, the researchers randomly assign those subjects to either the treatment or the control group. Although the recruited subjects cannot be assumed to be representative of the larger population from which they were drawn, random assignment ensures that the treatment and control groups will be reasonably similar to each other (Babbie, 2004). If randomization is done correctly, the only systematic difference between the two groups should be the presence or absence of the treatment. Experiments that use randomization to create equivalent groups are often called randomized controlled trials.
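The mechanics of random assignment described above are simple enough to sketch in a few lines of Python. This is a hypothetical illustration only; the function name and the even split into two groups are assumptions for the example, not part of any cited study:

```python
import random

def randomly_assign(subjects, seed=None):
    """Randomly split recruited subjects into treatment and control groups.

    Hypothetical helper for illustration: shuffles the list of subject
    identifiers and divides it in half, so assignment is unrelated to
    any characteristic of the subjects.
    """
    rng = random.Random(seed)
    shuffled = list(subjects)
    rng.shuffle(shuffled)
    midpoint = len(shuffled) // 2
    return shuffled[:midpoint], shuffled[midpoint:]

# 100 recruited subjects, identified here simply by number
treatment, control = randomly_assign(range(100), seed=42)
print(len(treatment), len(control))  # 50 50
```

Because assignment depends only on the shuffle, any preexisting subject characteristic is, on average, balanced across the two groups.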
In designing experiments, evaluators need to ensure that the research design is powerful enough to detect a treatment effect if one exists. The power of a statistical test is the probability that the test will reject a false null hypothesis (Lipsey, 1990); in this context, the null hypothesis is that there is no difference in the outcomes of the treatment and control groups. Statistical power is a complex problem, especially in experimental research. Power estimates are often based simply on the number of cases in the study, with the general observation that larger numbers of subjects increase the power of statistical tests to detect treatment effects (Lipsey, 1990). However, as Weisburd (1993) pointed out, the number of cases can be a misleading measure: he found that smaller experiments often allow tighter control over variability in treatment and design, so their statistical power may, in fact, be larger than expected.
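As a rough illustration of how power depends on sample size and effect size, the following Python sketch uses the standard large-sample normal approximation for a two-sided test of a mean difference at the conventional .05 significance level. The function and the numbers are illustrative assumptions, not a reconstruction of Lipsey's or Weisburd's calculations:

```python
import math

def normal_cdf(x):
    """Cumulative distribution function of the standard normal."""
    return 0.5 * (1.0 + math.erf(x / math.sqrt(2.0)))

def two_group_power(effect_size, n_per_group):
    """Approximate power of a two-sided, two-sample test of a mean
    difference at alpha = .05, using the large-sample normal
    approximation (an illustrative textbook formula).

    effect_size is the standardized mean difference (Cohen's d).
    """
    z_crit = 1.96  # two-sided critical value for alpha = .05
    noncentrality = effect_size * math.sqrt(n_per_group / 2.0)
    return 1.0 - normal_cdf(z_crit - noncentrality)

# A "medium" effect (d = 0.5) with 64 subjects per group yields
# roughly the conventional 80% power benchmark.
print(round(two_group_power(0.5, 64), 2))
```

Doubling the group size raises power, consistent with the general observation in the text, but the formula also shows that a better-controlled design, by shrinking outcome variability and thus enlarging the standardized effect, can raise power without adding cases.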
Randomized controlled trials are known for their high degree of internal validity. The problem of internal validity refers to the possibility that the conclusions drawn from the experimental results may not accurately reflect what has gone on in the experiment itself (Cook & Campbell, 1979). The main threats to internal validity are well known, and, when executed properly, randomized controlled trials address each of the following eight internal validity problems (Farrington & Welsh, 2006, p. 59):
- Selection: The effect reflects preexisting differences between treatment and control conditions.
- History: The effect is caused by some event occurring at the same time as the intervention.
- Maturation: The effect reflects a continuation of preexisting trends.
- Instrumentation: The effect is caused by a change in the method of measuring the outcome.
- Testing: The pretest measurement causes a change in the posttest measure.
- Regression to the mean: When an intervention is implemented on units with unusually high scores (e.g., areas with high crime rates), natural fluctuation will cause a decrease in these scores on the posttest, which may be mistakenly interpreted as an effect of the intervention.
- Differential attrition: The effect is caused by differential loss of units (e.g., people) from experimental compared to control conditions.
- Causal order: It is unclear whether the intervention preceded the outcome.
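Of these threats, regression to the mean is easy to demonstrate by simulation. The following Python sketch (with arbitrary, made-up numbers) selects the highest-scoring areas in one period, as a hot-spots intervention might, and shows that their average falls in the next period even though no intervention occurs:

```python
import random

random.seed(1)

# Simulate 1,000 areas whose "crime rate" is a stable underlying level
# plus year-to-year noise; all numbers are arbitrary, for illustration.
true_level = [random.gauss(50, 10) for _ in range(1000)]
year1 = [t + random.gauss(0, 10) for t in true_level]
year2 = [t + random.gauss(0, 10) for t in true_level]

# Select the "hot spots": the 100 areas with the highest year-1 scores.
cutoff = sorted(year1)[-100]
selected = [i for i in range(1000) if year1[i] >= cutoff]

mean_before = sum(year1[i] for i in selected) / len(selected)
mean_after = sum(year2[i] for i in selected) / len(selected)

# With no intervention at all, the selected areas' average declines,
# because extreme year-1 scores partly reflect unusually large noise.
print(round(mean_before, 1), round(mean_after, 1))
```

An evaluator comparing only the selected areas' pretest and posttest scores would mistake this natural decline for a treatment effect, which is why an equivalent control group is needed to estimate the counterfactual.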
External validity problems involve the generalizability of experimental findings to the “real” world (Cook & Campbell, 1979). Inferences about cause–effect relationships based on a specific scientific study possess external validity if they can be generalized from the unique and idiosyncratic experimental settings, procedures, and participants to other populations and conditions. Causal inferences with high degrees of external validity (also referred to as population validity) can reasonably be expected to apply both to the target population from which the study's subjects were drawn and to the universe of other populations across time and space.
The well-known Minneapolis Domestic Violence Experiment and its subsequent replications offer a cautionary tale on the external validity of experimental findings when interventions are applied to other subjects and in other settings (Sherman, 1992). The Minneapolis experiment was undertaken to determine the best way to reduce the risk of repeated violence by the suspect against the same victim in the future. Three approaches were tested. The traditional approach was to do very little, because it was believed that the offenders would not be punished harshly by the courts and that arrest might provoke further violence against the victim. A second approach was for the police to undergo special training enabling them to mediate ongoing domestic disputes. The third approach was to treat misdemeanor violence as a criminal offense and arrest offenders in order to teach them that their conduct was serious and to deter them from repeating it. The experiment revealed that, in Minneapolis, arrest worked best: It significantly reduced repeat offenses relative to the other two approaches (Sherman & Berk, 1984). The results of the experiment were very influential; many police departments adopted mandatory misdemeanor arrest policies, and a number of states adopted mandatory misdemeanor arrest and prosecution laws. However, replications of the Minneapolis domestic violence experiment in five other cities did not produce the same findings. In his review of those differing findings, Sherman (1992, p. 19) identified four policy dilemmas for policing domestic violence:
- Arrest reduces domestic violence in some cities but increases it in others.
- Arrest reduces domestic violence among employed people but increases it among unemployed people.
- Arrest reduces domestic violence in the short run but can increase it in the long run.
- Police can predict which couples are most likely to suffer future violence, but our society values privacy too highly to encourage preventive action.
This experience suggests that experimental findings need to be replicated before enacting mandatory interventions that could, in fact, have varied effects across different settings and subjects.