II. Experimental, Quasi-Experimental, and Nonexperimental Research Designs
A. Randomized Experimental Designs
B. Quasi-Experimental Designs
C. Nonexperimental Designs
III. Systematic Reviews and Meta-Analytic Methods in Criminology
IV. Critiques of Experimentation in Criminology
V. Weisburd’s Principles to Overcome Ethical, Political, and Practical Problems in Experimentation
Experimental criminology is part of a larger and expanding evidence-based movement in social policy grounded in scientific research. In general terms, this movement is dedicated to the improvement of society through the utilization of the highest-quality scientific evidence on what works best (see, e.g., Sherman et al., 1997). The evidence-based movement first began in medicine and has, more recently, been embraced by the social sciences. Criminologists such as David Farrington, Lorraine Mazerolle, Anthony Petrosino, Lawrence Sherman, David Weisburd, and Brandon Welsh, and organizations such as the Academy of Experimental Criminology and the Campbell Collaboration’s Crime and Justice Group, have been leading advocates for the advancement of evidence-based crime control policy in general and the use of randomized experiments in criminology in particular.
In an evidence-based model, the source of scientific evidence is empirical research in the form of evaluations of programs, practices, and policies. Not all evaluation designs are considered equal, however. Some evaluation designs, such as randomized controlled experiments, are considered more scientifically valid than others. The findings of stronger evaluation designs are privileged over the findings of weaker research designs in determining “what works” in criminological interventions. For instance, in their report to the U.S. Congress on what works in preventing crime, University of Maryland researchers developed the Maryland Scientific Methods Scale to indicate to scholars, practitioners, and policymakers that studies evaluating criminological interventions may differ in terms of methodological quality of evaluation techniques (Sherman et al., 1997). Randomized experiments are considered the gold standard in evaluating the effects of criminological interventions on outcomes of interest such as crime rates and recidivism.
Randomized experiments have a relatively long history in criminology. The first randomized experiment conducted in criminology is commonly believed to be the Cambridge–Somerville Youth Study (Powers & Witmer, 1951):
In that experiment, investigators first matched individual participants (youths nominated by teachers or police as “troubled kids”) on certain characteristics and then randomly assigned one to the innovation group receiving counseling and the other to a control group receiving no counseling. Investigators have continuously reported that the counseling program, despite the best intentions, actually hurt the program participants over time when compared to doing nothing to them at all. Although the first participant in the Cambridge–Somerville study was randomly assigned in 1937, the first report of results was not completed until 1951. (Weisburd, Mazerolle, & Petrosino, 2008, p. 4)
Relatively few randomized experiments in criminology were conducted during the 1950s, 1960s, and 1970s (Weisburd et al., 2008). However, the number of randomized experiments in criminology started to rise in the mid-1980s. In their influential book titled Understanding and Controlling Crime, Farrington, Ohlin, and Wilson (1986) recommended the use of randomized experiments whenever possible to test criminal justice interventions. This book generated considerable interest in experimentation among criminologists and, more important, at funding agencies such as the U.S. National Institute of Justice, which sponsored a series of randomized controlled experiments in the late 1980s. In their examination of randomized experiments on crime and justice, Farrington and Welsh (2006) found that the number of experiments with a minimum of 100 participants more than doubled, from 37 between 1957 and 1981 to 85 between 1982 and 2004. Although randomized experiments in criminology are more common now than in the 1980s, they continue to represent a small percentage of the total number of impact or outcome evaluations conducted in areas relevant to crime and justice each year (Weisburd et al., 2008).
This research paper begins by describing the key features of experimental, quasi-experimental, and nonexperimental research designs. The strengths of randomized experiments in determining cause and effect in criminology are assessed relative to these other commonly used research designs. Next, systematic reviews of existing evaluations and meta-analytic methods to synthesize the effectiveness of criminological interventions are discussed. These techniques represent important new features of the evidence-based policy movement in criminology. The research paper concludes by reviewing the critiques of experimentation in criminology and then presents a series of recommendations to overcome the ethical, political, and practical barriers to experimentation in crime and justice.
II. Experimental, Quasi-Experimental, and Nonexperimental Research Designs
A. Randomized Experimental Designs
Randomized experimental designs allow researchers to assume that the only systematic difference between the control and treatment groups is the presence of the intervention; this permits a clear assessment of causes and effects (Campbell & Stanley, 1966; Cook & Campbell, 1979; Sechrest & Rosenblatt, 1987). The classical experimental design involves three major pairs of components: (1) independent and dependent variables, (2) treatment and control groups, and (3) pretesting and posttesting.
Experiments essentially examine the effect of an independent variable on a dependent variable. The independent variable usually takes the form of a treatment stimulus, which is either present or not. For instance, an experiment could examine the effect of an in-prison education program (the independent variable) on recidivism (the dependent variable) when offenders are released from prison. Another important element of an experiment is the presence of treatment and control groups. The use of a control group allows the researcher to determine what would have happened if the treatment stimulus or intervention had not been applied to the treatment group (often referred to as the counterfactual). The treatment group (sometimes called the experimental group) receives the stimulus or intervention to be tested, and the control group does not. It is critical for the treatment and control groups to be equivalent; this means that there are no systematic differences between the two groups that could affect the outcome of the experiment. During the pretest period, both the treatment and control groups are measured on the dependent variable. After the stimulus or intervention is administered to the treatment group, the dependent variable is measured again, in the posttest period. Differences between the groups in the change from pretest to posttest on the dependent variable are then attributed to the influence of the treatment.
Randomization is the preferred method for achieving comparability in the treatment and control groups. After subjects are recruited by whatever means, the researchers randomly assign those subjects to either the treatment or control group. Although it cannot be assumed that the recruited subjects are necessarily representative of the larger population from which they were drawn, random assignment ensures that the treatment and control groups will be reasonably similar (Babbie, 2004). If randomization is done correctly, the only systematic difference between the two groups should be the presence or absence of the treatment. Experiments that use randomization to create equivalent groups are often called randomized controlled trials.
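The random assignment step described above can be sketched in a few lines of Python. This is purely illustrative: the function name, the equal 50/50 split, and the use of a seeded shuffle are assumptions of the sketch, not procedures taken from any cited study.

```python
import random

def randomly_assign(subjects, seed=None):
    """Shuffle the recruited subjects and split them into
    equal-sized treatment and control groups."""
    rng = random.Random(seed)  # a recorded seed makes the allocation auditable
    pool = list(subjects)
    rng.shuffle(pool)
    half = len(pool) // 2
    return pool[:half], pool[half:]  # (treatment, control)

# Example: 100 recruited subjects split 50/50 at random.
treatment, control = randomly_assign(range(100), seed=42)
```

Because every subject has the same chance of landing in either group, any preexisting characteristic is, in expectation, balanced across the two groups, which is what licenses the causal comparison.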
In designing experiments, evaluators need to ensure that the research design is powerful enough to detect a treatment effect if one exists. The power of a statistical test is the probability that the test will reject a false null hypothesis (Lipsey, 1990); in this context, the null hypothesis is that there is no difference in the outcomes of the treatment and control groups. Statistical power is a complex problem, especially in experimental research. Power estimates are often based simply on the number of cases in the study, on the general observation that larger numbers of subjects increase the power of statistical tests to detect treatment effects (Lipsey, 1990). However, as Weisburd (1993) pointed out, the number of cases alone can be a misleading measure: He found that smaller experiments often permit tighter control over variability in treatment and design, so their statistical power may, in fact, be larger than expected.
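The basic relationship between sample size, effect size, and power can be illustrated with a short Python sketch using a normal approximation to a two-sided, two-sample test. The function name and the approximation itself are assumptions of this sketch, not formulas drawn from the cited sources:

```python
from statistics import NormalDist

def power_two_sample(d, n_per_group, alpha=0.05):
    """Approximate power of a two-sided, two-sample z-test to detect
    a standardized effect size d with n_per_group subjects per group."""
    z = NormalDist()
    z_crit = z.inv_cdf(1 - alpha / 2)   # critical value, about 1.96 at alpha = .05
    ncp = d * (n_per_group / 2) ** 0.5  # noncentrality of the test statistic
    # The negligible rejection probability in the opposite tail is ignored.
    return 1 - z.cdf(z_crit - ncp)
```

For a standardized effect of d = 0.3, this sketch gives power of roughly 0.32 with 50 subjects per group but roughly 0.85 with 200 per group, illustrating why small trials risk missing real treatment effects.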
Randomized controlled trials are known for their high degree of internal validity. The problem of internal validity refers to the possibility that the conclusions drawn from the experimental results may not accurately reflect what has gone on in the experiment itself (Cook & Campbell, 1979). The main threats to internal validity are well known, and a properly executed randomized controlled trial addresses each of the following eight internal validity problems (Farrington & Welsh, 2006, p. 59):
- Selection: The effect reflects preexisting differences between treatment and control conditions.
- History: The effect is caused by some event occurring at the same time as the intervention.
- Maturation: The effect reflects a continuation of preexisting trends.
- Instrumentation: The effect is caused by a change in the method of measuring the outcome.
- Testing: The pretest measurement causes a change in the posttest measure.
- Regression to the mean: When an intervention is implemented on units with unusually high scores (e.g., areas with high crime rates), natural fluctuation will cause a decrease in these scores on the posttest, which may be mistakenly interpreted as an effect of the intervention.
- Differential attrition: The effect is caused by differential loss of units (e.g., people) from experimental compared to control conditions.
- Causal order: It is unclear whether the intervention preceded the outcome.
External validity problems involve the generalizability of the experimental findings to the “real” world (Cook & Campbell, 1979). Inferences about cause–effect relationships based on a specific scientific study are said to possess external validity if they may be generalized from the unique and idiosyncratic experimental settings, procedures, and participants to other populations and conditions. Causal inferences said to possess high degrees of external validity (also referred to as population validity) can reasonably be expected to apply to the target population of the study from which the subjects were drawn and to the universe of other populations across time and space.
The well-known Minneapolis Domestic Violence Experiment and its subsequent replications offer a cautionary tale on the external validity of experimental findings when interventions are applied to other subjects and in other settings (Sherman, 1992). The Minneapolis experiment was undertaken to determine the best way to prevent the risk of repeated violence by the suspect against the same victim in the future. Three approaches were tested. The traditional approach was to do very little, because it was believed that the offenders would not be punished harshly by the courts and that the arrest might provoke further violence against the victim. A second approach was for the police to undergo special training enabling them to mediate ongoing domestic disputes. The third approach was to treat misdemeanor violence as a criminal offense and arrest offenders in order to teach them that their conduct was serious and to deter them from repeating it. The experiment revealed that, in Minneapolis, arrest worked best: It significantly reduced repeat offenses relative to the other two approaches (Sherman & Berk, 1984). The results of the experiment were very influential; many police departments adopted mandatory misdemeanor arrest policies, and a number of states adopted mandatory misdemeanor arrest and prosecution laws. However, replications of the Minneapolis domestic violence experiment in five other cities did not produce the same findings. In his review of those differing findings, Sherman (1992, p. 19) identified four policy dilemmas for policing domestic violence:
- Arrest reduces domestic violence in some cities but increases it in others.
- Arrest reduces domestic violence among employed people but increases it among unemployed people.
- Arrest reduces domestic violence in the short run but can increase it in the long run.
- Police can predict which couples are most likely to suffer future violence, but our society values privacy too highly to encourage preventive action.
This experience suggests that experimental findings need to be replicated before enacting mandatory interventions that could, in fact, have varied effects across different settings and subjects.
B. Quasi-Experimental Designs
The quasi-experiment is a research design that has some, but not all, of the characteristics of a true experiment (Cook & Campbell, 1979). As such, quasi-experiments do not have the same high degree of internal validity as randomized controlled trials. Although there are many types of quasi-experimental research designs, the element most frequently missing is the random assignment of subjects to the treatment and control conditions. In developing an equivalent control group, the researcher often uses matching instead of randomization. For example, a researcher interested in investigating the effects of a new juvenile curfew on crime in a particular city would try to find a comparison city with similar crime rates and citizen demographics in the same geographic region. This matching strategy is sometimes called a nonequivalent group comparison design because the treatment and control cities will not be exactly the same. In the statistical analysis of quasi-experimental data, researchers will often attempt to isolate treatment effects by including covariates to account for any measurable factors that could also influence observed differences in the dependent variable (e.g., poverty levels, youth population size, and the like). This yields less confidence in study findings than a true experimental approach, because the difference in outcome may still be due to some preexisting difference between the treatment and control groups that was not taken into account by the evaluators.
Quasi-experimental interrupted time series analysis, involving before-and-after measurements of a particular dependent variable, represents a common type of evaluation research in criminology and criminal justice. One intended purpose of this type of quasi-experimental research is to capture longer time periods and a sufficient number of different events to control for various threats to validity and reliability (Cook & Campbell, 1979). Long series of observations are made before and after the treatment. The established before-treatment trend allows researchers to predict what would have happened without the intervention; the difference between what actually happened after the intervention and the predicted outcome determines the treatment effect. These approaches are often criticized for not accounting for other confounding factors that may have caused the observed differences. It can also be difficult to model the trend in the time series so that the treatment effect can be properly estimated. For instance, in their evaluation of the 1975 Massachusetts Bartley–Fox gun control law that mandated a year in prison for illegal carrying of firearms, Deutsch and Alt (1977) used an interrupted time series quasi-experimental design and found that the passage of the law was associated with a statistically significant reduction in armed robbery in Boston. However, Hay and McCleary (1979) reanalyzed these data using a different quasi-experimental time series modeling approach and found no statistically significant reduction in Boston armed robberies associated with the passage of the law. In contrast, Pierce and Bowers (1981) found statistically significant violence reductions associated with the passage of the law using quasi-experimental interrupted time series analysis with multiple control group comparisons.
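The core logic of an interrupted time series estimate can be sketched in Python: fit a trend to the pre-intervention observations, project it over the post period, and treat the mean gap between observed and projected values as the estimated effect. Real analyses, such as the ARIMA models at issue in the Bartley–Fox reanalyses, are far more elaborate; this simplified linear version and its function name are assumptions of the sketch.

```python
def its_effect(series, intervention_index):
    """Estimate a treatment effect from an interrupted time series:
    fit a linear trend to the pre-intervention observations by ordinary
    least squares, project it over the post period, and return the mean
    gap between observed and projected values."""
    pre = series[:intervention_index]
    n = len(pre)
    x_mean = (n - 1) / 2                     # mean of time indices 0..n-1
    y_mean = sum(pre) / n
    sxy = sum((x - x_mean) * (y - y_mean) for x, y in enumerate(pre))
    sxx = sum((x - x_mean) ** 2 for x in range(n))
    slope = sxy / sxx
    intercept = y_mean - slope * x_mean
    post = series[intervention_index:]
    projected = [intercept + slope * (intervention_index + t)
                 for t in range(len(post))]
    return sum(obs - proj for obs, proj in zip(post, projected)) / len(post)

# A flat series of monthly robbery counts that drops from 10 to 6
# after month 12 yields an estimated effect of -4.0:
# its_effect([10.0] * 12 + [6.0] * 12, 12)
```

The disagreement among the Bartley–Fox studies reflects exactly what this sketch leaves out: how the pre-intervention trend is modeled (linear, seasonal, ARIMA) changes the projected counterfactual and therefore the estimated effect.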
Although these designs are still likely to have lower internal validity than randomized experimental evaluations, quasi-experiments that combine the use of a control group with time series data can sometimes produce results that are of similar quality to randomized controlled trials (Lipsey & Wilson, 1993). Some researchers, however, have found that even strongly designed quasi-experiments produce less valid outcomes when compared with well-executed randomized controlled trials (see Weisburd, Lum, & Petrosino, 2001). In general, the persuasiveness of quasi-experiments should be judged on a case-by-case basis (Weisburd et al., 2001). For experimental criminology, the implication is that randomized controlled trials are necessary to produce the most valid and unbiased estimates of the effects of criminal justice interventions.
In their evaluation of the Operation Ceasefire gang violence reduction strategy, Braga and his colleagues (Braga, Kennedy, Waring, & Piehl, 2001) used a quasi-experimental interrupted time series analysis with multiple comparison groups to compare youth homicide trends in Boston with youth homicide trends in other major U.S. cities. They found a statistically significant 63% reduction in youth homicides associated with the implementation of the Ceasefire strategy. The evaluation also suggested that Boston’s significant youth homicide reduction associated with Operation Ceasefire was distinct when compared with youth homicide trends in most major U.S. and New England cities (Braga et al., 2001).
The National Academies Panel on Improving Information and Data on Firearms (Wellford, Pepper, & Petrie, 2005) concluded that the Ceasefire evaluation was compelling in associating the intervention with the subsequent decline in youth homicide (see also Morgan & Winship, 2007). However, the panel also suggested that many complex factors affect youth homicide trends and that it was difficult to specify the exact relationship between the Ceasefire intervention and subsequent changes in youth offending behaviors (see also Ludwig, 2005). Although the Ceasefire evaluation controlled for existing violence trends and certain rival causal factors, such as changes in the youth population, drug markets, and employment in Boston, there could be complex interaction effects among these factors, not measured by the evaluation, that account for some meaningful portion of the decrease. Because the evaluation was not a randomized controlled experiment, the nonrandomized control group research design cannot rule out these internal threats to the conclusion that Ceasefire was the key factor in the youth homicide decline. Other quasi-experimental evaluations face similar critiques when attempting to unravel cause and effect associated with the implementation of specific criminal justice interventions.
Another type of quasi-experimental design is known as a natural experiment, whereby nature, or some event, has created treatment and control groups. In contrast to laboratory experiments, these events are not created by scientists, but they yield scientific data nonetheless. The classic example is the comparison of crime rates in areas after the passage of a new law or implementation of a crime prevention initiative that affects one area and not another. For instance, the 1994 Brady Handgun Violence Prevention Act established a nationwide requirement that licensed firearms dealers observe a waiting period and initiate a background check for handgun sales. To assess the impact of the Brady Law on violence, Ludwig and Cook (2000) examined trends in homicide and suicide rates, controlling for population age, race, poverty, and other covariates, in the 32 “treatment” states directly affected by the Brady Act requirements and compared them with the 18 “control” states and the District of Columbia, which had equivalent legislation already in place. They found that the Brady Act appeared to be associated with reductions in the firearm suicide rate for persons age 55 years or older but not with reductions in homicide rates or overall suicide rates.
C. Nonexperimental Designs
Studies that rely only on statistical controls are often seen as representing the weakest level of confidence in research findings (Cook & Campbell, 1979; Sherman et al., 1997). These studies are typically called nonexperimental or observational research designs. In these studies, researchers do not vary treatments to observe their effects on outcomes; instead, they examine natural variation in a dependent variable of interest, such as crime, and estimate the effect of an independent variable, such as police staffing levels, on the basis of its covariation with the dependent variable. Additional covariates related to variation in the dependent variable will be included in the model as statistical controls to isolate the effect of the key independent variable of interest. The difficulty of this approach is that there could easily be excluded factors related to both the key independent variable and dependent variable that bias the estimated relationship between these variables. Unfortunately, for some sensitive areas in crime and justice, nonexperimental research designs are the only method of investigation possible. Although some scholars argue that it is possible to develop statistical models that provide highly valid results (e.g., Heckman & Smith, 1995), it is generally agreed that causes unknown or unmeasured by the researcher are likely to be a serious threat to the internal validity of nonexperimental research designs (Cook & Campbell, 1979).
III. Systematic Reviews and Meta-Analytic Methods in Criminology
There is a consensus among those who advocate for evidence-based crime policy that systematic reviews are an important tool in this process. In systematic reviews, researchers attempt to gather relevant evaluative studies in a specific area (e.g., the impact of correctional boot camps on offending), critically appraise them, and come to judgments about what works “using explicit, transparent, state-of-the-art methods” (Petrosino, Boruch, Soydan, Duggan, & Sanchez-Meca, 2001, p. 21). Rigorous methods are used to summarize, analyze, and combine study findings. The Campbell Collaboration Crime and Justice Group, formed in 2000, aims to prepare and maintain systematic reviews of criminological interventions and to make them electronically accessible to scholars, practitioners, policymakers, and the general public (Farrington & Petrosino, 2001; see also http://www.campbellcollaboration.org/). The Crime and Justice Group requires reviewers of criminological interventions to select studies with high internal validity, such as randomized controlled trials and well-designed quasi-experiments with comparison groups (Farrington & Petrosino, 2001).
Meta-analysis is a method of systematic reviewing and was designed to synthesize empirical relationships across studies, such as the effects of a specific crime prevention intervention on criminal offending behavior (Wilson, 2001). Meta-analysis quantifies the direction and the magnitude of the findings of interest and uses specialized statistical methods to analyze the relationships between findings and study features (Lipsey & Wilson, 1993; Wilson, 2001). Although the methods are technical, meta-analysis provides a defensible strategy for summarizing the effects of crime prevention and intervention efforts for informing public policy (Wilson, 2001). For instance, Farrington and Welsh (2005) carried out a series of meta-analyses of criminological experiments of the last 20 years and concluded that prevention methods in general, and multisystemic therapy in particular, were effective in reducing offending. They also reported that correctional therapy, batterer treatment programs, drug courts, juvenile restitution, and police targeting of crime hot spots were effective. However, “Scared Straight” programs and boot camps for offenders were not effective at preventing crime.
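The central pooling step of a fixed-effect meta-analysis can be sketched in Python: each study's effect size is weighted by the inverse of its variance, so more precise studies count for more. The function name is illustrative, and real meta-analyses typically also test for heterogeneity across studies and may substitute random-effects models.

```python
from math import sqrt

def fixed_effect_pool(effects, variances):
    """Pool per-study effect sizes with inverse-variance (fixed-effect)
    weights; return the pooled effect and its standard error."""
    weights = [1.0 / v for v in variances]
    pooled = sum(w * e for w, e in zip(weights, effects)) / sum(weights)
    se = sqrt(1.0 / sum(weights))
    return pooled, se

# Two studies with effects 0.2 and 0.4 and equal variances pool to 0.3,
# with a standard error smaller than either study's alone.
```

This is why meta-analytic summaries can detect effects that individual underpowered trials miss: the pooled standard error shrinks as studies accumulate.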
IV. Critiques of Experimentation in Criminology
Randomized experiments present many challenges. For instance, problems in getting permission and cooperation from policymakers and practitioners often lead to case flow problems and difficulties in successfully achieving randomization. Although there is a large literature examining the barriers to experimentation (e.g., Baunach, 1980; Heckman & Smith, 1995; Petersilia, 1989), Clarke and Cornish (1972) raised several concerns with experimentation in crime and justice that had a major chilling effect on the development of experimental research in England during the 1970s (Farrington & Welsh, 2006). Although several experimental criminologists have responded to these concerns (e.g., Farrington, 2003; Weisburd, 2003), and the number of crime and justice experiments has increased in England and the United States over the last 25 years (Farrington & Welsh, 2006), the issues raised by Clarke and Cornish (1972) continue to fuel resistance to experimental methods in crime and justice today (see Pawson & Tilley, 1997).
Clarke and Cornish’s (1972) critique of experimentation is based on their experience in implementing a large-scale randomized experiment to evaluate a therapeutic community at a training school for delinquent boys in England. One major concern was that practitioners undermined the experiment by limiting the number of boys who could be considered for random allocation. Practitioners were very concerned that the boys would not receive the treatment that was most suitable for them. They felt that it was unethical for the boys to receive anything less than the most appropriate treatment. This led to the research being extended for a much longer time period and eventually stopped before the desired number of cases for the study was achieved. In response, however, experimental criminologists suggested that the ethical questions raised by the practitioners had more to do with contrasting belief systems between practitioners and researchers than with the ethics of experimentation in crime and justice (Weisburd, 2003). The practitioners believed they knew which treatments worked best for the boys; the researchers thought that the effectiveness of treatment was not clear and implemented a randomized study to determine what worked. It was this clash of beliefs that led the practitioners to undermine the experimental evaluation.
Another concern put forth by Clarke and Cornish (1972) referred to the difficulty of generalizing from experimental studies (i.e., the problem of external validity). They argued that the unique institutional settings at the training school were difficult to disentangle from the treatment itself. Clarke and Cornish further argued that institutions that agree to experimentation are a self-selected group that are not representative of the general population of institutions; as such, experiments that include them tell one little about the workings of the treatment and their outcomes in the real world. In response, experimental criminologists argue that support for experimentation from larger governmental agencies, such as the U.S. National Institute of Justice, would encourage broader involvement of institutions in experimentation (Weisburd, 2003). Although this larger group is still likely to be self-selected and generalizability may still be limited, encouragement rather than discouragement of experimental study in crime and justice by funders would lead to the development of more generalizable experiments (Weisburd, 2003).
The strongest criticism of randomized experiments raised by Clarke and Cornish (1972) was that experimental studies are too rigid to address the complexity of crime and justice contexts. The treatment at the training school involved many components that varied over the course of the experiment; thus, it was impossible to clearly define the treatment being tested. Whatever evaluation results were obtained would not have been explainable. In essence, the experiment might have been able to say what happened, but not how or why it happened (Weisburd, 2003, p. 349). Pawson and Tilley (1997) argued that experiments tend to be inflexible and use broad categorical treatments; as a result, experimental designs miss the important interaction between the nature of the treatment and the nature of the subjects being studied. Experimental criminologists acknowledge that the use of experimental approaches in complex social settings requires the development of experimental methods that are capable of addressing the complexity of crime and justice treatments, settings, and subjects (Weisburd, 2003). Of course, this requires a commitment to institutionalize experimental methods in the crime and justice field.
V. Weisburd’s Principles to Overcome Ethical, Political, and Practical Problems in Experimentation
Randomized experiments are often ruled out in criminal justice and criminological research because of ethical, political, or practical concerns. In reality, however, randomized experiments are possible and appropriate in many circumstances. For experimenters, the challenge is to identify the conditions under which experiments can be successfully implemented in criminal justice settings. Weisburd (2000) identified eight principles to help practitioners and researchers assess when experimentation is most feasible. This section presents Weisburd’s principles and summarizes his discussion of each. The first two principles involve ethical concerns, the next three involve political concerns, and the final three involve practical problems in criminal justice experimentation.
Principle 1: In the case of experiments that add additional resources to particular criminal justice agencies or communities or provide treatments for subjects, there are generally fewer ethical barriers to experimental research (Weisburd, 2000, p. 184). It is important to differentiate the nature of the criminological intervention to be evaluated at the outset of the evaluation. Ethical problems are not likely to be raised when researchers provide new resources to offenders, such as rehabilitative services, or to communities, such as additional police patrols. The assumption is that the control group will continue to receive traditional levels of criminological intervention. Criminal justice experiments are often framed as tests of whether a new intervention is better than an existing one. However, when treatment is withdrawn from control subjects, serious ethical questions will arise; thus, Weisburd suggested that crime and justice experiments can often be defined as including treatment and comparison groups, rather than treatment and control groups.
Principle 2: Experiments that test sanctions that are more lenient than existing penalties are likely to face fewer barriers than those that test sanctions more severe than existing penalties (Weisburd, 2000, p. 185). So-called sanctioning experiments have produced the most serious ethical problems in criminal justice experimental study. In these evaluations, random allocation rather than the traditional decision-making power of criminal justice practitioners is used to make decisions about the processing of individual offenders (Weisburd, 2000). Arrest, sentence, and imprisonment decisions are based on random allocation. It is important to remember that sanctioning experiments allocate sanctions that are legally legitimate to impose on offenders. Ethical concerns are raised in connection with how the sanction is applied rather than with the harshness of the sanction itself. Clearly, there needs to be a balance between the criminal justice system’s need to find answers to important policy questions and its commitment to equity in allocating sanctions. Weisburd suggested that, when designing sanctioning experiments, questions be framed in a way that allows ethical barriers to be removed, for instance, by using the experiment to test whether the criminal justice system can be lenient in the allocation of sanctions rather than harsh. The California Reduced Prison Experiment released some offenders from prison earlier than their sentenced release date (Berecochea & Jaman, 1981). Leniency for a few generated no major ethical objections, even though thousands of offenders were left in prison for longer periods of time on the basis of a random allocation scheme. The end result, however, was two distinct groups that received more and less punitive sanctions.
Principle 3: Experiments that have lower public visibility will generally be easier to implement (Weisburd, 2000, p. 186). Obviously, in addition to ethical concerns there are political costs to criminal justice experimentation. Although reducing penalties for certain offenders may not generate ethical objections, the approach may generate strong political resistance to the experiment. Citizens may not want offenders to return to the community before their natural sentence expiration date. Citizens may also exert political pressure to halt an experiment if additional criminal justice resources are randomly allocated. For instance, in the Jersey City Drug Market Analysis Experiment (Weisburd & Green, 1995), citizens in comparison drug market hot-spot areas were very concerned that they were not receiving the increased police attention given to the treatment drug market hot-spot areas. As Weisburd (2000) observed, this problem is similar to those encountered in medical research, where interest groups fight to have experiments abandoned so that medication will be provided to all who might benefit from it. These political problems are less likely to emerge when experiments are less visible to the public. As such, researchers should resist the temptation to publicize experiments before they are completed.
Principle 4: In cases where treatment resources are limited, there is generally less political resistance to random allocation (Weisburd, 2000, p. 186). There are circumstances in which it is easier for researchers to defend random allocation in the politics of allocating treatments (Weisburd, 2000). Often, treatments and new programs can be applied to only a few areas or a small number of individuals. When communities or individuals understand that they have not been systematically excluded from additional resources, experiments pose no greater political problems than nonexperimental evaluation designs. For instance, in the High Intensity Drug Trafficking Area drug treatment experiment (Weisburd & Taxman, 2000), practitioners were much less resistant to an experimental design because they could not provide treatment to all eligible subjects. As Weisburd suggested, random allocation can serve as a type of pressure valve in the allocation of scarce criminal justice resources; it can be a politically safer basis on which to apply treatment than other criteria.
Principle 5: Randomized experiments are likely to be easier to develop if the subjects of the intervention represent less serious threats to community safety (Weisburd, 2000, p. 187). When the potential risks to the community are minimized, it is much easier for policymakers and practitioners to defend the use of randomization (Weisburd, 2000). Very few experiments have involved high-risk violent offenders who would pose serious threats to community safety.
Principle 6: Experiments will be most difficult to implement when the researcher attempts to limit the discretion of criminal justice agents who generally operate with a great degree of autonomy and authority (Weisburd, 2000, p. 187). Relative to ethical and political concerns, practical barriers have generally been more significant in explaining resistance to criminal justice experimentation (Weisburd, 2000). Although experimenters face a wide range of methodological issues, it can be very difficult simply to get practitioners to agree to random allocation. This problem is related to the ethical and political concerns already discussed, but it is important to recognize that random allocation also interferes with the daily operations of the affected agencies. Judges are generally more resistant to random allocation than other criminal justice practitioners; they have been known to subvert experiments by not properly assigning subjects even after agreeing to random allocation of sanctions and programs. For example, in the Denver Drunk Driving Experiment (Ross & Blumenthal, 1974), judges were supposed to randomly allocate fines and two different types of probation to convicted drunk drivers. Unfortunately, in more than half the cases, judges circumvented the randomization process in response to defense attorneys' pleas for their clients to receive fines rather than probation. Weisburd observed that the likelihood of success in randomization is linked to the nature of the decisions being made. He suggested a subprinciple in developing randomization procedures: "Where treatment conditions are perceived as similar in leniency to control conditions, it will be easier to carry out a randomized study involving high-authority and high-autonomy criminal justice agents" (Weisburd, 2000, p. 188).
In Project Muster, a probation experiment in New Jersey, Weisburd (1991) found that judges correctly randomized nearly all study subjects. In this evaluation, judges were asked to sentence selected probationers who had violated release conditions by not paying their fines to a program that involved intensive probation and job counseling. No restraint was placed on their sentencing decisions for other probationers who violated their conditions. Because few such offenders would have been sentenced to jail for failure to pay fines, judges did not feel that their discretion was overly compromised in selecting Muster instead of traditional probation.
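Subversion of the kind seen in the Denver experiment can be detected after the fact by comparing the condition each subject was assigned with the condition actually delivered. A minimal sketch (hypothetical case records; the condition labels are illustrative, not data from any study cited here):

```python
def compliance_rate(cases):
    """Fraction of cases in which the delivered sanction matched the
    randomly assigned one. Each case is an (assigned, delivered) pair
    of condition labels."""
    if not cases:
        return 0.0
    matches = sum(1 for assigned, delivered in cases if assigned == delivered)
    return matches / len(cases)

# Hypothetical records in which judges overrode probation in favor of fines.
records = [("fine", "fine"), ("probation", "fine"),
           ("probation", "probation"), ("probation", "fine")]
rate = compliance_rate(records)  # 2 of 4 matched, so 0.5
```

A compliance rate near 0.5, as in the hypothetical records above, would signal the kind of wholesale circumvention reported in Denver, whereas a rate near 1.0 would resemble the Project Muster result.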
Principle 7: Systems in which there is a strong degree of hierarchical control will be conducive to experimentation even when individual actors are asked to constrain temporarily areas where they have a considerable degree of autonomy (Weisburd, 2000, p. 188). Weisburd suggested that, in militaristic hierarchical agencies, such as the police and certain correctional agencies, it is often easier to execute experimental designs because such agencies have rigid organizational structures. This is particularly true when discretion is limited over the targets selected rather than over the choice of action or decision. Experiments in Minneapolis, Minnesota (Sherman & Weisburd, 1995), and Jersey City, New Jersey (Braga et al., 1999; Weisburd & Green, 1995), were successfully executed when police officers were focused on treatment hot spots and restricted from operating in control hot-spot areas. In policing, hierarchical control also explains why it has been possible to implement experiments in which treatment and control conditions vary significantly and the line-level agent has traditionally exercised considerable autonomy (Weisburd, 2000). This was evident in the six domestic violence experiments supported by the National Institute of Justice in which misdemeanor spouse abusers were randomly assigned to either arrest or nonarrest conditions (Sherman, 1992). These studies did not show the extensive subversion of randomization seen in other criminal justice experiments, such as the Denver Drunk Driving Experiment.
Principle 8: Where treatments are relatively complex, involving multiple actions on the part of criminal justice agents or actions that they would not traditionally take, experiments can become prohibitively cumbersome and expensive (Weisburd, 2000, p. 190). Once randomization has been successfully achieved, maintaining the integrity of the treatment is the most difficult task for experimenters (Boruch, 1997; Weisburd, 2000). Experiments cannot be simply a before-and-after effort by researchers; it is very important to document and analyze what is actually happening in the treatment and control groups (Weisburd, 2000). Developing methods to monitor and ensure the integrity of the treatment is crucial. If the treatment is not implemented properly, it would not be surprising to find that the intervention did not generate an effect. For studies that involve one-shot interventions, this process can be relatively simple (e.g., documenting whether a subject was properly placed in a condition such as arrest, violation, or incarceration). However, if experimental treatments are complex, it will be correspondingly more difficult, time-consuming, and costly to track and ensure the integrity of the treatment.
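For complex, multi-component treatments, the monitoring described above amounts to checking each treatment-group subject's record against the components the protocol required. An illustrative sketch (the component names and subject IDs are hypothetical, loosely modeled on the intensive-probation-plus-job-counseling treatment in Project Muster):

```python
# Components every treatment-group subject was supposed to receive
# (hypothetical protocol for illustration only).
REQUIRED = {"intensive_probation", "job_counseling"}

def integrity_report(delivered_by_subject):
    """Flag treatment-group subjects who did not receive every required
    component, mapping each flagged subject to the missing components."""
    return {subject: sorted(REQUIRED - set(components))
            for subject, components in delivered_by_subject.items()
            if not REQUIRED <= set(components)}

# Hypothetical delivery records.
delivered = {
    "S1": ["intensive_probation", "job_counseling"],
    "S2": ["intensive_probation"],  # job counseling never delivered
}
gaps = integrity_report(delivered)  # flags S2 as missing job_counseling
```

A report like this, run periodically during the experiment rather than only at the end, is one concrete way to catch treatment-integrity failures before they silently dilute the estimated effect.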
There is now a large, and growing, literature indicating that ethical, political, and practical barriers can be overcome and that randomized experiments are appropriate in a very diverse group of circumstances and across many aspects of decision making in the criminal justice system (Boruch, Snyder, & DeMoya, 2000; Petrosino et al., 2001; Weisburd, 2000, 2003). To some observers (e.g., Weisburd, 2003), the failure of crime and justice funders and evaluators to develop a comprehensive infrastructure for experimental evaluation represents a serious violation of professional standards:
A related line of argument here is that a failure to discover whether a program is effective is unethical. That is, if one relies solely on nonrandomized assessments to make judgments about the efficacy of a program, subsequent decisions may be entirely inappropriate. Insofar as a failure to obtain unequivocal data on effects then leads to decisions which are wrong and ultimately damaging, that failure may violate good standards of both social and professional ethics. Even if the decisions are “correct” in the sense of coinciding with those one might make based on randomized experiment data, ethical problems persist. The right action taken for the wrong reasons is not especially attractive if we are to learn anything about how to effectively handle the child abuser, the chronically ill, . . . and so forth. (Boruch, 1975, p. 135)
According to Weisburd (2003), the key question is why a randomized experiment should not be used: “The burden here is on the researcher to explain why a less valid method should be the basis for coming to conclusions about treatment and practice” (p. 352).
As the 21st century unfolds, the available evidence suggests that the number of randomized experiments in criminology will continue to grow (Farrington & Welsh, 2006). As Weisburd, Mazerolle, and Petrosino (2008) observed, there is a growing consensus among scholars, practitioners, and policymakers that crime control practices and policies should be rooted as much as possible in scientific research. The findings of randomized experiments are considered more scientifically valid than those generated by quasi-experiments and observational studies, and experimental findings are usually privileged over the findings of these weaker research designs in determining effective crime control practice and policy. Implementing randomized experiments in field settings can be very difficult because of a number of ethical, political, and practical concerns. However, many of these barriers to experimentation can be overcome; thus, randomized experiments will become increasingly important components of criminological inquiry.
- Babbie, E. (2004). The practice of social research (10th ed.). Belmont, CA: Wadsworth.
- Baunach, P. (1980). Random assignment in criminal justice research: Some ethical and legal issues. Criminology, 17, 435–444.
- Berecochea, J., & Jaman, D. (1981). Time served in prison and parole outcome: An experimental study (Report No. 2). Sacramento, CA: California Department of Corrections Research Division.
- Boruch, R. (1975). On common contentions about randomized field experiments. In R. Boruch & H. Riecken (Eds.), Experimental testing of public policy: The proceedings of the 1974 Social Science Research Council conference on social experimentation (pp. 107–142). Boulder, CO: Westview Press.
- Boruch, R. (1997). Randomized experiments for planning and evaluation. Thousand Oaks, CA: Sage.
- Boruch, R., Snyder, B., & DeMoya, D. (2000). The importance of randomized field trials. Crime & Delinquency, 46, 156–180.
- Braga, A., Kennedy, D., Waring, E., & Piehl, A. (2001). Problem-oriented policing, deterrence, and youth violence: An evaluation of Boston’s Operation Ceasefire. Journal of Research in Crime and Delinquency, 38, 195–225.
- Braga, A., Weisburd, D., Waring, E., Green Mazerolle, L., Spelman, W., & Gajewski, F. (1999). Problem-oriented policing in violent crime places: A randomized controlled experiment. Criminology, 37, 541–580.
- Campbell, D., & Stanley, J. (1966). Experimental and quasi-experimental designs for research. Chicago: Rand McNally.
- Clarke, R., & Cornish, D. (1972). The controlled trial in institutional research. London: HMSO.
- Cook, T., & Campbell, D. (1979). Quasi-experimentation: Design and analysis issues for field settings. Boston: Houghton Mifflin.
- Deutsch, S., & Alt, F. (1977). The effect of Massachusetts’ gun control law on gun-related crimes in the city of Boston. Evaluation Quarterly, 1, 543–568.
- Farrington, D. (2003). British randomized experiments on crime and justice. Annals of the American Academy of Political and Social Science, 589, 150–167.
- Farrington, D., & Petrosino, A. (2001). The Campbell Collaboration Crime and Justice Group. Annals of the American Academy of Political and Social Science, 578, 35–49.
- Farrington, D., & Welsh, B. (2005). Randomized experiments in criminology: What have we learned in the last two decades? Journal of Experimental Criminology, 1, 9–38.
- Farrington, D., & Welsh, B. (2006). A half century of randomized experiments on crime and justice. In M. Tonry (Ed.), Crime and justice (Vol. 34, pp. 55–132). Chicago: University of Chicago Press.
- Farrington, D. P., Ohlin, L. E., & Wilson, J. Q. (1986). Understanding and controlling crime: Toward a new research strategy. New York: Springer.
- Hay, R., & McCleary, R. (1979). On the specification of Box-Tiao time series models for impact assessment: A comment on the recent work of Deutsch and Alt. Evaluation Quarterly, 3, 277–314.
- Heckman, J., & Smith, J. (1995). Assessing the case for social experiments. Journal of Economic Perspectives, 9, 85–110.
- Lipsey, M. (1990). Design sensitivity: Statistical power for experimental research. Newbury Park, CA: Sage.
- Lipsey, M., & Wilson, D. (1993). Practical meta-analysis. Newbury Park, CA: Sage.
- Ludwig, J. (2005). Better gun enforcement, less crime. Criminology & Public Policy, 4, 677–716.
- Ludwig, J., & Cook, P. (2000). Homicide and suicide rates associated with the implementation of the Brady Handgun Violence Prevention Act. Journal of the American Medical Association, 284, 585–591.
- Morgan, S., & Winship, C. (2007). Counterfactuals and causal inference. New York: Cambridge University Press.
- Pawson, R., & Tilley, N. (1997). Realistic evaluation. London: Sage.
- Petersilia, J. (1989). Implementing randomized experiments: Lessons from BJA’s intensive supervision project. Evaluation Review, 13, 228–266.
- Petrosino, A., Boruch, R., Soydan, H., Duggan, L., & Sanchez-Meca, J. (2001). Meeting the challenge of evidence-based policy: The Campbell Collaboration. Annals of the American Academy of Political and Social Science, 578, 14–34.
- Pierce, G., & Bowers, W. (1981). The Bartley-Fox gun law’s short-term impact on crime in Boston. Annals of the American Academy of Political and Social Science, 455, 120–137.
- Powers, E., & Witmer, H. (1951). An experiment in the prevention of delinquency: The Cambridge-Somerville Youth Study. New York: Columbia University Press.
- Ross, H., & Blumenthal, M. (1974). Sanctions for the drinking driver: An experimental study. Journal of Legal Studies, 3, 53–61.
- Sechrest, L., & Rosenblatt, A. (1987). Research methods. In H. Quay (Ed.), Handbook of juvenile delinquency (pp. 417–450). New York: Wiley.
- Sherman, L. (1992). Policing domestic violence: Experiments and dilemmas. New York: Free Press.
- Sherman, L., & Berk, R. (1984). The specific deterrent effects of arrest for domestic assault. American Sociological Review, 49, 261–272.
- Sherman, L., Gottfredson, D., MacKenzie, D., Eck, J., Reuter, P., & Bushway, S. (1997). Preventing crime: What works, what doesn’t and what’s promising. Washington, DC: U.S. Department of Justice.
- Sherman, L., & Weisburd, D. (1995). General deterrent effects of police patrol in crime “hot spots”: A randomized controlled trial. Justice Quarterly, 12, 625–648.
- Weisburd, D. (1991). Project Muster: The external evaluator’s report. Trenton, NJ: Administrative Office of the Courts.
- Weisburd, D. (1993). Design sensitivity in criminal justice experiments. In M. Tonry (Ed.), Crime and justice: A review of research (Vol. 17, pp. 337–379). Chicago: University of Chicago Press.
- Weisburd, D. (2000). Randomized experiments in criminal justice policy: Prospects and problems. Crime & Delinquency, 46, 181–193.
- Weisburd, D. (2003). Ethical practice and evaluation of interventions in crime and justice: The moral imperative for randomized trials. Evaluation Review, 27, 336–354.
- Weisburd, D., & Green, L. (1995). Policing drug hot spots: The Jersey City Drug Market Analysis Experiment. Justice Quarterly, 12, 711–735.
- Weisburd, D., Lum, C., & Petrosino, A. (2001). Does research design affect study outcomes in criminal justice? Annals of the American Academy of Political and Social Science, 578, 50–70.
- Weisburd, D., Mazerolle, L., & Petrosino, A. (2008). The Academy of Experimental Criminology: Advancing randomized trials in crime and justice. Retrieved from http://www.asc41.com/Criminologist/2007/2007_May-June_Criminologist.pdf
- Weisburd, D., & Taxman, F. (2000). Developing a multi-center randomized trial in criminology: The case of HIDTA. Journal of Quantitative Criminology, 16, 315–340.
- Wellford, C., Pepper, J., & Petrie, C. (Eds.). (2005). Firearms and violence: A critical review. Washington, DC: National Academies Press.
- Wilson, D. (2001). Meta-analytical methods for criminology. Annals of the American Academy of Political and Social Science, 578, 71–89.