Statistical information is increasingly likely to be presented in court. It may appear in civil cases (e.g., percentages of men and women employees in a gender discrimination case) or criminal cases (e.g., the defendant’s blood type matches that of a sample found at the crime scene and that blood type is found in only 20% of the population). Can jurors understand that information on their own, or must they rely on experts to explain its meaning? Even if jurors correctly understand statistical evidence, how do they combine that evidence with other, nonquantitative evidence?
In contrast to other areas of juror understanding (e.g., juror beliefs about factors affecting the accuracy of eyewitness identification), relatively little research directly addresses these questions. The existing studies fall into two broad categories. The first focuses primarily on how jurors understand statistical evidence. The second asks how jurors combine statistical evidence with nonstatistical evidence. Considered together, these studies suggest that jurors have some difficulty understanding even a single piece of statistical evidence and that the difficulty increases when they face two pieces of statistical evidence. Jurors also tend to underuse statistical evidence, compared with a Bayesian norm, even when they are instructed on how to use it. That underuse, however, conceals considerable variation among jurors.
Juror Understanding of Statistical Evidence
“Naked statistics” (sometimes referred to as base rates) are data that are true regardless of what happened in a particular case. Mock jurors are less persuaded by naked statistics than by mathematically equivalent evidence that is contingent on some ultimate fact (i.e., a fact essential to resolution of the case). For example, in the Blue Bus problem, a bus runs over a color-blind woman’s dog. The defendant, Company A, owns 80% of the buses in the area, and all of Company A’s buses are blue. Company B owns the remaining 20% of the buses, and its buses are gray. Because the woman cannot tell a blue bus from a gray one, she does not know which company’s bus ran over her dog. She sues Company A on the theory that, because Company A owns 80% of the buses in the area, there is an 80% chance that a Company A bus killed her dog. In experiments, jurors in one condition hear that the defendant owns 80% of the buses in the area, while jurors in another condition hear an 80%-accurate weigh-station attendant’s identification of the defendant’s bus. Both sets of jurors believe it equally probable that the defendant’s blue bus, rather than Company B’s gray bus, killed the dog. But only jurors who heard the attendant’s testimony are willing to find against the bus company. Jurors who heard only the naked statistic (Company A owns 80% of the buses) do not find Company A responsible. Similarly, although learning that the defendant is responsible for 80% of the accidents in the county leads to high probability estimates that the defendant’s bus killed the dog, jurors are unwilling to find the defendant responsible.
Most research has examined “nonnaked” statistical information, in which one’s belief about the ultimate fact (in the example above, whether a blue bus hit the dog) is linked to one’s belief about the evidence (the weigh-station attendant’s accuracy). Some research finds that the way statistical information is presented may affect mock jurors’ use of it. For example, incidence rate information presented as a conditional probability (there is only a 2% probability that the defendant’s hair would match the perpetrator’s if the defendant were innocent) may lead some jurors to commit the prosecutor’s fallacy: they conclude that there is a 98% chance that the defendant is guilty. If the same information is presented as a percentage and a number (a 2% match rate in a city of 1,000,000 people, meaning 20,000 people share the characteristic), other jurors may commit the defense attorney’s fallacy: they conclude that the evidence shows only a 1 in 20,000 chance that the defendant is the culprit. These errors may be more likely when an expert, rather than an attorney, offers the fallacious argument. An attorney who makes such an argument in the face of expert testimony (e.g., when the expert explains Bayes’s theorem) risks a backlash; the defense attorney’s fallacy combined with expert Bayesian instruction may increase guilty verdicts.
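Neither fallacy gives the normative answer, which depends on how strong the other evidence was before the match. The Python sketch below contrasts the two fallacious figures with an odds-form Bayesian update. It assumes that a guilty defendant would always match (so the likelihood ratio is 1/.02 = 50), and the prior probabilities it loops over are hypothetical.

```python
def posterior_probability(prior: float, likelihood_ratio: float) -> float:
    """Odds-form Bayes: posterior odds = prior odds * likelihood ratio."""
    prior_odds = prior / (1.0 - prior)
    posterior_odds = prior_odds * likelihood_ratio
    return posterior_odds / (1.0 + posterior_odds)


match_rate = 0.02        # P(hair match | defendant innocent), from the text
population = 1_000_000   # city size, from the text

# Prosecutor's fallacy: reads P(match | innocent) as P(innocent | match).
prosecutor_fallacy = 1.0 - match_rate

# Defense attorney's fallacy: treats the defendant as just one of the
# expected chance matchers, as if no other evidence narrowed the pool.
expected_matchers = match_rate * population
defense_fallacy = 1.0 / expected_matchers

# Assumption: a guilty defendant always matches, so LR = 1 / 0.02 = 50.
lr = 1.0 / match_rate

print(f"prosecutor's fallacy: {prosecutor_fallacy:.2f}")
print(f"defense attorney's fallacy: {defense_fallacy:.6f}")
for prior in (1 / population, 0.01, 0.25):   # hypothetical priors
    print(f"prior {prior:.6f} -> posterior {posterior_probability(prior, lr):.4f}")
```

Under these assumptions, the 1 in 20,000 figure is approximately right only for a juror whose prior is 1 in 1,000,000 (that is, a juror with no other evidence at all), and .98 is not the posterior for any of these priors; a juror who started at .25 should end near .94.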
Even nonfallacious presentations of statistical evidence pose challenges for jurors, particularly when they are evaluating low-probability events. Compare DNA incidence rates presented as 0.1 out of 10,000, 1 out of 100,000, or 2 out of 200,000. Mathematically, these rates are identical (each equals 1 in 100,000), but psychologically they differ: jurors are more likely to find for the defendant when given either of the latter two. Why? The first rate, being fractional, contains no cue that people other than the defendant might match the DNA. Each of the other rates contains at least one exemplar, which encourages jurors to think about other people who might match. This effect may rest in part on the size of the broader reference group; it is easier to generate exemplars with an incidence rate of 1 in 100,000 when considering a city of 500,000 people than when considering a town of 500.
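The arithmetic behind this exemplar-cuing account can be sketched in Python with the standard `fractions` module. The snippet checks that the three framings are the same number and computes the expected count of coincidental matches in the two reference groups mentioned above, assuming (for illustration) that the rate applies uniformly across each population.

```python
from fractions import Fraction

# The three framings from the text, as exact rational numbers.
framings = {
    "0.1 out of 10,000": Fraction(1, 10) / 10_000,
    "1 out of 100,000": Fraction(1, 100_000),
    "2 out of 200,000": Fraction(2, 200_000),
}

# Mathematically, all three framings reduce to the same rate.
assert len(set(framings.values())) == 1
rate = Fraction(1, 100_000)

# Expected number of coincidental matches in each reference group,
# assuming the rate applies uniformly to everyone in the group.
expected = {
    "city of 500,000": rate * 500_000,
    "town of 500": rate * 500,
}
for place, count in expected.items():
    print(f"{place}: {float(count):g} expected matching people")
```

About five coincidental matches are expected in the city, against 0.005 in the town, which is why other possible matches come more easily to mind in the larger group.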
Jurors’ task becomes more difficult when they face both a random match probability (RMP) (e.g., there is a 1 in 1 million chance that the defendant’s DNA would match that of the perpetrator if the defendant were innocent) and a laboratory error rate (LE) (e.g., the laboratory makes a mistake in 2 of every 100 cases). Because the LE is far larger than the RMP, the probability that a reported match arose from either chance or laboratory error is roughly 2 in 100. Yet jurors who hear the separate RMP and LE (as recommended by the National Research Council) convict the defendant as often as those who hear only the much more incriminating RMP.
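The “roughly 2 in 100” figure follows from treating a reported match for an innocent defendant as the union of two failure paths. A minimal Python sketch, assuming the two events are independent (an assumption made here for illustration, not a claim about any laboratory):

```python
def false_match_probability(rmp: float, lab_error: float) -> float:
    """P(reported match | defendant innocent).

    Assumes a coincidental match and a laboratory error are independent and
    that either one alone produces a reported match.
    """
    return 1.0 - (1.0 - rmp) * (1.0 - lab_error)


rmp = 1e-6        # 1 in 1 million random match probability, from the text
lab_error = 0.02  # 2 in 100 laboratory error rate, from the text

combined = false_match_probability(rmp, lab_error)
print(f"combined false-match probability: {combined:.6f}")
print(f"times larger than the RMP alone: {combined / rmp:,.0f}")
```

The lab error rate dominates: adding a 1 in 1 million chance of coincidence to a 2 in 100 chance of error barely moves the total, so the combined figure is about 20,000 times larger than the RMP alone.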
Why do jurors fail to combine an RMP and an LE correctly? Traditional explanations point to various logical or mathematical errors. Another explanation suggests that jurors’ interpretation of statistical evidence necessarily reflects their expectancies about such data. Consider jurors who receive an extremely small RMP estimate (1 in a billion) paired with a comparatively large LE estimate (2 in 100), and compare them with jurors who receive a comparatively large RMP estimate (2 in 100) paired with an extremely small LE estimate (1 in a billion). Logical explanations (e.g., we are more convinced by more vivid evidence, like 1 in a billion) and mathematical explanations (e.g., we average probabilities) make identical predictions for the two conditions, because the same two numbers appear in each. Instead, mock jurors are more likely to convict in the large RMP, small LE condition. Similarly, they are more likely to convict when presented with an extremely small LE estimate and no RMP estimate than when presented with only an extremely small RMP estimate and no LE estimate. This difference may reflect jurors’ preexisting expectancies that the likelihood of a random match is extremely small and that the likelihood of laboratory error is relatively large.
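Both the normative combination and the averaging heuristic are symmetric in the two numbers, so neither can explain why the two conditions produce different verdicts. The Python sketch below makes that symmetry explicit; it assumes independent failure paths and is an illustration, not a model used in the research described.

```python
def false_match_probability(rmp: float, lab_error: float) -> float:
    """P(reported match | innocent), assuming independent failure paths."""
    return 1.0 - (1.0 - rmp) * (1.0 - lab_error)


small, large = 1e-9, 0.02   # 1 in a billion and 2 in 100, from the text

# Condition A: tiny RMP with a comparatively large LE.
# Condition B: comparatively large RMP with a tiny LE.
conditions = {"A": (small, large), "B": (large, small)}

for name, (rmp, le) in conditions.items():
    combined = false_match_probability(rmp, le)
    average = (rmp + le) / 2   # the averaging heuristic mentioned above
    print(f"{name}: combined={combined:.9f} average={average:.9f}")
```

Both columns print identical values for A and B, so any difference in conviction rates must come from something other than these computations, such as jurors’ expectancies about which number should be small.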
Some forms of statistical evidence (e.g., bullet lead analysis) illustrate that jurors must consider not just the reliability of statistical evidence but also its diagnosticity (usefulness). The value of a forensic match (e.g., the defendant’s DNA profile is the same as that of blood found at the crime scene) depends both on the reliability of the evidence (did the laboratory perform the test correctly?) and on its diagnosticity (could the match be a coincidence?). One study gave all jurors the same information about hit rate and false-positive rate but varied a third piece of statistical information: the diagnostic value of the evidence. Some jurors learned that all sample bullets taken from the defendant matched the composition of the murder bullet, while no bullets taken from a community sample matched (strong diagnostic evidence). Others learned that the defendant’s bullets matched at the same rate as bullets taken from a community sample (worthless diagnostic evidence). Jurors who received the strong diagnostic evidence were more likely to believe the defendant guilty. However, this effect held only for mock jurors who were relatively confident in their ability to draw conclusions from numerical data; less confident jurors did not differ across conditions. Furthermore, jurors who heard the worthless diagnostic evidence tended to give it some weight before deliberating, but deliberation eliminated that effect.
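One way to formalize diagnosticity (an interpretation adopted here for illustration, not the study’s own analysis) is as a likelihood ratio: how much more likely a match is if the defendant is the source than if he is not. When the defendant’s bullets match at the same rate as community bullets, the ratio is 1 and the evidence should leave belief in guilt unchanged. In the Python sketch below, the prior and all match rates are hypothetical.

```python
def likelihood_ratio(p_match_if_source: float, p_match_if_not_source: float) -> float:
    """Diagnosticity: how much more likely a match is if the defendant is the source."""
    return p_match_if_source / p_match_if_not_source


def posterior_probability(prior: float, lr: float) -> float:
    """Odds-form Bayes: posterior odds = prior odds * likelihood ratio."""
    odds = prior / (1.0 - prior) * lr
    return odds / (1.0 + odds)


prior = 0.30  # hypothetical belief in guilt before the bullet evidence

# Hypothetical rates. A community match rate of exactly 0 (as in the strong
# condition described above) would make the ratio unbounded, so a small
# nonzero stand-in is used instead.
strong = likelihood_ratio(1.0, 0.01)
worthless = likelihood_ratio(0.40, 0.40)  # same rate for defendant and community

print(f"strong: LR={strong:.0f}, posterior={posterior_probability(prior, strong):.3f}")
print(f"worthless: LR={worthless:.0f}, posterior={posterior_probability(prior, worthless):.3f}")
```

With a ratio of 1, the posterior equals the prior, which is the normative benchmark against which the predeliberation weight given to worthless evidence counts as an error.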
How Jurors Combine Statistical Evidence with Nonstatistical Evidence
How do jurors combine numerous pieces of evidence (not necessarily statistical) to make decisions? Both mathematical (e.g., probability theory) and explanation-based (e.g., story model) approaches have been proposed. Research specifically examining the use of statistical evidence has generally followed a mathematical approach and has compared jurors’ probabilities (typically the probability that the defendant committed the crime) with probabilities calculated using Bayes’s theorem.
Bayes’s theorem prescribes how a decision maker should combine new statistical evidence with prior evidence. Prior odds (the odds that the defendant is guilty, based on all previously presented evidence) are multiplied by the likelihood ratio (the probability that the new evidence would match the defendant if he or she is guilty, divided by the probability that it would match the defendant if he or she is innocent). The product is the posterior odds. For example, after opening statements and eyewitness testimony, a juror might believe that there is a 25% chance that the defendant is guilty. The prior odds are 25:(100 - 25) = 1:3, or about .33:1. If the defendant and the perpetrator share a blood type found in only 5% of the population, the likelihood ratio is 1/.05 = 20. The posterior odds, then, are (1/3) × 20 ≈ 6.67:1, and the probability of guilt is 6.67/(6.67 + 1) ≈ .87. In short, for this juror, Bayes’s theorem states that the probability of guilt should increase from .25 to .87.
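The worked example can be reproduced exactly with Python’s standard `fractions` module, which avoids the rounding in .33:1. This is a minimal sketch of the odds form of Bayes’s theorem, assuming (as the example implicitly does) that a guilty defendant would certainly share the perpetrator’s blood type; the function name `bayes_update` is hypothetical.

```python
from fractions import Fraction


def bayes_update(prior_prob: Fraction, likelihood_ratio: Fraction) -> Fraction:
    """Odds-form Bayes's theorem: posterior odds = prior odds * likelihood ratio."""
    prior_odds = prior_prob / (1 - prior_prob)
    posterior_odds = prior_odds * likelihood_ratio
    return posterior_odds / (1 + posterior_odds)   # convert odds back to a probability


prior = Fraction(25, 100)                  # 25% belief after eyewitness testimony
lr = Fraction(1) / Fraction(5, 100)        # blood type shared by 5% of people -> LR = 20
posterior = bayes_update(prior, lr)

print(f"prior odds: {prior / (1 - prior)}")
print(f"posterior: {posterior} = {float(posterior):.2f}")
```

The exact posterior is 20/23, which rounds to the .87 given above.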
Only a handful of studies have compared jurors’ decisions with Bayesian norms, and the comparisons are sometimes difficult to make. Some studies have not asked for a prior probability, while others have asked about beliefs that the evidence matched (instead of beliefs about guilt). Most have assumed that jurors accepted the statistical evidence at face value. With these caveats, the general finding is that jurors underuse statistical evidence compared with a Bayesian norm. This general finding, however, masks underlying complexity. In many of these studies, the prior evidence, the statistical evidence, or both are relatively strong, leaving little room for jurors to exceed the Bayesian posterior probabilities, which are often .90 or greater. There also tends to be great variability in how jurors use the statistical evidence: two jurors with identical prior probabilities may hear the same statistical evidence and arrive at very different posterior probabilities. These disparities may rest in part on differing expectancies about LEs (which typically have not been presented) or about other factors (e.g., potential investigator misconduct) affecting the value of the statistical evidence. Moreover, the studies reviewed above of how jurors respond to statistical information by itself provide ample reason to suspect wide variation in jurors’ understanding. For example, jurors who claim to be comfortable with mathematics are more likely to be affected by statistical information than those who express discomfort. To further complicate matters, at least one study has found that later, nonprobabilistic evidence leads jurors to reevaluate quantitative evidence presented earlier.
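To see how unstated expectancies about lab error alone could spread posteriors apart, the Python sketch below holds the prior and the RMP fixed and varies only the LE a juror silently assumes. The prior, the RMP, and the LE values are hypothetical, and the model assumes a guilty defendant always yields a reported match.

```python
def posterior_with_lab_error(prior: float, rmp: float, lab_error: float) -> float:
    """Posterior P(guilt) after a reported DNA match.

    Assumptions (for illustration only): a guilty defendant always yields a
    reported match, and coincidence and lab error are independent failure
    paths for an innocent defendant.
    """
    p_false_match = 1.0 - (1.0 - rmp) * (1.0 - lab_error)
    lr = 1.0 / p_false_match
    odds = prior / (1.0 - prior) * lr
    return odds / (1.0 + odds)


prior = 0.25   # same prior for every juror (hypothetical)
rmp = 1e-6     # same RMP for every juror (hypothetical)

# Jurors differ only in the lab error rate they assume (hypothetical values).
for assumed_le in (0.0, 0.001, 0.02, 0.10):
    post = posterior_with_lab_error(prior, rmp, assumed_le)
    print(f"assumed LE {assumed_le:.3f} -> posterior {post:.4f}")
```

Under these assumptions, the posterior ranges from nearly 1 down to about .77 even though every juror heard the same prior evidence and the same RMP.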
Does instruction help jurors combine statistical evidence with nonstatistical evidence? Studies have provided simple instructions, typically a statistician’s testimony about how Bayes’s theorem works. The expert displays a table or a graph showing some sample prior probabilities and, given the statistical evidence, the corresponding posterior probabilities. These relatively unsophisticated means of instruction generally have not affected jurors’ use of the evidence; jurors who receive the instruction come no closer to Bayesian norms than those who do not.
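A table like the one such an expert might display can be generated with a few lines of Python. The sketch is hypothetical: it reuses the likelihood ratio of 20 from the blood-type example above rather than any value from the instruction studies, and the list of priors is arbitrary.

```python
def posterior_probability(prior: float, likelihood_ratio: float) -> float:
    """Odds-form Bayes: posterior odds = prior odds * likelihood ratio."""
    odds = prior / (1.0 - prior) * likelihood_ratio
    return odds / (1.0 + odds)


LIKELIHOOD_RATIO = 20.0   # hypothetical: blood type shared by 5% of the population
PRIORS = (0.01, 0.10, 0.25, 0.50, 0.75, 0.90)

print(f"{'prior':>6}  {'posterior':>9}")
for prior in PRIORS:
    print(f"{prior:>6.2f}  {posterior_probability(prior, LIKELIHOOD_RATIO):>9.2f}")
```

Each row applies the same multiplication by 20 to the prior odds; only the starting point differs.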
References:
- Koehler, J. J., & Macchi, L. (2004). Thinking about low-probability events: An exemplar-cuing theory. Psychological Science, 15, 540-546.
- Levett, L. M., Danielsen, E. M., Kovera, M. B., & Cutler, B. L. (2005). The psychology of jury and juror decision making. In N. Brewer & K. D. Williams (Eds.), Psychology and law: An empirical perspective (pp. 365-406). New York: Guilford Press.
- Niedermeier, K. E., Kerr, N. L., & Messe, L. A. (1999). Jurors’ use of naked statistical evidence: Exploring bases and implications of the Wells effect. Journal of Personality and Social Psychology, 76, 533-542.
- Schklar, J., & Diamond, S. S. (1999). Juror reactions to DNA evidence: Errors and expectancies. Law and Human Behavior, 23, 159-184.