Polygraph is a general term that refers to the use of autonomic physiological measures to make assessments about a person’s credibility. Polygraph techniques find wide application in the criminal justice and national security systems of many countries, and their use is growing worldwide. There are two major families of polygraph techniques. Knowledge approaches look for responses that indicate knowledge possessed by a person attempting deception. Deception approaches assess credibility by examining a person’s response to accusatory questions that directly address the issues under investigation. Both approaches have strengths and weaknesses, and both are the subject of controversy in the scientific literature. This entry describes the approaches, their strengths and weaknesses, their application in practice, and the controversy concerning them.
History of Polygraph and Polygraph Techniques
The desire to use physiological responses as indices of truth or deception is a very old one. The lore of many cultures contains stories of trials by ordeal that have some basis in autonomic physiology. For example, many Asian cultures have legends of placing dry rice in the mouth of the accused. If the accused was able to spit out the rice, it was assumed that he or she was not nervous and was therefore truthful. If, however, the mouth was dry and the rice stuck, it was assumed that he or she was deceptive. Scientific research on the topic also has a long history, with reports of attempts to detect deception with physiological measures going back to the first psychologists at the end of the 19th century. However, this approach has one basic difficulty: To date, no specific physiological response, or pattern of physiological responses, has been found that is uniquely associated with truth or deception. Therefore, efforts to use physiological measures have to rely on techniques of stimulus control and inference to assess credibility. During much of the 20th century, there was little interest in credibility assessment by scientific psychology, and the application of the polygraph for that purpose grew as a profession in law enforcement and national security agencies, primarily in the United States. A modern era of research began in the early 1970s in the laboratory of David Raskin at the University of Utah. In the past decade, applications of the polygraph and research on physiological deception detection have grown rapidly worldwide.
Standard Physiological Measures in Modern Polygraphs
Several companies around the world manufacture polygraph instruments for use in the field as credibility assessment devices. A typical field polygraph instrument takes measures of respiration, blood pressure, and the electrodermal response. Respiration is measured from stretch sensors placed over the upper chest and abdomen. A continuous measure of relative blood pressure is monitored from an inflated cuff on the upper arm. Electrodermal activity (galvanic skin response) is recorded from the palmar surface of the hand. Some polygraph instruments in current field use also measure the peripheral vasomotor response (blood flow near the surface of the skin), usually from the thumb. Most of the instruments in current field use are digital computer-based systems. Currently, there is little controversy concerning the ability of field polygraph instruments to adequately measure the physiological values they claim to measure.
During the past decade, the U.S. government has invested in a number of research projects in an effort to find new dependent physiological measures for credibility assessment. Electroencephalograms, neural imaging, thermal imaging, eye movement, and others have been examined, but as yet, none of these new measures has achieved a sufficient scientific basis for applied use.
The Knowledge Approach
In the knowledge approach, which is often referred to as the Guilty Knowledge Test or, more correctly, as the Concealed Knowledge Test (CKT), the subject is presented with a series of items in multiple-choice format. Each item is designed to probe some bit of knowledge that a truthful person would not have. For example, John Doe is murdered with a pistol that the police have determined to be a .380 automatic. The media reports indicated that the victim was shot to death, but the police never publicized the exact type of weapon. A suspect might be asked, “If you shot John Doe, you would know the type of weapon: Was the gun used to shoot John Doe a .38 special revolver? A .45-caliber automatic? A .357 Magnum? A .380 automatic? A 9mm automatic? A .22-caliber revolver?” A window of approximately 20 seconds would follow each alternative to allow the suspect’s autonomic responses to take place and then recover. Because people tend to produce physiological responses to the first alternative in any series, the critical alternative is never placed in the first position, and the first alternative in a CKT series is never evaluated.
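The ordering rule just described (an unscored first position that never holds the critical alternative) can be sketched in a few lines of Python. This is only an illustrative sketch: the function names build_ckt_series and scored_alternatives are hypothetical, and the random placement of the critical alternative among the scored positions is an assumption rather than a documented field procedure.

```python
import random


def build_ckt_series(critical, foils, rng=None):
    """Return one ordered CKT series of alternatives.

    The first position is always a foil that serves as an unscored buffer.
    The critical alternative is placed at a random position among the rest
    (random placement is an assumption made for this sketch).
    """
    if critical in foils:
        raise ValueError("the critical alternative must not also be a foil")
    if len(foils) < 2:
        raise ValueError("need at least one buffer foil and one scored foil")
    if rng is None:
        rng = random.Random()
    shuffled = list(foils)
    rng.shuffle(shuffled)
    buffer_item, scored = shuffled[0], shuffled[1:]
    # Insert the critical alternative somewhere among the scored positions.
    scored.insert(rng.randrange(len(scored) + 1), critical)
    return [buffer_item] + scored


def scored_alternatives(series):
    """Drop the first position, which is never evaluated."""
    return series[1:]


if __name__ == "__main__":
    foils = [".38 special revolver", ".45-caliber automatic", ".357 Magnum",
             "9mm automatic", ".22-caliber revolver"]
    series = build_ckt_series(".380 automatic", foils)
    print(series)
    print(len(scored_alternatives(series)), "alternatives are scored")
```

With the six alternatives of the John Doe example, five remain for scoring once the first position is set aside.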
It is assumed that on recognition of the correct alternative, the deceptive person will generate autonomic responses. It is also assumed that the truthful person will have no reason to produce a specific response to the critical alternative and will thus be producing nonspecific responses at random. In the example above, six alternatives are presented and the first is not evaluated, leaving five scored alternatives. Thus, the likelihood that an innocent person would produce his or her largest response to the critical alternative in a 1-item CKT is 1 in 5, or 0.20. With a 2-item CKT, the likelihood that an innocent person would give his or her largest response to the critical alternative on both CKT series is 0.20 × 0.20, or 0.04. As the number of items increases, the likelihood of making a false-positive error (a truthful person appearing deceptive) is thus definable and rapidly becomes quite small.
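The arithmetic above generalizes directly. The following Python sketch assumes, as the text does, that an innocent person's largest response falls on each scored alternative with equal probability, that the series are independent, and that a deceptive outcome requires the largest response to fall on the critical alternative in every series. The function names are hypothetical.

```python
def chance_false_positive(n_series, scored_alternatives=5):
    """Probability that an innocent person's largest response lands on the
    critical alternative in every one of n_series independent series."""
    if n_series < 1 or scored_alternatives < 2:
        raise ValueError("need at least one series and two scored alternatives")
    return (1.0 / scored_alternatives) ** n_series


def series_needed(target, scored_alternatives=5):
    """Smallest number of series whose chance false-positive rate is below target."""
    if not 0.0 < target < 1.0:
        raise ValueError("target must lie strictly between 0 and 1")
    n = 1
    while chance_false_positive(n, scored_alternatives) >= target:
        n += 1
    return n


if __name__ == "__main__":
    for n in range(1, 6):
        print(n, round(chance_false_positive(n), 5))
```

Under these assumptions, three series with five scored alternatives each already push the chance false-positive rate below 1 in 100.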
Strengths of the CKT
The CKT has two principal strengths. First, it is possible to precisely define the likelihood of making a false-positive error and to control that error rate by the number of items used in the CKT. Research has consistently shown that the statistical prediction holds well in application. It is also possible to pretest the transparency of the items in a CKT by presenting the series of items and alternatives to persons known to be truthful regarding knowledge of the crime. Transparency refers to the ability of innocent persons to guess the critical item from a series. While testing for transparency of items is common in research settings, it is not known whether it is a common practice in the field.
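One way to make such a transparency pretest concrete is to ask whether naive persons pick the critical alternative more often than chance would predict. The Python sketch below uses an exact binomial upper-tail probability for that purpose. The choice of this particular test, the chance level of 0.20 (five scored alternatives), and the example counts are assumptions made for illustration; the source does not specify how transparency pretests are analyzed.

```python
from math import comb


def binomial_upper_tail(hits, n, p):
    """P(X >= hits) for X ~ Binomial(n, p)."""
    if not 0 <= hits <= n:
        raise ValueError("hits must lie between 0 and n")
    return sum(comb(n, k) * p**k * (1 - p) ** (n - k)
               for k in range(hits, n + 1))


if __name__ == "__main__":
    chance = 0.20  # assumed: five scored alternatives, pure guessing
    # Hypothetical pretest: 20 naive persons each guess the critical alternative.
    for hits in (4, 10):
        tail = binomial_upper_tail(hits, 20, chance)
        print(f"{hits}/20 correct guesses: P(at least this many by chance) = {tail:.4f}")
```

A small tail probability would suggest that the item is transparent, that is, guessable without concealed knowledge, and should be reworded or dropped.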
The second great strength of the CKT is that it is a very simple test to administer. With a few hours of training with the equipment, an undergraduate research assistant can administer the CKT as well as an experienced polygraph examiner. There are examples in the literature of the CKT being completely automated for machine administration.
Weaknesses of the CKT
The CKT has three primary weaknesses. The first, known as memorability, concerns the fact that for the CKT to work, the deceptive person must remember the details of the crime. In that regard, the extensive research on eyewitness memory indicates that eyewitnesses, particularly those under stress, are prone to make mistakes in recounting the details of a crime. The perpetrator of a crime is an eyewitness to that crime, and it is likely that the perpetrator will be a highly stressed eyewitness of the crime. Moreover, many perpetrators are also intoxicated. To date, there is no theory to predict what specific details from a crime scene are likely to be remembered. The memorability problem is avoided in most laboratory research on the CKT by screening a number of details of the crime scene with pilot subjects and then using only the highly memorable items in the subsequent testing or by using overlearned items of personal history. Such a screening of items is not possible in real cases. Laboratory research on the CKT has revealed a slight tendency toward false-negative errors—that is, toward deceptive individuals appearing truthful on the test. However, the few existing field studies of the CKT suggest that the false-negative rate in the field may be as high as 50%.
The second major weakness of the CKT is one of applicability. Research conducted by the Federal Bureau of Investigation in the United States found that fewer than 10% of their cases were amenable to the use of the CKT, if they had wanted to use it. In nearly 90% of the FBI case files examined, the nature of the case was such that there were not enough items of concealed knowledge to conduct a CKT.
The third weakness of the CKT concerns countermeasures, that is, things that a deceptive person might do in an effort to defeat or distort the test. Research shows the CKT to be susceptible to mental and physical countermeasures if subjects are knowledgeable about the technique and have received training in the use of countermeasures.
Application of the CKT
Although a great deal is written about the CKT in the scientific literature, it presently has very little application in either law enforcement or national security. There is essentially no application of the CKT in the United States. The only country that reports a general use of the CKT in law enforcement is Japan. In Japan, persons with special training in psychology and eyewitness memory are part of the crime scene investigation team, and they actively search for and document possible bits of information for use in a CKT when the crime scene is first investigated. It may be that this careful crime scene documentation results in a higher rate of applicability for the technique. However, a clear explanation of how Japanese examiners overcome the memorability problem is not currently available.
The Deception Approach
The Relevant-Irrelevant Test
The deception approach asks direct accusatory questions (referred to as Relevant questions) under the assumption that persons attempting deception will produce physiological responses when they lie. The earliest version of the deception approach was the Relevant-Irrelevant Test (RIT). Along with direct accusatory questions (e.g., Did you shoot John Doe?), the RIT also asks irrelevant (neutral) questions, to which the person is assumed to be responding truthfully (e.g., Are the lights on in this room?). The working assumption of the RIT is that persons attempting deception will produce a large and consistent physiological response to the relevant questions, whereas the truthful will not distinguish between the irrelevant and the relevant questions.
Virtually all the scientists who work in this area dismiss the working assumptions of the RIT as naive. Clearly the truthful will recognize the relevant questions as the more important class of stimuli and are thus likely to produce physiological responses to them, and in fact, research does show a very large number of false-positive outcomes to the RIT. As a result, the RIT has very little application in forensic polygraph testing. However, the RIT is still in use for periodic screening of sex offenders and in screening job applicants. At this time, any use of the RIT is highly controversial, and the scientists active in this area do not support its use.
Comparison Question Tests
John Reid developed the notion of an active comparison question in the context of law enforcement examinations during the late 1940s in response to the obvious problems with the RIT. The idea of the active comparison question was to provide a stimulus in the test that would evoke physiological responses from the innocent but not from the guilty. The comparison question took the form of a question that the subject was probably going to respond to with a lie. For example, after discussing the death of John Doe and after the subject of the examination has denied being involved in John Doe’s death, the polygraph examiner would tell the subject that he or she is going to be asked some questions about his or her basic character in an effort to show that he or she is not the type of person who would have shot John Doe. The subject would then be asked a question such as “Before the year 2006, did you ever hurt someone?” The comparison question is deliberately vague and covers a long period of time. In the context of the examination, the subject is led to believe that an affirmative response is damaging because it shows that he or she is the kind of person who would have committed the crime. However, for virtually all subjects, it can be assumed that a definitive “No” response is probably a lie in view of the deliberately vague presentation of the comparison question.
The working assumption of the Comparison Question Test (CQT) is that guilty participants will produce consistent physiological responses to the relevant questions, while they will respond only minimally to the comparison questions. Although the guilty are assumed to be lying in their answers to the comparison questions, it is assumed that the comparisons are likely to be viewed as unimportant compared with the relevant questions, which directly address the issues under investigation. The innocent are expected to respond more to the comparison questions because they know that they are lying or are at least uncertain about the veracity of their answers to the comparison questions, whereas they know they are responding with the truth to the relevant questions. Thus, differential reactivity is expected from the innocent and the guilty. Guilty subjects should produce consistently greater physiological responses to the relevant questions than to the comparison questions, and innocent subjects should produce consistently greater physiological responses to the comparison questions than to the relevant questions. If differential reactivity is not observed (that is, if there is no response to either question type or an equal response to both question types), the test is considered to be inconclusive.
In application, a CQT will contain between two and four relevant questions and a similar number of comparison questions. The question series will also contain some neutral and other questions that are not used directly for credibility assessment. The questions will be repeated a minimum of three times, but more presentations may be obtained. The resultant data are evaluated by making systematic comparisons between the responses to relevant questions and contiguous comparison questions. The standard in application is human-based scoring that is semi-objective: It is rule based, and for some physiological response systems objective measurements are made (e.g., the electrodermal response), whereas for others human judgment is involved in making evaluations (e.g., the respiratory responses). Currently, there are three human-based scoring systems in use in the field, and persons trained in those systems show high levels of reliability in their total scores. Reliability coefficients for total scores are usually .9 or better.
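The logic of comparing each relevant question with a contiguous comparison question can be illustrated with a deliberately simplified Python sketch. It is not one of the three field scoring systems mentioned above: the sign-only scoring (+1, 0, or -1 per pair), the cutoff of 2, and the function name score_cqt are all assumptions chosen to show the idea of differential reactivity and an inconclusive zone.

```python
def score_cqt(pairs, cutoff=2):
    """Score a list of (relevant_response, comparison_response) magnitudes.

    Each pair adds +1 when the comparison response is larger (the pattern
    expected from the innocent), -1 when the relevant response is larger
    (the pattern expected from the guilty), and 0 when they are equal.
    The cutoff is a hypothetical value, not a field standard.
    """
    total = 0
    for relevant, comparison in pairs:
        if comparison > relevant:
            total += 1
        elif relevant > comparison:
            total -= 1
    if total >= cutoff:
        return total, "truthful"
    if total <= -cutoff:
        return total, "deceptive"
    return total, "inconclusive"


if __name__ == "__main__":
    # Hypothetical response magnitudes for three relevant/comparison pairs.
    print(score_cqt([(5.0, 1.0), (4.0, 2.0), (6.0, 3.0)]))
    print(score_cqt([(1.0, 5.0), (2.0, 4.0), (3.0, 6.0)]))
    print(score_cqt([(2.0, 2.0), (3.0, 3.0), (1.0, 1.0)]))
```

Equal responses to both question types, or no response to either, leave the total at zero, which corresponds to the inconclusive outcome described above.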
Validity studies of the CQT have produced a range of estimates. However, current meta-analyses seem to be converging on a validity estimate for the CQT of nearly 90% accuracy for decisions (i.e., excluding the approximately 8% of tests that are inconclusive). That said, there is controversy in the literature concerning the appropriate methodology for both laboratory and field studies in this area and about the generalizability of currently obtained results. Depending on which studies one accepts as having adequate methodology, the estimate of the validity of the CQT can be raised or lowered relative to the figure mentioned above.
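The phrase "accuracy for decisions" means that inconclusive tests are removed from the denominator before accuracy is computed. A minimal Python sketch of that bookkeeping follows. The function name and the counts in the example are hypothetical, chosen only so that they reproduce the approximate figures quoted above; they are not data from any study.

```python
def decision_accuracy(correct, incorrect, inconclusive):
    """Return accuracy among decided tests and the inconclusive rate."""
    decided = correct + incorrect
    total = decided + inconclusive
    if decided == 0:
        raise ValueError("no tests ended in a decision")
    return {
        "accuracy_of_decisions": correct / decided,
        "inconclusive_rate": inconclusive / total,
    }


if __name__ == "__main__":
    # Hypothetical counts: 1,000 tests, of which 80 were inconclusive.
    result = decision_accuracy(correct=828, incorrect=92, inconclusive=80)
    print(result)
```

Because the inconclusive tests are excluded, the accuracy figure describes only those examinations that ended in a truthful or deceptive decision.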
Strengths of the CQT
The great strength of the CQT is its wide applicability. The CQT is a highly versatile technique that can be applied to most credibility assessment situations. If unambiguous relevant questions can be formulated, then the applicability of the CQT would seem to be limited only by the subject’s mental competence. In the laboratory and in many field studies, the CQT has been shown to be capable of a high level of accuracy.
Weaknesses of the CQT
The CQT is criticized at a number of levels. At one level, the CQT is criticized because it lacks a well-developed theory of underlying processes to explain why it works. Certainly, the lack of theory has hampered basic research in this area. An articulated theory would be useful in guiding research and in predicting conditions of generalizability of research results. However, the CQT polygraph is not unique in being a technology in successful widespread application without complete understanding of the underlying processes. Aspirin was in widespread use as a fever reducer and pain reliever for over 100 years before a complete explanation of its mechanisms was forthcoming.
A more telling criticism of the application of the CQT, particularly in the United States, concerns a lack of professional standards and regulation. Polygraph testing in the United States is controlled by a patchwork of standards and state licensing regulations. In many states, there is no regulation at all. As a result, the quality of practice in the polygraph profession in the United States is highly variable. Worldwide, this may not be the case. In Israel and Japan, psychologists are heavily involved in polygraph programs. In the People’s Republic of China, the government polygraph program is organized within the Chinese National Academy of Sciences. One positive development concerning standards is that ASTM International (formerly the American Society for Testing and Materials) has recently formed committees and is promulgating consensus standards for the administration of polygraph tests and for the training of polygraph examiners.
It has been suggested that some police agencies in the United States use the polygraph primarily as an interrogation prop to aid in obtaining confessions. Anecdotally, there are several well-known exoneration cases that have involved polygraph examinations as part of the process leading to false confessions. This is a topic clearly in need of additional research.
Finally, the CQT shares with the CKT a similar weakness regarding countermeasures. Some knowledgeable subjects can use mental and physical countermeasures to produce false-negative outcomes in laboratory settings. As with the CKT, we do not know how successful countermeasure attempts are against the CQT in the field.
Application of the CQT
The CQT is in widespread application around the world as an investigative tool, as a screening tool for national security, and in the monitoring and treatment of sexual offenders on their release from incarceration. In some jurisdictions, the results of polygraph examinations are used as evidence in courts of law. The use of the polygraph in postconviction mitigation and sentencing seems to be growing. Although the controversy in the scientific literature remains, the use of the CQT worldwide seems to be accelerating.