Experimenter's bias

In experimental science, experimenter's bias, also known as research bias, is a subjective bias towards a result expected by the human experimenter.[1] For example, it occurs when scientists unconsciously affect subjects in experiments.[2]

Observer-expectancy effect

The experimenter may introduce cognitive bias into a study in several ways. In what is called the observer-expectancy effect, the experimenter may subtly communicate their expectations for the outcome of the study to the participants, causing them to alter their behavior to conform to those expectations.

Clever Hans (in German, der Kluge Hans) was an Orlov Trotter horse that his owner, Wilhelm von Osten, claimed could perform arithmetic. As a result of the large amount of public interest in Clever Hans, the philosopher and psychologist Carl Stumpf, along with his assistant Oskar Pfungst, investigated von Osten's claims.

Using a substantial number of trials, Pfungst found that the horse could get the correct answer even if von Osten himself did not ask the questions, ruling out the possibility of fraud. However, the horse got the right answer only when the questioner knew what the answer was, and the horse could see the questioner. He observed that when von Osten knew the answers to the questions, Hans got 89 percent of the answers correct, but when von Osten did not know the answers to the questions, Hans only answered six percent of the questions correctly.

Pfungst then proceeded to examine the behaviour of the questioner in detail, and showed that as the horse's taps approached the right answer, the questioner's posture and facial expression changed in ways that were consistent with an increase in tension, which was released when the horse made the final, correct tap. This provided a cue that the horse could use to tell it to stop tapping.

Evidence of experimenter's bias has also been found in experiments with human subjects. In one study, two essentially identical groups of experimenters were given the same task: asking participants to rate portrait photographs and estimate how successful the pictured people were on a scale of −10 to +10. The experimenters in group A were told to expect positive ratings, while those in group B were told to expect negative ratings. Group A obtained markedly more optimistic appraisals than group B. According to the researchers who ran the experiment, this is explained by the experimenters giving subtle cues to the participants, in the same manner von Osten had cued his horse.[3]

Seven stages

In a review of biases in clinical studies, Sackett (1979) states that biases can occur in any one of seven stages of research:[1]

  1. in reading-up on the field,
  2. in specifying and selecting the study sample,
  3. in executing the experimental manoeuvre (or exposure),
  4. in measuring exposures and outcomes,
  5. in analyzing the data,
  6. in interpreting the analysis, and
  7. in publishing the results.

The ultimate source of this bias is the inability of a human being to be entirely objective. It occurs more often in the sociological and medical sciences, where double-blind techniques are often employed to combat it, but experimenter's bias can also be found in the physical sciences, for instance where an experimenter rounds off measurements.

Classification

Modern electronic or computerized data acquisition techniques have greatly reduced the likelihood of such bias, but it can still be introduced by a poorly designed analysis technique. Experimenter's bias was not well recognized until the 1950s and 1960s, and then primarily in medical experiments and studies. Sackett (1979) catalogued 56 biases that can arise in sampling and measurement in clinical research, organized under the first six of the seven stages of research listed above. These are as follows:

  1. In reading-up the field
    1. the biases of rhetoric
    2. the all's well literature bias
    3. one-sided reference bias
    4. positive results bias
    5. hot stuff bias
  2. In specifying and selecting the study sample
    1. popularity bias
    2. centripetal bias
    3. referral filter bias
    4. diagnostic access bias
    5. diagnostic suspicion bias
    6. unmasking (detection signal) bias
    7. mimicry bias
    8. previous opinion bias
    9. wrong sample size bias
    10. admission rate (Berkson) bias
    11. prevalence-incidence (Neyman) bias
    12. diagnostic vogue bias
    13. diagnostic purity bias
    14. procedure selection bias
    15. missing clinical data bias
    16. non-contemporaneous control bias
    17. starting time bias
    18. unacceptable disease bias
    19. migrator bias
    20. membership bias
    21. non-respondent bias
    22. volunteer bias
  3. In executing the experimental manoeuvre (or exposure)
    1. contamination bias
    2. withdrawal bias
    3. compliance bias
    4. therapeutic personality bias
    5. bogus control bias
  4. In measuring exposures and outcomes
    1. insensitive measure bias
    2. underlying cause bias (rumination bias)
    3. end-digit preference bias
    4. apprehension bias
    5. unacceptability bias
    6. obsequiousness bias
    7. expectation bias
    8. substitution game
    9. family information bias
    10. exposure suspicion bias
    11. recall bias
    12. attention bias
    13. instrument bias
  5. In analyzing the data
    1. post-hoc significance bias
    2. data dredging bias (looking for the pony)
    3. scale degradation bias
    4. tidying-up bias
    5. repeated peeks bias
  6. In interpreting the analysis
    1. mistaken identity bias
    2. cognitive dissonance bias
    3. magnitude bias
    4. significance bias
    5. correlation bias
    6. under-exhaustion bias

The effects of bias on experiments in the physical sciences have not always been fully recognized.

Prevention

In principle, if a measurement has a resolution of R, then the average of N statistically independent measurements has a resolution of R/√N (a consequence of the central limit theorem). This is an important experimental technique for reducing the impact of randomness on an experiment's outcome. It requires, however, that the measurements be statistically independent, and there are several reasons why they may not be. If independence is not satisfied, then the average may not actually be a better statistic; it may merely reflect the correlations among the individual measurements and their non-independent nature.
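The √N improvement can be checked with a short simulation. The sketch below (illustrative values only) models a measurement with resolution R as Gaussian noise around the true value, and shows the spread of the average shrinking as the number of independent readings grows:

```python
import random
import statistics

# Model a measurement with resolution R as the true value plus Gaussian
# noise of standard deviation R. (Illustrative values only.)
TRUE_VALUE = 5.0
R = 1.0
random.seed(42)

def measure():
    return random.gauss(TRUE_VALUE, R)

# For several sample sizes n, repeat the whole experiment 200 times and
# look at the spread of the resulting averages.
spreads = {}
for n in (1, 100, 10000):
    means = [statistics.fmean(measure() for _ in range(n)) for _ in range(200)]
    spreads[n] = statistics.stdev(means)
    print(f"n={n:>5}: spread of the average ~ {spreads[n]:.3f}")
```

With independent readings, the spread falls from about R at n = 1 to about R/100 at n = 10,000, in line with R/√N.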

The most common cause of non-independence is systematic errors (errors affecting all measurements equally, causing the different measurements to be highly correlated, so the average is no better than any single measurement). Experimenter bias is another potential cause of non-independence.
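As a minimal illustration (the offset and noise values are hypothetical), a systematic error shared by every reading survives averaging no matter how many readings are taken:

```python
import random
import statistics

random.seed(7)

TRUE_VALUE = 5.0   # the quantity we are trying to measure
OFFSET = 0.3       # hypothetical systematic error shared by every reading

# Averaging 10,000 readings removes the random noise component...
readings = [TRUE_VALUE + OFFSET + random.gauss(0, 1.0) for _ in range(10_000)]
average = statistics.fmean(readings)
print(average)  # ...but the shared offset survives: the result stays near 5.3
```

However large the sample, the average converges to 5.3, not 5.0; only removing the systematic error itself (or randomizing it away) helps.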

In medical sciences

The complexity of living systems and the ethical impossibility of performing fully controlled experiments with certain species of animals and humans provide a rich, and difficult to control, source of experimental bias. The scientific knowledge about the phenomenon under study, and the systematic elimination of probable causes of bias, by detecting confounding factors, is the only way to isolate true cause-effect relationships. It is also in epidemiology that experimenter bias has been better studied than in other sciences.

A number of studies into spiritual healing illustrate how the design of a study can introduce experimenter bias into the results. A comparison of two such studies shows how a subtle difference in design can change the outcome. The difference lay in the intention assigned to the control condition: in one study the contrast was between a positive and a negative ("should not heal") intention, in the other between a positive and a neutral intention.

A 1995 paper[4] by Hodges and Scofield on spiritual healing used the growth rate of cress seeds as the dependent variable, in order to eliminate a placebo response or participant bias. The study reported positive results: the outcome for each sample was consistent with the healer's intention that healing should or should not occur. However, the healer involved in the experiment was a personal acquaintance of the study authors, raising the distinct possibility of experimenter bias. A randomized clinical trial,[5] published in 2001, investigated the efficacy of spiritual healing (both at a distance and face-to-face) in the treatment of chronic pain in 120 patients. Healers were observed by "simulated healers", who then mimicked the healers' movements on a control group while silently counting backwards in fives, a neutral rather than a "should not heal" intention. The study found a decrease in pain in all patient groups but "no statistically significant differences between healing and control groups ... it was concluded that a specific effect of face-to-face or distant healing on chronic pain could not be demonstrated."

In physical sciences

If the signal being measured is actually smaller than the rounding error and the data are over-averaged, a positive result can be found in the data where none exists (i.e. a more precise experimental apparatus would conclusively show no such signal). Suppose an experiment is searching for a sidereal variation of some measurement, the measurements are rounded off by a human who knows the sidereal time of each one, and hundreds of measurements are averaged to extract a "signal" smaller than the apparatus' actual resolution. In that case the "signal" can come from the non-random round-off rather than from the apparatus itself. In such cases a single-blind experimental protocol is required: if the human observer does not know the sidereal time of the measurements, then even though the round-off is non-random, it cannot introduce a spurious sidereal variation.
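This effect can be demonstrated with a small simulation. In the hypothetical sketch below, a constant quantity is read out at integer resolution; an observer who knows the sidereal phase and resolves ambiguous readings toward the expected variation produces a spurious "signal" after averaging, while a blinded observer does not:

```python
import math
import random

random.seed(1)

TRUE_VALUE = 10.0  # constant: there is no real sidereal variation
N_PER_PHASE = 5000

def biased_round(x, phase):
    # Hypothetical biased observer: readings within 0.1 of the halfway
    # point are resolved toward the variation expected at this phase.
    frac = x - math.floor(x)
    if abs(frac - 0.5) < 0.1:
        return math.floor(x) + (1.0 if math.sin(phase) > 0 else 0.0)
    return float(round(x))

def blind_round(x, phase):
    # Blinded observer: rounds without knowing the sidereal time.
    return float(round(x))

def apparent_signal(rounder):
    # Average many integer-resolution readings at two opposite sidereal
    # phases and report the apparent phase-to-phase difference.
    high = [rounder(random.gauss(TRUE_VALUE, 0.5), math.pi / 2)
            for _ in range(N_PER_PHASE)]
    low = [rounder(random.gauss(TRUE_VALUE, 0.5), 3 * math.pi / 2)
           for _ in range(N_PER_PHASE)]
    return sum(high) / N_PER_PHASE - sum(low) / N_PER_PHASE

biased = apparent_signal(biased_round)
blinded = apparent_signal(blind_round)
print(f"biased observer : apparent signal {biased:+.3f}")
print(f"blinded observer: apparent signal {blinded:+.3f}")
```

The biased observer extracts a non-zero "sidereal variation" from a quantity that never varies, while the blinded observer's result is consistent with zero.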

In forensic sciences

Observer effects are rooted in the universal human tendency to interpret data in a manner consistent with one’s expectations.[6] This tendency is particularly likely to distort the results of a scientific test when the underlying data are ambiguous and the scientist is exposed to domain-irrelevant information that engages emotions or desires.[7] Despite impressions to the contrary, forensic DNA analysts often must resolve ambiguities, particularly when interpreting difficult evidence samples such as those that contain mixtures of DNA from two or more individuals, degraded or inhibited DNA, or limited quantities of DNA template. The full potential of forensic DNA testing can only be realized if observer effects are minimized.[8]

In social science

After the data are collected, bias may be introduced during data interpretation and analysis. For example, in deciding which variables to control in analysis, social scientists often face a trade-off between omitted-variable bias and post-treatment bias.[9]
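The trade-off can be illustrated with a toy simulation (all variable names and effect sizes are invented). Here, "controlling" for a variable that is itself affected by the treatment (a mediator) blocks part of the treatment's effect and biases the estimate, even though the unadjusted comparison recovers the true total effect:

```python
import random
import statistics

random.seed(0)

# Toy causal chain: treatment T has a direct effect on outcome Y and an
# indirect effect through mediator M (T -> M -> Y, plus T -> Y).
n = 20000
data = []
for _ in range(n):
    t = random.randint(0, 1)
    m = 1 if random.random() < (0.8 if t else 0.2) else 0
    y = t + m + random.gauss(0, 1)
    data.append((t, m, y))

def mean_y(t, m=None):
    return statistics.fmean(
        y for (ti, mi, y) in data if ti == t and (m is None or mi == m)
    )

# Unadjusted comparison recovers the true total effect (1 direct + 0.6 via M).
total_effect = mean_y(1) - mean_y(0)
# "Controlling" for the post-treatment variable M blocks the indirect path
# and biases the estimate toward the direct effect alone (1.0).
within_m = statistics.fmean(mean_y(1, m) - mean_y(0, m) for m in (0, 1))
print(f"unadjusted estimate: {total_effect:.2f}  (true total effect = 1.6)")
print(f"adjusted for M:      {within_m:.2f}  (post-treatment bias)")
```

The mirror-image risk, omitted-variable bias, arises when a true confounder is left uncontrolled; controlling for more variables guards against that but risks post-treatment bias, which is the trade-off described above.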

References

  1. 1.0 1.1 Sackett, D. L. (1979). "Bias in analytic research". Journal of Chronic Diseases 32 (1–2): 51–63. doi:10.1016/0021-9681(79)90012-2. PMID 447779.
  2. Barry H. Kantowitz; Henry L. Roediger, III; David G. Elmes (2009). Experimental Psychology. Cengage Learning. p. 371. ISBN 978-0-495-59533-5. Retrieved 7 September 2013.
  3. Rosenthal R. Experimenter effects in behavioral research. New York, NY: Appleton-Century-Crofts, 1966. 464 p.
  4. Hodges, RD and Scofield, AM (1995). "Is spiritual healing a valid and effective therapy?". Journal of the Royal Society of Medicine 88 (4): 203–207. PMC 1295164. PMID 7745566.
  5. Abbot, NC, Harkness, EF, Stevinson, C, Marshall, FP, Conn, DA and Ernst, E. (2001). "Spiritual healing as a therapy for chronic pain: a randomized, clinical trial". Pain 91 (1–2): 79–89. doi:10.1016/S0304-3959(00)00421-8. PMID 11240080.
  6. Rosenthal, R. (1966). Experimenter Effects in Behavioral Research. NY: Appleton-Century-Crofts.
  7. Risinger, D. M.; Saks, M. J.; Thompson, W. C.; Rosenthal, R. (2002). "The Daubert/Kumho Implications of Observer Effects in Forensic Science: Hidden Problems of Expectation and Suggestion". Calif. L. Rev. 90 (1): 1–56. doi:10.2307/3481305. JSTOR 3481305.
  8. D. Krane, S. Ford, J. Gilder, K. Inman, A. Jamieson, R. Koppl, I. Kornfield, D. Risinger, N. Rudin, M. Taylor, W.C. Thompson (2008). "Sequential unmasking: A means of minimizing observer effects in forensic DNA interpretation". Journal of Forensic Sciences 53 (4): 1006–1007. doi:10.1111/j.1556-4029.2008.00787.x. PMID 18638252.
  9. King, Gary. "Post-Treatment Bias in Big Social Science Questions", accessed February 7, 2011.