Replication crisis
The replication crisis (or replicability crisis) refers to a methodological crisis in science, in which scientists have found that the results of many scientific experiments are difficult or impossible to replicate on subsequent investigation, either by independent researchers or by the original researchers themselves.[1] Since the reproducibility of experiments is an essential part of the scientific method, this has potentially grave consequences for many fields of science in which significant theories are grounded on experimental work which has now been found to be resistant to replication.
The replication crisis has been particularly widely discussed in the field of psychology (and in particular, social psychology) and in medicine, where a number of efforts have been made to re-investigate classic results, and to attempt to determine both the validity of the results, and, if invalid, the reasons for the failure of replication.[2][3] Whether similar replicability crises affect other disciplines is not clear, as other disciplines have been less proactive in investigation.
In psychology
Replication failures are not unique to psychology and are found in all fields of science.[4] However, several factors have combined to put psychology at the center of controversy. Much of the focus has been on the area of social psychology, although other areas of psychology such as clinical psychology have also been implicated.
Firstly, questionable research practices (QRPs) have been identified as common in the field.[5] Such practices, while not intentionally fraudulent, involve capitalizing on the gray area of acceptable scientific practices or exploiting flexibility in data collection, analysis, and reporting, often in an effort to obtain a desired outcome. Examples of QRPs include selective reporting or partial publication of data (reporting only some of the study conditions or collected dependent measures in a publication), optional stopping (choosing when to stop data collection, often based on the statistical significance of tests), p-value rounding (rounding p-values down to .05 to suggest statistical significance), the file drawer effect (nonpublication of data), post-hoc storytelling (framing exploratory analyses as confirmatory analyses), and manipulation of outliers (either removing outliers or leaving outliers in a dataset to cause a statistical test to be significant).[5][6][7][8] A survey of over 2,000 psychologists indicated that a majority of respondents admitted to using at least one QRP.[5] False positive conclusions, often resulting from the pressure to publish or the author's own confirmation bias, are an inherent hazard in the field, requiring a certain degree of skepticism on the part of readers.[9]
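How optional stopping can inflate false positives may be illustrated with a small simulation. The sketch below is not taken from any of the cited studies; it assumes a one-sample z-test on normally distributed data with known unit variance and no true effect, and the function names, sample sizes, and number of simulations are hypothetical choices made only for illustration.

```python
import math
import random


def two_sided_p(z):
    """Two-sided p-value for a standard normal test statistic."""
    return math.erfc(abs(z) / math.sqrt(2))


def false_positive_rate(n_sims, peeks, rng):
    """Share of null-effect simulations declared significant at any peek.

    `peeks` is an increasing list of sample sizes at which the test is run;
    data collection stops at the first peek with p < .05.
    """
    hits = 0
    for _ in range(n_sims):
        total, n = 0.0, 0
        for target in peeks:
            # Collect data up to the next planned look.
            while n < target:
                total += rng.gauss(0.0, 1.0)  # the true mean is zero
                n += 1
            z = total / math.sqrt(n)  # z statistic with known variance 1
            if two_sided_p(z) < 0.05:
                hits += 1
                break
    return hits / n_sims


rng = random.Random(0)  # fixed seed so the run is repeatable
fixed_n = false_positive_rate(5000, [50], rng)                 # one test at n = 50
peeking = false_positive_rate(5000, [10, 20, 30, 40, 50], rng) # test after every 10
print(f"fixed sample size: {fixed_n:.3f}")
print(f"optional stopping: {peeking:.3f}")
```

Under these assumptions the fixed-sample design stays close to the nominal 5% error rate, while testing repeatedly and stopping at the first significant result produces a noticeably higher rate, even though no effect exists.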
Secondly, psychology, and social psychology in particular, has found itself at the center of several scandals involving outright fraudulent research, most notably the admitted data fabrication by Diederik Stapel[10] as well as allegations against others. However, most scholars acknowledge that fraud is, perhaps, the lesser contribution to replication crises.
Thirdly, several effects in psychological science were found to be difficult to replicate even before the current replication crisis. For example, the scientific journal Judgment and Decision Making has published several studies over the years that fail to provide support for the unconscious thought theory. Replications appear particularly difficult when research trials are pre-registered and conducted by research groups not highly invested in the theory in question.
These three elements together have resulted in renewed attention to replication, supported by Kahneman.[11] Scrutiny of many effects has shown that several core beliefs are hard to replicate. A recent special edition of the journal Social Psychology focused on replication studies, and a number of previously held beliefs were found to be difficult to replicate.[12] A 2012 special edition of the journal Perspectives on Psychological Science also focused on issues ranging from publication bias to null-aversion that contribute to the replication crises in psychology.[13] In 2015, the first open empirical study of reproducibility in psychology was published, called the Reproducibility Project. Researchers from around the world collaborated to replicate 100 empirical studies from three top psychology journals. Fewer than half of the attempted replications were successful at producing statistically significant results in the expected directions, though most of the attempted replications did produce trends in the expected directions.[14]
Scholar James Coyne has recently written that many research trials and meta-analyses are compromised by poor quality and conflicts of interest that involve both authors and professional advocacy organizations, resulting in many false positives regarding the effectiveness of certain types of psychotherapy.[15]
The replication crisis does not mean that psychology is unscientific.[16][17][18] Rather, this process is a healthy, if sometimes acrimonious, part of the scientific process in which old ideas or those that cannot withstand careful scrutiny are pruned,[19][20] although this pruning process is not always effective.[21][22] The consequence is that some areas of psychology once considered solid, such as social priming, have come under increased scrutiny due to failed replications.[23] The British Independent newspaper wrote that the results of the reproducibility project show that much of the published research is just "psycho-babble".[24]
Nobel laureate and professor emeritus in psychology Daniel Kahneman argued that the original authors should be involved in the replication effort because the published methods are often too vague.[25] Some other scientists, such as Andrew Wilson, disagree and argue that the methods should be written down in detail. An investigation of replication rates in psychology in 2012 indicated higher rates of successful replication when there was author overlap with the original authors of a study[26] (91.7% successful replication in studies with author overlap compared to 64.6% without author overlap).
Replication rates in psychology
A report by the Open Science Collaboration in August 2015, coordinated by Brian Nosek, estimated the reproducibility of 100 studies in psychological science from three high-ranking psychology journals.[27] Overall, 36% of the replications yielded significant findings (p value below .05), compared to 97% of the original studies that had significant effects. The mean effect size in the replications was approximately half the magnitude of the effects reported in the original studies.
The same paper examined the reproducibility rates and effect sizes by journal (Journal of Personality and Social Psychology [JPSP], Journal of Experimental Psychology: Learning, Memory, and Cognition [JEP:LMC], Psychological Science [PSCI]) and discipline (social psychology, cognitive psychology). Study replication rates were 23% for JPSP, 38% for JEP:LMC, and 38% for PSCI. Studies in the field of cognitive psychology had a higher replication rate (50%) than studies in the field of social psychology (25%).
An analysis of the publication history in the top 100 psychology journals between 1900 and 2012 indicated that approximately 1.6% of all psychology publications were replication attempts.[26] Articles were considered a replication attempt if the term "replication" appeared in the text. A subset of those studies (500 studies) was randomly selected for further examination and yielded a lower replication rate of 1.07% (342 of the 500 studies [68.4%] were actually replications). In the subset of 500 studies, analysis indicated that 78.9% of published replication attempts were successful. The rate of successful replication was significantly higher when at least one author of the original study was part of the replication attempt (91.7% relative to 64.6%).
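The correction from the keyword-based rate to the lower rate can be retraced with a short calculation. This is only a sketch using the rounded figures quoted above; the variable names are hypothetical, and the small gap between its result and the reported 1.07% presumably reflects rounding of the 1.6% figure in the text.

```python
# Figures as quoted in the text (the 1.6% is a rounded value).
flagged_rate = 0.016      # share of publications containing the term "replication"
sampled = 500             # flagged studies drawn at random for closer inspection
actual_replications = 342 # of those, the number that were actual replications

# Share of flagged studies that were genuine replication attempts.
share_actual = actual_replications / sampled

# Corrected estimate of the share of all publications that are replications.
corrected_rate = flagged_rate * share_actual

print(f"share of flagged studies that were replications: {share_actual:.1%}")
print(f"corrected replication rate: {corrected_rate:.2%}")
```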
A disciplinary social dilemma
Highlighting the social structure that discourages replication in psychology, Brian D. Earp and Jim A. C. Everett enumerated five points as to why replication attempts are uncommon:[28][29]
- "Independent, direct replications of others’ findings can be time-consuming for the replicating researcher"
- "[Replications] are likely to take energy and resources directly away from other projects that reflect one’s own original thinking"
- "[Replications] are generally harder to publish (in large part because they are viewed as being unoriginal)"
- "Even if [replications] are published, they are likely to be seen as 'bricklaying' exercises, rather than as major contributions to the field"
- "[Replications] bring less recognition and reward, and even basic career security, to their authors"[30]
For these reasons, the authors argued that psychology is facing a disciplinary social dilemma, in which the interests of the discipline are at odds with the interests of the individual researcher.
Addressing the replication crisis
Replication has been referred to as "the cornerstone of science".[31][32] Replication studies attempt to evaluate whether published results reflect true findings or false positives. The integrity of scientific findings and reproducibility of research are important as they form the knowledge foundation on which future studies are built.
- A recent innovation in scientific publishing to address the replication crisis is the use of registered reports.[33][34] The registered report format requires authors to submit a description of the study methods and analyses prior to data collection. Once the method and analysis plan has been vetted through peer review, publication of the findings is provisionally guaranteed, provided the authors follow the proposed protocol. One goal of registered reports is to circumvent the publication bias toward significant findings that can lead to the use of QRPs, and to encourage publication of studies with rigorous methods.
- Based on coursework in experimental methods at MIT and Stanford, it has been suggested that methods courses in psychology emphasize replication attempts rather than original studies.[35][36] Such an approach would help students learn scientific methodology and provide numerous independent replications of meaningful scientific findings that would test the replicability of scientific findings. Some have recommended that graduate students should be required to publish a high-quality replication attempt on a topic related to their doctoral research prior to graduation.[29]
- To improve the quality of replications, larger sample sizes than those used in the original study are often needed.[37] Larger sample sizes are needed because estimates of effect sizes in published work are often exaggerated due to publication bias and the large sampling variability associated with small sample sizes in an original study.[38][39][40]
- Online repositories where data, protocols, and findings can be stored and evaluated by the public seek to improve the integrity and reproducibility of research. Examples of such repositories include the Open Science Framework, http://www.re3data.org/, and www.psychfiledrawer.org. Sites like the Open Science Framework offer badges for using open science practices in an effort to incentivize scientists. However, there has been concern that those who are most likely to provide their data and code for analyses are the researchers who are likely the most sophisticated.[41] John Ioannidis at Stanford University suggested that "the paradox may arise that the most meticulous and sophisticated and method-savvy and careful researchers may become more susceptible to criticism and reputation attacks by reanalyzers who hunt for errors, no matter how negligible these errors are."[41]
- The journal Psychological Science has encouraged the preregistration of studies and the reporting of effect sizes and confidence intervals.[42] The editor in chief also noted that the editorial staff will be asking for replication of studies with surprising findings from examinations using small sample sizes before allowing the manuscripts to be published.
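The point in the list above about larger sample sizes can be made concrete with a textbook power calculation. The sketch below assumes a two-sided, two-sample comparison of means and uses the normal approximation n = 2((z_{1-α/2} + z_{power}) / d)² per group; the function name and the published effect size of d = 0.5 are hypothetical, chosen only to show what happens when the true effect is half the published estimate.

```python
import math
from statistics import NormalDist


def n_per_group(d, alpha=0.05, power=0.80):
    """Approximate per-group sample size for a two-sided, two-sample test.

    Uses the normal approximation; `d` is the standardized mean difference.
    """
    std_normal = NormalDist()
    z_alpha = std_normal.inv_cdf(1 - alpha / 2)  # critical value of the test
    z_power = std_normal.inv_cdf(power)          # quantile for the target power
    return math.ceil(2 * ((z_alpha + z_power) / d) ** 2)


published_d = 0.5           # hypothetical published effect size
true_d = published_d / 2    # assumption: the true effect is half as large

n_if_published = n_per_group(published_d)
n_if_halved = n_per_group(true_d)
print(f"n per group if d = {published_d}: {n_if_published}")
print(f"n per group if d = {true_d}: {n_if_halved}")
```

Because the required sample size scales with 1/d², halving the assumed effect size roughly quadruples the number of participants needed, which is why a replication planned around an exaggerated published estimate can be badly underpowered.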
Quotes
- By Diederik Stapel, from the authorized English translation by Nicholas J. L. Brown (available as a free download in PDF format)
- Clearly, there was something in the recipe for the X effect that I was missing. But what? I decided to ask the experts, the people who’d found the X effect and published lots of articles about it [..] My colleagues from around the world sent me piles of instructions, questionnaires, papers, and software [..] In most of the packages there was a letter, or sometimes a yellow Post-It note stuck to the bundle of documents, with extra instructions: “Don’t do this test on a computer. We tried that and it doesn’t work. It only works if you use pencil-and-paper forms.” “This experiment only works if you use ‘friendly’ or ‘nice’. It doesn’t work with ‘cool’ or ‘pleasant’ or ‘fine’. I don’t know why.” “After they’ve read the newspaper article, give the participants something else to do for three minutes. No more, no less. Three minutes, otherwise it doesn’t work.” “This questionnaire only works if you administer it to groups of three to five people. No more than that.” I certainly hadn’t encountered these kinds of instructions and warnings in the articles and research reports that I’d been reading. This advice was informal, almost under-the-counter, but it seemed to be a necessary part of developing a successful experiment. Had all the effect X researchers deliberately omitted this sort of detail when they wrote up their work for publication? I don’t know.
- From his memoirs: "Ontsporing" (English, "Derailment") Nov. 2012
References
- ↑ Schooler, J. W. (2014). "Metascience could rescue the 'replication crisis'". Nature 515 (7525): 9. doi:10.1038/515009a.
- ↑ Gary Marcus (May 1, 2013). "The Crisis in Social Psychology That Isn’t". The New Yorker.
- ↑ Jonah Lehrer (December 13, 2010). "The Truth Wears Off". The New Yorker.
- ↑ Achenbach, Joel. "No, science’s reproducibility problem is not limited to psychology". The Washington Post. Retrieved 10 September 2015.
- 1 2 3 John, Leslie K.; Loewenstein, George; Prelec, Drazen (2012-05-01). "Measuring the Prevalence of Questionable Research Practices With Incentives for Truth Telling". Psychological Science 23 (5): 524–532. doi:10.1177/0956797611430953. ISSN 0956-7976. PMID 22508865.
- ↑ "The Nine Circles of Scientific Hell". Perspectives on Psychological Science 7 (6): 643–644. 2012-11-01. doi:10.1177/1745691612459519. ISSN 1745-6916. PMID 26168124.
- ↑ "Research misconduct - The grey area of Questionable Research Practices". www.vib.be. Retrieved 2015-11-13.
- ↑ Fiedler, Klaus; Schwarz, Norbert (2015-10-19). "Questionable Research Practices Revisited". Social Psychological and Personality Science 7: 1948550615612150. doi:10.1177/1948550615612150. ISSN 1948-5506.
- ↑ Simmons, Joseph; Nelson, Leif; Simonsohn, Uri (November 2011). "False-Positive Psychology: Undisclosed Flexibility in Data Collection and Analysis Allows Presenting Anything as Significant". Psychological Science (Washington DC: Association for Psychological Science) 22 (11): 1359–1366. doi:10.1177/0956797611417632. ISSN 0956-7976. PMID 22006061. Retrieved 29 January 2012.
- ↑ "Fraud Scandal Fuels Debate Over Practices of Social Psychology: Even legitimate researchers cut corners, some admit"
- ↑ "A New Etiquette for Replication". http://www.scribd.com/doc/225285909/Kahneman-Commentary
- ↑
- ↑
- ↑ Open Science Collaboration (2015). "Estimating the reproducibility of Psychological Science". Science 349 (6251): aac4716. doi:10.1126/science.aac4716. PMID 26315443.
- ↑
- ↑ http://www.slate.com/articles/health_and_science/science/2014/07/replication_controversy_in_psychology_bullying_file_drawer_effect_blog_posts.single.html
- ↑ http://fivethirtyeight.com/datalab/psychology-is-starting-to-deal-with-its-replication-problem/
- ↑ http://fivethirtyeight.com/features/science-isnt-broken/
- ↑ "Psychology's replication drive: it's not about you"
- ↑ Wagenmakers, Eric-Jan; Wetzels, Ruud; Borsboom, Denny; Maas, Han L. J. van der; Kievit, Rogier A. (2012-11-01). "An Agenda for Purely Confirmatory Research". Perspectives on Psychological Science 7 (6): 632–638. doi:10.1177/1745691612463078. ISSN 1745-6916. PMID 26168122.
- ↑ Ioannidis, John P. A. (2012-11-01). "Why Science Is Not Necessarily Self-Correcting". Perspectives on Psychological Science 7 (6): 645–654. doi:10.1177/1745691612464056. ISSN 1745-6916. PMID 26168125.
- ↑ Pashler, Harold; Harris, Christine R. (2012-11-01). "Is the Replicability Crisis Overblown? Three Arguments Examined". Perspectives on Psychological Science 7 (6): 531–536. doi:10.1177/1745691612463401. ISSN 1745-6916. PMID 26168109.
- ↑ "Power of Suggestion"
- ↑ Connor, Steve (27 August 2015). "Study reveals that a lot of psychology research really is just 'psycho-babble'". The Independent (London).
- ↑ http://www.theguardian.com/science/head-quarters/2014/jun/10/physics-envy-do-hard-sciences-hold-the-solution-to-the-replication-crisis-in-psychology
- 1 2 Makel, Matthew C.; Plucker, Jonathan A.; Hegarty, Boyd (2012-11-01). "Replications in Psychology Research How Often Do They Really Occur?". Perspectives on Psychological Science 7 (6): 537–542. doi:10.1177/1745691612460688. ISSN 1745-6916. PMID 26168110.
- ↑ Open Science Collaboration (2015-08-28). "Estimating the reproducibility of psychological science". Science 349 (6251): aac4716. doi:10.1126/science.aac4716. ISSN 0036-8075. PMID 26315443.
- ↑ see also Earp and Trafimow, 2015
- 1 2 Everett, Jim Albert Charlton; Earp, Brian D. (2015-01-01). "A tragedy of the (academic) commons: interpreting the replication crisis in psychology as a social dilemma for early-career researchers". Personality and Social Psychology 6: 1152. doi:10.3389/fpsyg.2015.01152. PMC 4527093. PMID 26300832.
- ↑ "Resolving the replication crisis in social psychology? A new proposal | SPSP". www.spsp.org. Retrieved 2015-11-18.
- ↑ Moonesinghe, Ramal; Khoury, Muin J; Janssens, A. Cecile J. W (2007-02-27). "Most Published Research Findings Are False—But a Little Replication Goes a Long Way". PLoS Med 4 (2): e28. doi:10.1371/journal.pmed.0040028. PMC 1808082. PMID 17326704.
- ↑ Simons, Daniel J. (2014-01-01). "The Value of Direct Replication". Perspectives on Psychological Science 9 (1): 76–80. doi:10.1177/1745691613514755. ISSN 1745-6916. PMID 26173243.
- ↑ "Registered Replication Reports - Association for Psychological Science". www.psychologicalscience.org. Retrieved 2015-11-13.
- ↑ Chambers, Chris. "Psychology’s ‘registration revolution’ | Chris Chambers". the Guardian. Retrieved 2015-11-13.
- ↑ Frank, Michael C.; Saxe, Rebecca (2012-11-01). "Teaching Replication". Perspectives on Psychological Science 7 (6): 600–604. doi:10.1177/1745691612460686. ISSN 1745-6916. PMID 26168118.
- ↑ Grahe, Jon E.; Reifman, Alan; Hermann, Anthony D.; Walker, Marie; Oleson, Kathryn C.; Nario-Redmond, Michelle; Wiebe, Richard P. (2012-11-01). "Harnessing the Undiscovered Resource of Student Research Projects". Perspectives on Psychological Science 7 (6): 605–607. doi:10.1177/1745691612459057. ISSN 1745-6916. PMID 26168119.
- ↑ Maxwell, Scott E.; Lau, Michael Y.; Howard, George S. "Is psychology suffering from a replication crisis? What does "failure to replicate" really mean?". American Psychologist 70 (6): 487–498. doi:10.1037/a0039400.
- ↑ IntHout, Joanna; Ioannidis, John P.A.; Borm, George F.; Goeman, Jelle J. "Small studies are more heterogeneous than large ones: a meta-meta-analysis". Journal of Clinical Epidemiology 68 (8): 860–869. doi:10.1016/j.jclinepi.2015.03.017.
- 1 2 Button, Katherine S.; Ioannidis, John P. A.; Mokrysz, Claire; Nosek, Brian A.; Flint, Jonathan; Robinson, Emma S. J.; Munafò, Marcus R. (2013-05-01). "Power failure: why small sample size undermines the reliability of neuroscience". Nature Reviews Neuroscience 14 (5): 365–376. doi:10.1038/nrn3475. ISSN 1471-003X.
- ↑ Greenwald, Anthony G. "Consequences of prejudice against the null hypothesis.". Psychological Bulletin 82 (1): 1–20. doi:10.1037/h0076157.
- 1 2 Ioannidis, John P.A. "Anticipating consequences of sharing raw data and code and of awarding badges for sharing". Journal of Clinical Epidemiology. doi:10.1016/j.jclinepi.2015.04.015.
- ↑ Lindsay, D. Stephen (2015-11-09). "Replication in Psychological Science". Psychological Science 26: 0956797615616374. doi:10.1177/0956797615616374. ISSN 0956-7976. PMID 26553013.
Further reading
- November 2012 special edition of Perspectives on Psychological Science on the issues of replicability and research practices in psychology
- Reproducibility in Science: Improving the Standard for Basic and Preclinical Research[1]
- Social, Behavioral, and Economic Sciences Perspectives on Robust and Reliable Science
- Why Most Published Research Findings Are False
- Bonett, DG (2012) Replication-extension studies. Current Directions in Psychology 21, 409-412.
- ↑ Begley, C. Glenn; Ioannidis, John P. A. (2015-01-02). "Reproducibility in Science Improving the Standard for Basic and Preclinical Research". Circulation Research 116 (1): 116–126. doi:10.1161/CIRCRESAHA.114.303819. ISSN 0009-7330. PMID 25552691.