The natural selection of bad science

Written by Paul E. Smaldino, Richard McElreath

Abstract: Poor research design and data analysis encourage false-positive findings. Such poor methods persist despite perennial calls for improvement, suggesting that they result from something more than just misunderstanding. The persistence of poor methods results partly from incentives that favour them, leading to the natural selection of bad science.


This dynamic requires no conscious strategizing—no deliberate cheating nor loafing—by scientists, only that publication is a principal factor for career advancement. Some normative methods of analysis have almost certainly been selected to further publication instead of discovery.

In order to improve the culture of science, a shift must be made away from correcting misunderstandings and towards rewarding understanding. We support this argument with empirical evidence and computational modelling. We first present a 60-year meta-analysis of statistical power in the behavioural sciences and show that power has not improved despite repeated demonstrations of the necessity of increasing power.

To demonstrate the logical consequences of structural incentives, we then present a dynamic model of scientific communities in which competing laboratories investigate novel or previously published hypotheses using culturally transmitted research methods. As in the real world, successful labs produce more ‘progeny,’ such that their methods are more often copied and their students are more likely to start labs of their own. Selection for high output leads to poorer methods and increasingly high false discovery rates. We additionally show that replication slows but does not stop the process of methodological deterioration. Improving the quality of research requires change at the institutional level.

The more any quantitative social indicator is used for social decision-making, the more subject it will be to corruption pressures and the more apt it will be to distort and corrupt the social processes it is intended to monitor.

Donald T. Campbell (1976, p. 49) [1]

I’ve been on a number of search committees. I don’t remember anybody looking at anybody’s papers. Number and IF [impact factor] of pubs are what counts.

Terry McGlynn (realscientists) (21 October 2015, 4:12 p.m. Tweet.)

1. Introduction

In March 2016, the American Statistical Association published a set of corrective guidelines about the use and misuse of p-values [2]. Statisticians have been publishing guidelines of this kind for decades [3,4]. Beyond mere significance testing, research design in general has a history of shortcomings and repeated corrective guidelines. Yet misuse of statistical procedures and poor methods has persisted and possibly grown. In fields such as psychology, neuroscience and medicine, practices that increase false discoveries remain not only common, but normative [5–11].

Why have attempts to correct such errors so far failed? In April 2015, members of the UK’s science establishment attended a closed-door symposium on the reliability of biomedical research [12]. The symposium focused on the contemporary crisis of faith in research. Many prominent researchers believe that as much as half of the scientific literature (not only in medicine, but also in psychology and other fields) may be wrong [11,13–15]. Fatal errors and retractions, especially of prominent publications, are increasing [16–18]. The report that emerged from this symposium echoes the slogan of one anonymous attendee: ‘Poor methods get results.’ Persistent problems with scientific conduct have more to do with incentives than with pure misunderstandings. So fixing them has more to do with removing incentives that reward poor research methods than with issuing more guidelines. As Richard Horton, editor of The Lancet, put it: ‘Part of the problem is that no one is incentivised to be right’ [12].

This paper argues that some of the most powerful incentives in contemporary science actively encourage, reward and propagate poor research methods and abuse of statistical procedures. We term this process the natural selection of bad science to indicate that it requires no conscious strategizing nor cheating on the part of researchers. Instead, it arises from the positive selection of methods and habits that lead to publication. How can natural selection operate on research methodology? There are no research ‘genes’. But science is a cultural activity, and such activities change through evolutionary processes [19–25]. Philosophers of science such as Campbell [19], Popper [26] and Hull [27] have discussed how scientific theories evolve by variation and selective retention. But scientific methods also develop in this way. Laboratory methods can propagate either directly, through the production of graduate students who go on to start their own labs, or indirectly, through prestige-biased adoption by researchers in other labs. Methods which are associated with greater success in academic careers will, other things being equal, tend to spread.

The requirements for natural selection to produce design are easy to satisfy. Darwin outlined the logic of natural selection as requiring three conditions:

  • (i) There must be variation.

  • (ii) That variation must have consequences for survival or reproduction.

  • (iii) Variation must be heritable.

In this case, there are no biological traits being passed from scientific mentors to apprentices. However, research practices do vary. That variation has consequences—habits that lead to publication lead to obtaining highly competitive research positions. And variation in practice is partly heritable, in the sense that apprentices acquire research habits and statistical procedures from mentors and peers. Researchers also acquire research practice from successful role models in their fields, even if they do not personally know them. Therefore, when researchers are rewarded primarily for publishing, then habits which promote publication are naturally selected. Unfortunately, such habits can directly undermine scientific progress.

This is not a new argument. But we attempt to substantially strengthen it. We support the argument both empirically and analytically. We first review evidence that institutional incentives are likely to increase the rate of false discoveries. Then we present evidence from a literature review of repeated calls for improved methodology, focusing on the commonplace and easily understood issue of statistical power. We show that despite over 50 years of reviews of low statistical power and its consequences, there has been no detectable increase.
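The link between statistical power and false discoveries can be made concrete with a standard textbook relation, sketched here under stated assumptions rather than taken from this section. If a fraction b of tested hypotheses are true, tests have power W and false-positive rate α, then the expected share of positive results that are false is α(1 − b) / (α(1 − b) + W·b). The function name and the example values of b, W and α below are illustrative assumptions.

```python
def false_discovery_rate(power, alpha=0.05, base_rate=0.1):
    """Expected share of positive results that are false.

    power:     probability of detecting a true effect
    alpha:     probability of a positive result when there is no effect
    base_rate: assumed fraction of tested hypotheses that are true
    """
    true_positives = power * base_rate
    false_positives = alpha * (1.0 - base_rate)
    return false_positives / (false_positives + true_positives)


# Compare a well-powered design with a poorly powered one (assumed values).
for power in (0.8, 0.2):
    fdr = false_discovery_rate(power)
    print(f"power={power:.1f} -> false discovery rate={fdr:.2f}")
```

Under these assumed values, lowering power from 0.8 to 0.2 raises the expected false discovery rate even though α is unchanged, which is one reason persistently low power matters.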

While the empirical evidence is persuasive, it is not conclusive. It is equally important to demonstrate that our argument is logically sound. Therefore, we also analyse a formal model of our argument. Inspecting the logic of the selection-for-bad-science argument serves two purposes. First, if the argument cannot be made to work in theory, then it cannot be the correct explanation, whatever the status of the evidence. Second, formalizing the argument produces additional clarity and the opportunity to analyse and engineer interventions. To represent the argument, we define a dynamical model of research behaviour in a population of competing agents. We assume that all agents have the utmost integrity. They never cheat. Instead, research methodology varies and evolves due to its consequences on hiring and retention, primarily through successful publication. As a result our argument applies even when researchers do not directly respond to incentives for poor methods. We show that the persistence of poor research practice can be explained as the result of the natural selection of bad science.
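The selection logic can be illustrated with a deliberately minimal simulation. This is a toy sketch, not the model analysed in the paper: the single trait `effort`, the rule that output falls linearly with effort, and every parameter value are assumptions chosen only to show Darwin's three conditions at work. No lab changes its own behaviour; methods shift only because labs with higher output are copied more often.

```python
import random


def simulate_selection(n_labs=100, generations=300, start_effort=0.75,
                       mutation_sd=0.02, seed=1):
    """Toy model: labs inherit 'effort' with noise; high output is favoured.

    Returns the population mean effort at each generation.
    All parameters and the output rule are illustrative assumptions.
    """
    rng = random.Random(seed)
    efforts = [start_effort] * n_labs
    mean_effort = [start_effort]
    for _ in range(generations):
        # Consequences: assumed rule that less effort yields more papers.
        output = [1.0 - 0.8 * e for e in efforts]
        # Selection: new labs copy existing labs in proportion to output.
        parents = rng.choices(efforts, weights=output, k=n_labs)
        # Variation and inheritance: methods are copied with small errors,
        # and effort is kept within the range [0.05, 1.0].
        efforts = [min(1.0, max(0.05, p + rng.gauss(0.0, mutation_sd)))
                   for p in parents]
        mean_effort.append(sum(efforts) / n_labs)
    return mean_effort


history = simulate_selection()
print(f"mean effort: start={history[0]:.2f}, end={history[-1]:.2f}")
```

In this sketch mean effort declines over generations, because copying errors supply variation and output-weighted copying favours the lower-effort variants.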

Read the full paper at rsos.royalsocietypublishing.org