A recent paper makes an upsetting claim about the state of science: nonreplicable studies are cited more often than replicable ones. In other words, according to the report in Science Advances, bad science seems to get more attention than good science.
The paper follows up on reports of a “replication crisis” in psychology, wherein large numbers of academic papers present results that other researchers are unable to reproduce—as well as claims that the problem is not limited to psychology. This matters for several reasons. If a substantial proportion of science fails to meet the norm of replicability, then this work won’t provide a solid basis for decision-making. Failure to replicate results may delay the use of science in developing new medicines and technologies. It may also undermine public trust, making it harder to get Americans vaccinated or to act on climate change. And money spent on invalid science is money wasted: one study puts the cost of irreproducible medical research in the U.S. alone at $28 billion a year.
In the new study, the authors tracked papers in psychology journals, economics journals, and Science and Nature with documented failures of replication. The results are disturbing: papers that couldn't be replicated were cited more often than average, even after news of the failed replication had been published, and only 12 percent of the citations made after that point acknowledged the failure.
These results parallel those of a 2018 analysis of 126,000 rumor cascades on Twitter, which showed that false news spread faster and reached more people than verified true claims. It also found that bots propagated true and false news in equal proportions: it was people, not bots, who were responsible for the disproportionate spread of falsehoods online.
A potential explanation for these findings involves a double-edged sword. Academics valorize novelty: new findings, new results, “cutting-edge” and “disruptive” research. On one level this makes sense. If science is a process of discovery, then papers that offer new and surprising findings are more likely to represent a potential major advance than papers that strengthen the foundations of existing knowledge or modestly extend its domain of applicability. Moreover, both academics and laypeople experience surprises as more interesting (and certainly more entertaining) than the predictable, the normal and the quotidian. No editor wants to be the one who rejects a paper that later becomes the basis of a Nobel Prize. The problem is that surprising results are surprising precisely because they go against what experience has led us to believe so far, which means there’s a good chance they’re wrong.
The authors of the citation study theorize that reviewers and editors apply lower standards to “showy” or dramatic papers than to those that incrementally advance the field and that highly interesting papers attract more attention, discussion and citations. In other words, there is a bias in favor of novelty. The authors of the Twitter study also point to novelty as a culprit: they found that the false news that spread rapidly online was significantly more unusual than the true news.
Novel claims have the potential to be very valuable. If something surprises us, that suggests we might have something to learn from it. The operative word is “might,” because learning from a surprise presupposes that the surprising claim is at least partly true; sometimes claims are surprising and simply wrong. All of this indicates that researchers, reviewers and editors should take steps to correct their bias in favor of novelty, and suggestions have been put forward for how to do so.
There is another problem. As the authors of the citation study note, many replication studies focus on splashy papers that have received a lot of attention, and these are more likely than average to fail to hold up under further scrutiny. A review focused on showy, high-profile papers is not going to be reflective of science at large—a failure of the norm of representativeness. In one case that I have discussed elsewhere, a paper flagging reproducibility problems did not disclose its own methods, yet that paper has been—yes—highly cited. So scientists must be careful that, in their quest to flag papers that couldn’t be replicated, they don’t create flashy but flimsy claims of their own.