I have a long commute to work every day, and on those drives, I often listen to podcasts. One by the NPR Planet Money team struck a nerve recently. It was called “The Experiment Experiment” and was about reproducing results in the field of psychology. As an aside, physicist David Kestenbaum is one of the Planet Money reporters who relayed this story.
In the episode, Brian Nosek, a psychologist at the University of Virginia, is (amazingly!) able to persuade 270 others working in his field to try to reproduce 100 experiments that were published in three top-tier psychology journals. The result?
64 of the 100 experiments were not reproduced.
The most interesting part of this podcast is the analysis of why this occurred. Their hypothesis is that two main factors are to blame:
- The “file-drawer effect”
- Psychologists tricking themselves due to misaligned incentives
Let’s use, as they did in the episode, the instructive analogy of coin-flipping to understand these two ideas. It goes something like this. If there are 100 researchers each doing the same experiment of flipping 10 coins, most of them will obtain somewhere between 4 and 6 heads. These researchers are likely to put their results in their drawers and move on to their next experiment. However, there will be 1 or 2 researchers who get the astounding result of 9 out of 10 (or 1 out of 10) heads. These researchers will think to themselves, “Whoa! What is the probability of me obtaining that result?!” They will calculate it, see that the probability is about 1%, and publish the result, thinking that it is statistically significant.
Just as an illustration, here is the distribution one would naively expect if 100 researchers did the coin-flipping experiment:
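(If you want to check the numbers yourself, here is a minimal Python sketch using only the standard library. The figures of 100 researchers and 10 flips are just the ones from the analogy above, not anything from the episode.)

```python
# Expected distribution of heads for 100 researchers each flipping 10 fair coins.
from math import comb

n_flips = 10
n_researchers = 100

for heads in range(n_flips + 1):
    p = comb(n_flips, heads) / 2**n_flips   # binomial probability of exactly this many heads
    expected = p * n_researchers             # expected number of researchers seeing it
    print(f"{heads:2d} heads: P = {p:6.2%}, expected researchers ~ {expected:5.1f}")
```

Running this shows that 4–6 heads accounts for roughly two thirds of the researchers, while 9 heads comes out to about 1%, i.e. roughly one researcher in the hundred.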
The other component that they claim contributes to these striking reproducibility numbers is that researchers have an incentive to obtain positive results. This is because positive results get researchers publications, which lead to promotions for tenure-track faculty and jobs for graduate students and postdocs. The incentive structure thus gives researchers a natural bias towards positive results. This is not to imply that these researchers are committing scientific misconduct; they are simply unaware of their biases.
Let us take the coin-flipping example again and start from the above graphic to see how this might work. Approximately 12 of the 100 researchers will obtain 7 heads out of 10 coin flips. This would not be a particularly significant result, but then suppose all 12 of them think, “Let me just check to see if this result is true,” and they flip another 4 coins. Now, 3 or 4 of those 12 researchers will obtain 3 or 4 heads in those 4 extra flips, reinforcing their previous result! They will then think, “Well, this result must be true! I’d better publish this!”
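(Again, a quick back-of-the-envelope check of that arithmetic in Python; this is just my own sketch of the numbers quoted above.)

```python
# How often does a 4-flip "follow-up" appear to confirm a 7-out-of-10 fluke?
from math import comb

p_seven_of_ten = comb(10, 7) / 2**10           # ~11.7%: about 12 of 100 researchers
p_confirm = (comb(4, 3) + comb(4, 4)) / 2**4   # 5/16 ~ 31%: chance of 3 or 4 heads in 4 flips

print(f"P(7 heads in 10 flips)     = {p_seven_of_ten:.1%}")
print(f"P(3 or 4 heads in 4 flips) = {p_confirm:.1%}")
print(f"Expected 'confirmed' flukes out of 100 researchers: "
      f"{100 * p_seven_of_ten * p_confirm:.1f}")
```

The last line comes out to roughly 3.7, which is where the “3 or 4 of those 12 researchers” above comes from.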
One can see how these two effects could combine to lead to the staggering number of results that are not reproducible. Because the incentive structure in our field is similar, one fears that such things may be going on in physics departments as well. I would like to hope not, but if psychologists are susceptible to psychological pressure, who isn’t?