David Myers

On the Reliability and Teaching of Psychological Science

Blog post by David Myers, June 21, 2018

“The most famous psychological studies are often wrong, fraudulent, or outdated.” With this headline, Vox joins critics who question the reproducibility and integrity of psychological science’s findings.


Are many psychology findings indeed untrustworthy? In 2015, news from a mass replication study rattled our field: only 36 percent of nearly 100 attempts to replicate psychological science studies had reproduced the original findings. Some challenged that conclusion. “Our analysis completely invalidates the pessimistic conclusions that many have drawn from this landmark study,” said Harvard psychologist Daniel Gilbert.


For introductory psychology teachers, those supposed failures to replicate need not have been a huge concern. Introductory psych textbooks focus on major, well-established findings and ideas. (For example, only one of the 60+ unreplicated studies was among the 5,174 references in my text at the time, so its next edition required deleting just half a sentence.)


But now come more recent criticisms of six famous and favorite studies:

  • Philip Zimbardo stage-managed the Stanford prison study to get his wished-for results, and those who volunteer for such an experiment may be atypically aggressive and authoritarian (see here and here). Moreover, as Stephen Reicher and Alex Haslam showed when they recreated a prison experiment with the BBC (albeit as reality TV rather than a replication), groups don’t necessarily corrupt; people can collectively choose to behave in varied ways. For such reasons, the Stanford prison study may disappear from more intro psych texts in the future. For now, though, some teachers still use it as a vivid illustration of the potential corrupting power of evil situations. (Zimbardo and colleagues have released responses here.)
  • Muzafer Sherif similarly stage-managed his famed boys’ camp study of conflict and cooperation to produce the results he wanted (see here). Yet my friend Stephen Reicher, whom I met over coffee in St. Andrews two weeks ago, still considers the Sherif study a demonstration, even if a somewhat staged one, of the toxicity of competition and the benefits of cooperation.
  • The facial-feedback effect, the tendency of facial muscles to trigger associated feelings, doesn’t replicate (see here). The failure to reproduce that favorite study (which my students and I have experienced by holding a pencil with our teeth vs. our pouting lips) wiped a smile off my face. But then the original researcher, Fritz Strack, pointed us to 20 successful replications. And a new study sleuths out a crucial difference (self-awareness effects due to camera proximity) between the studies that do and don’t reproduce the facial-feedback phenomenon. Even without a pencil in my mouth, I am smiling again.
  • The ego-depletion effect, the idea that self-control is like a muscle (weakened by use, replenished with rest, and strengthened with exercise), also failed a multi-lab replication (see here). But a massive new 40-lab study, with data analyzed by an independent consultant, did show evidence of a small depletion effect. Center for Open Science founder Brian Nosek called it “innovative, rigorous” science.
  • Kitty Genovese wasn’t actually murdered in front of 38 apartment bystanders who were all nonresponsive (see here). Indeed. Nevertheless, the unresponsive bystander narrative—initiated by police after the Genovese murder—inspired important experiments on the conditions under which bystanders will notice and respond in crisis situations.
  • Mischel’s marshmallow study (children who delay gratification enjoy future success) got roasted by a big new failure to replicate. As I explain in last week’s www.TalkPsych.com essay, the researchers did find an association between 4½-year-olds’ ability to delay gratification and later school achievement, but it was modest and shrank considerably once other factors were taken into account. The take-home lesson: Psychological research does not show that a single act of behavior reliably predicts a child’s life trajectory. Yet life success does grow from impulse restraint. When deciding whether to study or party, or whether to spend now or save for retirement, forgoing small pleasures can lead to bigger rewards later.


One positive outcome of these challenges to psychological science has been new scientific reporting standards that enable replications, along with the establishment of the Center for Open Science, which aims to increase scientific openness, integrity, and reproducibility. (I was pleased recently to recommend to my fellow Templeton World Charity Foundation trustees a multimillion-dollar grant that will support the Center’s mission.)


The big picture: Whatever their outcome, research replications are part of good science. Science, like mountain climbing, leads us upward, though at times we feel we have lost ground along the way. Any single study provides initial evidence, which can inspire follow-up research that enables us to refine a phenomenon and understand its scope. Through replication, by winnowing the wheat from the chaff, psychological science marches forward.