Spoiled Science: Cornell’s Famous Food Lab Exposed
Written by Tom Bartlett
Brian Wansink is nowhere to be found. He’s not in his office. Calls to his cellphone go to voicemail. He was supposed to meet me that morning at Cornell University’s Food and Brand Lab, which he created and directs, but he canceled the night before. Cornell’s media-relations staff is apologetic and accommodating: What about a meeting with the dean instead? Or a tour of campus? The architecture is amazing — and those gorges!
Normally, Wansink is only too happy to talk. The Sherlock Holmes of Food, as he’s sometimes called, made his name over the last two decades by persuading people to care about his research on eating habits. He’s popped up on Oprah, 60 Minutes, and Rachael Ray, among other shows. He’s the author of the best seller Mindless Eating: Why We Eat More Than We Think (Bantam, 2006). Wansink is affable, quotable, and good on TV. It doesn’t hurt that the clever studies he cranks out (more than 200 so far) are chock-full of practical factoids. Did you know that men eat more when women are around? Or that yogurt tastes better if it has an organic label? Or that people who hide their cereal boxes weigh 20 pounds less than people who keep their Cheerios in plain sight?
His insights have also attracted interest — and, more importantly, money — from policy makers. In 2014 the U.S. Department of Agriculture forked over a $5.5-million grant to support Wansink’s Smarter Lunchrooms program, which has since been put into practice in more than 30,000 schools across the country.
But in recent weeks, Wansink has become the subject of a less-flattering sort of attention. Four papers on which he is a co-author were found to contain statistical discrepancies. Not one or two, but roughly 150. That revelation led to further scrutiny of Wansink’s work and to the discovery of other eyebrow-raising results, questionable research practices, and apparent recycling of data in at least a dozen other papers. All of which has put the usually ebullient researcher and his influential lab on the defensive.
The slow-motion credibility crisis in social science has taken the shine off a slew of once-brilliant reputations and thrown years of research into doubt. It’s also led to an undercurrent of anxiety among scientists who fear that their labs and their publication records might come under attack from a feisty cadre of freelance critics. The specifics of these skirmishes can seem technical at times, with talk of p-values and sample sizes, but they go straight to the heart of how new knowledge is created and disseminated, and whether some of what we call science really deserves that label.
It began with a seemingly innocent anecdote.
Back in November, Wansink wrote a post on his blog, Healthier and Happier, titled “The Grad Student Who Never Said No.” It was about a visiting graduate student who agreed to reanalyze a dataset from an old experiment. Wansink and his fellow researchers had spent a month gathering information about the feelings and behavior of diners at an Italian buffet restaurant. Unfortunately their results didn’t support the original hypothesis. “This cost us a lot of time and our own money to collect,” Wansink recalled telling the graduate student. “There’s got to be something here we can salvage.”
He had previously offered the dataset to a postdoc in his lab but the postdoc had declined, citing other priorities. The graduate student, however, eagerly took up the task and her efforts led to a number of published papers. Aspiring academics should remember, Wansink wrote, that “time management is tough when there’s [sic] so many other shiny alternatives that are more inviting than writing the background section or doing the analyses for a paper.”
Say yes, work hard, get published. Who could argue with that?
Quite a lot of people, as it turns out. What Wansink had described is more or less a recipe for p-hacking, a practice that has led to a lot of hand-wringing and soul-searching in recent years, particularly among social psychologists. The “p” in p-hacking is a reference to p-value, a calculation that can help establish whether the outcome of an experiment is statistically significant. P-values can be misleading, though, particularly when you try multiple hypotheses on the same dataset. One of these hypotheses may eventually appear to “work,” but that doesn’t mean that you’ve arrived at a solid scientific result. Instead it might just mean that you tortured the data long enough to find a meaningless pattern amid the noise.
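To see why trying many hypotheses on one dataset is risky, consider a rough back-of-the-envelope sketch. It assumes the conventional 0.05 significance threshold and, for simplicity, that each test is independent and that no real effect exists; the function name is made up for illustration and has nothing to do with any analysis Wansink’s lab actually ran.

```python
def chance_of_false_positive(num_tests: int, alpha: float = 0.05) -> float:
    """Probability that at least one of `num_tests` tests looks "significant"
    purely by chance.

    Assumptions (illustrative only): the tests are independent, and every
    hypothesis being tested is actually false.
    """
    # Each test stays (correctly) non-significant with probability 1 - alpha,
    # so all of them stay quiet with probability (1 - alpha) ** num_tests.
    return 1.0 - (1.0 - alpha) ** num_tests


if __name__ == "__main__":
    for k in (1, 5, 20):
        pct = chance_of_false_positive(k)
        print(f"{k:>2} hypotheses tried: {pct:.0%} chance of a spurious hit")
```

Under those assumptions, a researcher who tries 20 unrelated hypotheses has roughly a 64 percent chance of turning up at least one “significant” result even when there is nothing to find.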
The post was catnip for those who make it their business to ferret out research wrongdoing. They’ve been called destructo-critics, methodological terrorists, and worse. Their number includes tenured professors and motivated amateurs who tend to be exacting in their evaluations and brutal in their critiques.
One of those motivated amateurs was Jordan Anaya, who goes by the online handle Omnes Res (Latin for “All the Things”), a name he chose because of his fondness for big data. Anaya left a graduate program in biochemistry and molecular genetics at the University of Virginia a couple of years back to pursue his own research interests, which he does from an apartment in Charlottesville, Va. Among those interests: writing a program that runs GRIM checks.
What is GRIM? Here’s a fairly technical answer: GRIM is the acronym for Granularity-Related Inconsistency of Means, a mathematical method that determines whether an average reported in a scientific paper is consistent with the reported sample size and number of items. Here’s a less-technical answer: GRIM is a B.S. detector. The method is based on the simple insight that only certain averages are possible given certain sets of numbers. So if a researcher reports an average that isn’t possible, given the relevant data, then that researcher either a) made a mistake or b) is making things up.
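The idea is simple enough to sketch in a few lines of code. What follows is a minimal illustration of the principle, not Anaya’s program or the exact procedure in the Brown and Heathers paper; the function name and parameters are hypothetical, and it assumes the underlying responses are whole numbers (as on a typical rating scale) and that the reported mean is not negative.

```python
from decimal import Decimal, ROUND_FLOOR, ROUND_HALF_EVEN, ROUND_HALF_UP


def grim_consistent(reported_mean: str, n: int, items: int = 1) -> bool:
    """Return True if `reported_mean` could be the average of whole-number
    responses from `n` participants answering `items` questions each.

    The mean is passed as a string (e.g. "5.19") so that the number of
    reported decimal places, including trailing zeros, is preserved.
    """
    mean = Decimal(reported_mean)
    places = max(0, -mean.as_tuple().exponent)   # decimals as reported
    quantum = Decimal(1).scaleb(-places)         # e.g. 0.01 for two decimals
    total = n * items                            # number of integer scores

    # The true sum of scores must be a whole number close to mean * total,
    # so only the integers just below and just above need checking.
    low = int((mean * total).to_integral_value(rounding=ROUND_FLOOR))
    for candidate_sum in (low, low + 1):
        exact = Decimal(candidate_sum) / Decimal(total)
        # Be lenient about how the authors might have rounded.
        for mode in (ROUND_HALF_UP, ROUND_HALF_EVEN):
            if exact.quantize(quantum, rounding=mode) == mean:
                return True
    return False


if __name__ == "__main__":
    print(grim_consistent("5.19", n=28))  # no whole-number total gives 5.19
    print(grim_consistent("3.45", n=20))  # 69 / 20 = 3.45
```

With 28 participants, for instance, the closest achievable averages are 145/28 (about 5.18) and 146/28 (about 5.21), so a reported mean of 5.19 cannot be right.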
GRIM is the brainchild of Nick Brown and James Heathers, who published a paper last year in Social Psychological and Personality Science explaining the method. Using GRIM, they examined 260 psychology papers that appeared in well-regarded journals and found that, of the ones that provided enough data to check, half contained at least one mathematical inconsistency. One in five had multiple inconsistencies. The majority of those, Brown points out, are “honest errors or slightly sloppy reporting.”
But not all.
Anaya read the Brown and Heathers paper and loved it. As a tribute, he whipped up a computer program to make it easier to check papers for problems, thereby weaponizing GRIM.
After spotting the Wansink post, Anaya took the numbers in the papers and, to coin a verb, GRIMMED them. The program found that the four papers based on the Italian buffet data were shot through with impossible math. If GRIM were an actual machine, rather than a humble piece of code, its alarms would have been blaring. “This lights up like a Christmas tree,” Brown said after highlighting on his computer screen the errors Anaya had identified.
Brown is a graduate student at the University of Groningen, in the Netherlands, and a crusading troublemaker of sorts. His previous exploits include teaming up with Alan Sokal, of Sokal hoax fame, to poke holes in the much-vaunted and now mostly debunked “positivity ratio,” used to supposedly calculate whether someone is flourishing, along with translating the memoir of the notorious data-fabricator Diederik Stapel from Dutch into English.
Anaya, along with Nick Brown and Tim van der Zee, a graduate student at Leiden University, also in the Netherlands, wrote a paper pointing out the 150 or so GRIM inconsistencies in those four Italian-restaurant papers that Wansink co-authored. They found discrepancies between the papers, even though they’re obviously drawn from the same dataset, and discrepancies within the individual papers. It didn’t look good. They drafted the paper using Twitter direct messages and titled it, memorably, “Statistical heartburn: An attempt to digest four pizza publications from the Cornell Food and Brand Lab.”
It’s been viewed more than 10,000 times.
Their paper, published in January as a “preprint,” meaning that it was not peer-reviewed, is written in a calm, collegial voice, a contrast to the often hectoring tone of social media. But the conclusion was damning. In the paper, the authors allow that while science cannot always be perfect, “it is expected to be done carefully and accurately.”
Wansink’s work, they believed, had failed on both counts.
On his blog, Anaya was less restrained. “If you were to go into the lab and create someone that perfectly embodied all the problems science is currently facing you couldn’t do better than Brian Wansink,” he wrote.
Read more at www.chronicle.com