This week, Nate Silver has relaunched his excellent blog FiveThirtyEight. He has always had a talent for explaining quantitative thought. This time, he is doing so under the umbrella of ESPN, and has risen to the role of editor, allowing him to put out a lot more content. The risk with this transition is that some of the material will drop off in quality, and such is the case for a recent article that attempts to explain Bayes' theorem.
The article in question was Finally, a Formula for Decoding Health News by guest contributor Dr Jeff Leek. Leek notes that when you read a health headline, the existence of the article itself is not very informative. In order to update your assessment of its claim, you have to read the study behind the headline. Take the headline Hospital checklist cut infections, saved lives. Headlines like this appear all the time even when the underlying claim is false, and so they do not provide strong evidence on their own.
Dr Leek reduces Bayes' rule down to a simple equation:
Final opinion on headline = (initial gut feeling) * (study support for headline)
Although this formulation is roughly correct, it is easy to misinterpret. The "initial gut feeling" is shorthand for your prior odds, which makes the "study support for headline" the Bayes factor. Bayes' rule, in odds form, is this:

(Posterior odds) = (Prior odds) * (Bayes Factor)
The Bayes factor is how much more likely the evidence is to appear if the contention is true than if it is false. This takes a moment to get your head around. To make a Bayesian update, you imagine the world where the hospital checklists really do reduce infections (world 1). Then you imagine the opposite - that hospital checklists have a neutral or negative effect on infections (world 2). In world 2, checklists just waste doctors' time and make them more stressed, or something like that. How much likelier are you to see this study in world 1 compared to world 2? Well, it depends on the robustness of the study. If the study very reliably shows a reduction in infections, then you are very unlikely to see it in world 2. The study might be about 4 times more likely to appear in world 1 than world 2 (Bayes Factor = 4), causing you to significantly update your estimate in favour of the proposition that hospital checklists reduce infections. However, if the study is of poor quality, you are roughly as likely to see it in either world. The Bayes Factor is near 1, and you will hardly have to update at all.
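The update described above can be sketched in a few lines of Python. This is a minimal illustration of the odds-form update, not code from the article; the function name and the 50% prior are my own choices:

```python
def update_probability(prior_prob, bayes_factor):
    """Convert a prior probability to odds, multiply by the
    Bayes factor, and convert back to a probability."""
    prior_odds = prior_prob / (1 - prior_prob)
    posterior_odds = prior_odds * bayes_factor
    return posterior_odds / (1 + posterior_odds)

# A robust study (Bayes factor = 4) moves a 50% prior to 80%.
print(update_probability(0.5, 4))    # 0.8

# A weak study (Bayes factor near 1) barely moves the prior at all.
print(update_probability(0.5, 1.1))
```

Notice that a Bayes factor of exactly 1 leaves the probability unchanged, which is the key point in what follows.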
However, Dr Leek confuses the issue by reducing the generation of a Bayes Factor to a list of criteria. In the case of the headline about hospital checklists, the study met five of the six criteria.
- Was the study a clinical study in humans?
- Was the outcome something you care about, such as living longer or feeling better?
- Was the study a randomized, controlled trial (RCT)?
- Was it a large study — at least hundreds of patients?
- Did the treatment have a major impact on the outcome?
- Did predictions hold up in at least two separate groups of people?
...For the sake of the exercise, let’s multiply by two every time we see a “yes” answer and by 1/2 every time we see a “no” answer. I would say this study’s result is about 16 times more likely (five out of six “yes” answers — 2 × 2 × 2 × 2 × 2 × (1/2) = 16) if checklists really do reduce infections than if they don’t. I set study support for headline = 16.
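Dr Leek's heuristic is simple enough to state as code. This is my own sketch of his rule, with the six criteria encoded as booleans:

```python
def leek_shortcut(answers):
    """Dr Leek's heuristic: multiply by 2 for every 'yes'
    and by 1/2 for every 'no' across the six criteria."""
    factor = 1.0
    for yes in answers:
        factor *= 2 if yes else 0.5
    return factor

# The checklist study: five yeses, one no (it was not an RCT).
checklist_study = [True, True, False, True, True, True]
print(leek_shortcut(checklist_study))   # 16.0
```

A study meeting all six criteria scores 64, and one failing all six scores 1/64 - the two extremes discussed below.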
In this instance, Dr Leek's shortcut gives a decent approximation of the Bayes Factor. The study, though not an RCT, is a large one showing a big positive effect on relevant metrics. We would be unlikely to encounter this evidence if the headline was false.
However, in other scenarios, Leek's shortcut gives approximations that are clearly wrong. Suppose a study meets all of the criteria except one, on which it is fatally flawed: the sample size is far too small (say, only one participant), or rather than mortality and morbidity rates, the study measures something that does not matter at all, like physician satisfaction with the surveys. Such a flaw can cripple the study entirely, so the Bayes factor should not merely fall from 64 to 16 - it should fall to around 1. A fatally flawed study is no longer useful evidence: it is no likelier to appear in world 1, where the headline is true, than in world 2, where it is false.
The shortcut fares even worse when applied to poor-quality studies. Dr Leek says that a poor-quality study makes the headline less likely to be true; on his view, a perfectly awful study gives support of 1/64, and multiplying your prior odds by a Bayes factor of 1/64 greatly reduces your credence in the headline. However, a perfectly awful study should give you a Bayes factor of around 1: you are roughly as likely to see it whether its hypothesis is true or false. So if a study is designed terribly, it will not greatly decrease your credence as Dr Leek suggests - it will hardly change your credence at all.
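The divergence between the shortcut and the real Bayes factor is easiest to see side by side. The two likelihoods below are illustrative numbers I have chosen, not figures from the article; the point is only that they are roughly equal for a worthless study:

```python
# Leek's shortcut for a study failing all six criteria:
shortcut_factor = 0.5 ** 6   # 1/64 - the shortcut treats this
                             # as strong evidence AGAINST the headline

# But a worthless study is about as likely to appear whether the
# claim is true or false (both likelihoods are assumed values):
p_study_if_true = 0.05
p_study_if_false = 0.05
actual_factor = p_study_if_true / p_study_if_false   # 1.0 - almost no update

print(shortcut_factor, actual_factor)
```

With a prior of 50%, the shortcut drags your credence down to about 1.5%, while the genuine Bayes factor of 1 leaves it at 50%.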
This is not lost on the readers of FiveThirtyEight. Take a look at the comments section, where Ben Kuhn explains:
Your factor for "study support for headline" is supposed to be the Bayes factor P(study supports headline | headline is true) / P(study supports headline | headline is false). If you actually think about that, it's clear that it should be very rare to find a study that is more likely to support its conclusion if that conclusion is not true. (It's not impossible, because of things like publication bias: everything in that formula is also conditioned on you hearing about the study, and you may be more likely to hear about a study that's false, because false studies are more surprising and therefore more likely to be published. But I can't think of plausible scenarios in which the study provides that much evidence against its conclusion.)
Or take another comment from Vasilii Artyukhov:
Let's say we have a study examining whether global warming is a hoax. Let's say the researchers were testing this by performing a coin toss: Heads for Hoax. The coin landed tails, so the headline goes "global warming proven to be real". After looking at the paper we find that the method gets a minus on all 6 criteria we came up with for climate research. Does that mean that after reading this study I must increase my odds for "global warming is a hoax" by 2^6?
Or, as Robert Pirsig puts it, "The world's greatest fool may say the Sun is shining, but that doesn't make it dark out." The point is elaborated by Eliezer Yudkowsky, a great advocate of Bayesianism, in his LessWrong post Reversed Stupidity Is Not Intelligence. Essentially, to decode a news headline, you imagine seeing the study in world 1, where its contention is true, and in world 2, where the contention is false, and adjust your prior by the relative likelihood.
So although there are good ways to decode health news, Dr Leek's shortcut is not one of them. Although Dr Leek is a guest author, by publishing this article Nate Silver is obscuring matters for his unusually clear-thinking audience. However, the new FiveThirtyEight is still young, and I am optimistic that it can return to form as a platform for advocating rationality.