There are three kinds of lies: lies, damned lies, and statistics.

Benjamin Disraeli, British politician (1804 - 1881)

Statistics: The only science that enables different experts using the same figures to draw different conclusions.

Evan Esar, Esar's Comic Dictionary, American humorist (1899 - 1995)

Where is the knowledge that is lost in information? Where is the wisdom that is lost in knowledge?

T. S. Eliot (1888 - 1965)

As discussed in the last post, we are today faced with a dizzying array of contradictory "recommendations" when it comes to health and lifestyle, particularly what to eat. How is it possible that all of these experts come to such differing conclusions? The quotes above illustrate certain root aspects of the problem. First, there is a gross misunderstanding of what "statistics" is really meant to accomplish, how it is properly applied, and what the answer "means". Second, and at least partly because of the first issue, the vast quantities of hard information ("facts" or "data") we have available tend to get filtered and twisted to erroneous conclusions, often supporting goals (e.g. "sell more books") other than maximizing the health of the population. Finally, despite the apparent rigor, technology, and expertise applied for answering key scientific questions, the process of actually turning scientific results into useful decisions is generally an exercise in muddled thinking rather than rational inference.

The first two quotes embody general perceptions about "statistics". Those holding these views, by the way, include most professional scientists. When I worked as a research scientist in gamma-ray astrophysics, I can't tell you how many times colleagues would say things like "You can get any answer you want using statistics". Of course, they were quite happy to (supposedly) get the answers they wanted via application of statistical methods, but that irony isn't the point, because the incontrovertible truth is precisely opposite: there's only one right answer. Esar's quote embodies this attitude, so let's pick it apart. First, statistics is not a "science". Science involves the broader exercise of observing, modeling using mathematics, and interpreting observations in terms of those models. Statistics is just math, and as in any well-posed mathematical problem, there's only one right answer. I've thought of only three ways in which two scientists could come up with different answers to the same statistical problem:

- Somebody made a math error (this happens more often than you think; see Point 3).
- They used different input information (in which case it's not really the same problem in both cases).
- They used different approximate methods (often badly) to solve the problem.

But the problem runs even deeper, because even when scientists do the math right and apply the recipes under the appropriate conditions, more often than not they're still not answering the right question. Pure science attempts to answer questions like "does eating lollipops induce insulin resistance?" Applied science wants to actually make decisions, e.g. "should I eat this lollipop, given what I know about its effects on insulin resistance, the effects of insulin resistance on health, etc.?" Almost always, the answers given in scientific papers are of the form "We compute a 95% probability that we would have observed these data if lollipops cause chronic insulin resistance". That's a much different statement than "We compute a 95% chance that lollipops chronically increase insulin resistance given the data we observe and other prior information," and only this latter statement is of any use on the applied end of things, when deciding whether or not to eat lollipops.

A very simple example might help to illustrate the issues. Suppose somebody wants you to gamble on a coin flip: you bet $1, and if the coin comes up heads, you win your dollar plus another $2. If the coin comes up tails, you lose your $1. How do you decide whether or not to play this game? Right from the start, we can see standard statistics is going to have trouble, because you have no data. Intuitively you might guess that there is a 50% chance of heads, and indeed if you had no other information at all, you would be right. With two possibilities, and no information to distinguish which would be more likely, you would assign equal probabilities to both outcomes. Our decision to play or not comes down to how much money we'd expect to have in each case. If we don't play, we have a guaranteed $1 in our pocket. If we do play, then with no other information there's a 50% chance we'll have $3, and a 50% chance we'll have zero, so $3*0.5 + $0*0.5 = $1.50. On average, we have $1.50 if we play and $1 if we don't, so we should play, given that we lack any other information about the game.
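The arithmetic above can be written out as a short sketch. The function name `expected_holdings` and its parameters are illustrative choices of mine, not anything from a library; it simply computes the average amount of money in your pocket for an assumed probability of heads.

```python
def expected_holdings(p_heads: float, stake: float = 1.0, prize: float = 2.0) -> float:
    """Average money in pocket after one play, for an assumed probability of heads."""
    money_if_heads = stake + prize  # you get your $1 back plus another $2
    money_if_tails = 0.0            # you lose your $1
    return p_heads * money_if_heads + (1.0 - p_heads) * money_if_tails


keep_money = 1.0                     # guaranteed if you do not play
play_money = expected_holdings(0.5)  # 50/50 with no other information

# Playing beats not playing only when p_heads * 3 > 1, i.e. p_heads > 1/3.
break_even = keep_money / (1.0 + 2.0)

print(f"don't play: ${keep_money:.2f}, play: ${play_money:.2f}")
print(f"break-even probability of heads: {break_even:.3f}")
```

Anything that pushes your probability of heads below about one in three flips the decision, even though nothing about the coin itself has changed.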

But what if our information were different? For example, suppose somebody we trust tells us that she knows the coin-flip guy is a scam artist, having been arrested several times. Now what do you do? Most people would now intuitively keep their money, and indeed a mathematical analysis would likely indicate that this is the proper course of action, as on average you would now expect to lose more often than not. But notice that the only thing that has changed is our knowledge about the game: the coin being flipped is the same, as is the person doing the flipping, and presumably the laws of physics governing coin flips. Purposely ignoring this new information would be fantastically stupid, particularly given that it came from a trusted source. Note that we still have no "data" in the traditional sense.

What if we had some data? Suppose the person we're playing against offers to let us flip the coin one hundred times before betting, and 99 out of 100 come up heads. Now you have some more information about the coin. Assuming that you're not doing something to introduce a bias, you have some additional confidence that the coin itself is biased towards heads, even more inducement to play the game, because the likelihood that you would have observed 99 heads out of 100 flips would be low for a fair coin. But that data and the associated likelihood are only part of the picture: do you now ignore the input of your trusted friend? Does your data trump the information they provided? Obviously it cannot. The aforementioned likelihood of seeing 99 of 100 flips come up heads was calculated with the a priori assumption that the game was fair. But your friend's input tells you that fairness is unlikely, and given your other information about scam artists, the likelihood that you saw 99 heads when you were flipping the coin and no money was at stake should be considerably higher. And the likelihood of 99 heads answers the wrong question anyway. Our decision to play or not must be based on the probability that the coin, when flipped by our opponent, will come up heads, not on the probability that you would have flipped 99 heads in 100 tries assuming the coin was fair.
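To put rough numbers on that likelihood, here is a small sketch using the binomial formula. The 0.99 heads bias is a made-up alternative chosen only for comparison, and `binomial_likelihood` is an illustrative helper name.

```python
from math import comb


def binomial_likelihood(heads: int, flips: int, p_heads: float) -> float:
    """p(data | coin): probability of exactly `heads` heads in `flips` independent flips."""
    return comb(flips, heads) * p_heads**heads * (1.0 - p_heads) ** (flips - heads)


fair = binomial_likelihood(99, 100, 0.5)     # assuming a fair coin
biased = binomial_likelihood(99, 100, 0.99)  # assuming a hypothetical heads-heavy coin

print(f"p(99 heads in 100 | fair coin)       = {fair:.3g}")
print(f"p(99 heads in 100 | 0.99-heads coin) = {biased:.3g}")
```

Both numbers are likelihoods: probabilities of the data under an assumed coin. Neither one is the probability that the coin will come up heads when your opponent flips it with money on the table, which is the number the decision actually needs.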

The key point here is that information is information is information. The data gathered in a particular experiment is just more information which can be used to update our beliefs in different hypotheses. But most experiments don't start from zero, where the gathered data is the only available information. Usually others have conducted experiments that gathered other data. There's generally other relevant information as well. In the lollipop/insulin resistance example, we know that the glucose in the lollipop raises insulin, and that the fructose may at least temporarily contribute to insulin resistance. Any reasonable analysis must include this additional information when evaluating our belief in the hypothesis under test ("lollipops induce chronic insulin resistance"). Ignoring this information is no different than arbitrarily excluding data from our analysis (after all, data is just a kind of information); yet this is precisely how most scientific results are presented and interpreted.

Have a headache yet? I know, I know, this is some tough material. The whole area of reasoning under uncertainty is mathematically and philosophically deep. This is about the fourth time I've tried writing a reasonably accessible post, and I have concluded that the topic is fundamentally hard to talk about. But if you're going to make good decisions about your health, it's good to have at least some idea where the flaws are in most scientific analyses, and also how to think about the issues in the "right way" so as not to be misled. So let's take a moment to review the key lessons you should take away at this point:

- Though I didn't explicitly say it above, the notion of a probability really reflects the degree of belief in some statement (e.g. "the next coin flip will be heads"). Probabilities are real numbers between 0 and 1 (or 0% and 100%), where 0 represents absolute belief that the statement is false, and 1 absolute belief that it is true.
- We don't necessarily need "data" to assess the probability that a statement is true; any type of information will do. There's nothing special about data; they're just more information to be used in updating degrees of belief (probabilities).
- To properly assess the probability of a hypothesis, we must include not just the data, but also any other relevant background information.
- If the outcome of a decision you're making depends on a hypothesis being true, then you need to know the probability of that hypothesis being true given all of the relevant available information. The probability that some particular data would have been observed assuming the hypothesis to be true necessarily ignores relevant information, precisely because it assumes the truth of the hypothesis without accounting for the possibility that the hypothesis is false. This is impossibly circular: you can't assess the degree of belief in a hypothesis if your analysis uniformly assumes it to be true.

Measured against these lessons, scientists commonly:

- Selectively ignore prior information;
- Misapply statistical approximations to calculate the wrong number (probability of data assuming hypothesis is true);
- Interpret their results as indicating absolute truth or falsehood;
- Perform this interpretation via vague mental gymnastics rather than rigorous mathematics.

Now it may sound as if the picture is bleak for science, but rather amazingly, science seems to eventually bumble around to the correct conclusions. It's just a highly inefficient process because of the issues above. Scientists tend to hold on to certain "widely believed" hypotheses like grim death regardless of the actual evidential support; but eventually there comes a point when evidence for an alternative hypothesis becomes so overwhelming it becomes impossible to ignore (if you're paying attention, you can watch this process at work right now for low-carb diets). Science would benefit greatly, of course, by adopting a more rigorous analytical approach addressing the issues above. Such an approach exists, generally denoted "Bayesian Statistics". I don't like this term, since the methodology does not focus on "statistics" per se (rather on probabilities), and its namesake, the Rev. Thomas Bayes, really made only a tangential contribution to the whole business. "Probability Theory" is a more apt term, reflecting the idea that it extends the idea of logic to the case where we're not 100% sure of the truth/falsehood of statements.

At the end of the post, I'll briefly discuss Probability Theory further and give a few references for those who are interested in the technical details. But for those just trying to puzzle through the maze of information presented by the media, doctors, etc. we can borrow some of the ideas from Probability Theory, putting together a way of thinking about evidence (information), hypotheses (knowledge), and decisions (wisdom). The T. S. Eliot quote at the top describes the situation we wish to avoid, one that many people experience now, struggling to make wise choices when faced with an avalanche of information and knowledge from different sources.

So let's see how we can apply the four lessons above in everyday thinking.

- Probabilities are just numbers representing degrees of belief. I'm not suggesting you carry a bunch of numbers around in your head to track your beliefs, but do recognize that most ideas are neither absolutely true nor absolutely false. We intuitively recognize that such absolutism is pathological, as seen by the often bizarre irrationality exhibited by dogmatists, who refuse to move from a position regardless of the weight of the evidence against that position. Probability Theory encapsulates that behavior mathematically. When new evidence is introduced, Probability Theory gives a formula for updating your beliefs (see math below), basically multiplying your current probability by the weight of that new evidence. But zero times anything is zero, i.e. if you were absolutely sure your current idea was right and all others were wrong, no amount of evidence would ever change your probability. So make sure you are always flexible in reassessing your beliefs. Mental discipline is required. The brain's natural tendency is to seek absolutes, as exhibited by the phenomenon of cognitive dissonance. Learn to be comfortable with uncertainty. Decisions can still be made in the absence of certainty; as Herodotus said, "A decision was wise, even though it led to disastrous consequences, if the evidence at hand indicated it was the best one to make; and a decision was foolish, even though it led to the happiest possible consequences, if it was unreasonable to expect those consequences."
- Just because you're not a scientist (or even if you are) and don't have detailed access to scientific data, it does not mean you can't weigh evidence and update your beliefs. Information is information is information, whether it's numbers or a brief newspaper story. The trick is in getting the weight in the right ballpark. A good rule of thumb: individual reports or results generally should not sway your belief very much. Strong belief is usually built on multiple independent results from different sources.
- Be sure to include all of the information you have available. Another manifestation of cognitive dissonance: when presented with evidence contradicting a strong belief, we give it zero weight. That's a mistake. Contradictory evidence should lessen your belief at least a little, like it or not. Do include evidence from all sources, including anecdotal and personal experience. Just be careful not to overweight that evidence. Be aware that truth is usually conditional. Take the following hypothesis: "You can't become obese on a zero-carb diet." The truth of that hypothesis is conditional on other hypotheses, e.g. "Insulin is the hormone governing fat storage" and "Insulin is primarily driven by carbohydrate consumption". Changes in the belief of these supporting hypotheses necessarily change the belief in the main hypothesis. For example, knowledge of the ASP pathway for fat storage changes our belief that insulin runs the show, and hence modifies our belief that zero-carb diets make you immune to obesity.
- We saw at the beginning that there are only three ways that scientists should disagree when assessing hypotheses. Adapting that to mental inference, disagreement implies that one or both people are irrational and/or have different information. Don't waste your time arguing with irrational people. Anybody who says things like "we'll have to agree to disagree" is irrational, because they have no information supporting their position and/or are unwilling to accept information that may modify their beliefs. But if you find yourself in disagreement with someone who seems rational, then engage in discussion to share the differing information that is at the root of your disagreement. You may not come to agreement - it's difficult to extract all relevant information and knowledge from somebody's head - but you at least will likely learn something new.
- Decisions require not only the quantification of information as probabilities (or at least some qualitative mental equivalent), but also a clearly defined goal. The goal in our coin-flip game was straightforward: on average, maximize the amount of money in your pocket. It's not so easy to quantify the goal of maximizing health. People try, which is why doctors love to measure things like cholesterol and blood sugar, but such metrics can only provide a narrow view of one particular aspect of overall health (and even if they didn't, treatment decisions are generally not properly analyzed anyway). Treatment decisions often involve modification of one or a small set of such numbers, which is incredibly myopic as it ignores overall health (hence the spectacular failure of "intensive therapy" to lower blood sugar in Type II diabetics by pumping them full of insulin). Remember also to include the potential long-term effects of your decisions, e.g. cranking up the insulin of those Type II diabetics lowers blood sugar in the short term, but increases probability of early death, which presumably outweighs the short-term benefits.
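The idea in the first two points, that new evidence multiplies your current belief by its weight, is easiest to see in terms of odds. In the sketch below, the likelihood ratio of 2 per report is an invented number standing in for "modest evidence", the reports are assumed to be independent, and `update_belief` is an illustrative name rather than a standard function.

```python
def update_belief(prior: float, likelihood_ratios: list[float]) -> float:
    """Update a probability with a sequence of independent pieces of evidence.

    Each likelihood ratio is p(evidence | idea true) / p(evidence | idea false).
    Requires 0 < prior < 1; a prior of exactly 0 or 1 can never move.
    """
    if not 0.0 < prior < 1.0:
        raise ValueError("prior must be strictly between 0 and 1")
    odds = prior / (1.0 - prior)
    for ratio in likelihood_ratios:
        odds *= ratio  # each piece of evidence multiplies the odds
    return odds / (1.0 + odds)


one_report = update_belief(0.5, [2.0])
five_reports = update_belief(0.5, [2.0] * 5)
mixed = update_belief(0.5, [2.0, 2.0, 0.5])  # one contradicting report cancels one supporting report

print(f"one report:           {one_report:.3f}")
print(f"five reports:         {five_reports:.3f}")
print(f"two for, one against: {mixed:.3f}")
```

A single modest report moves an even-odds belief only part of the way, while several independent ones build a strong belief. A contradicting report pulls the belief back down rather than being ignored.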

Probability Theory rests on a few basic concepts:

- Degrees of belief (probabilities) are represented by real numbers.
- Qualitative correspondence with common sense, e.g. if your belief in some background information increases (e.g. "Coin-flip guy is cheating") then so should your belief in a hypothesis conditioned on that information ("I will lose the coin flip when coin-flip guy does the flipping").
- The procedure for assessing degrees of belief (probabilities) must be consistent, where consistency can be described in three ways:
    - If a conclusion can be reasoned out in more than one way, then all ways must lead to the same answer.
    - Conclusions must be reached using all of the available evidence.
    - Equivalent states of knowledge lead to the same probabilities based on that knowledge.

To make use of Probability Theory, we need some mathematical rules for manipulating the probabilities of different propositions. A little notation first: let A|C mean "A is true given that C is true". AB|C means "A and B are true given C", while A+B|C means "A or B (or both) is true given C". Let ~A|C mean "A is false given C". If p(A|C) denotes the probability that A is true given C, then we have the following product and sum rules:

- p(AB|C) = p(A|C) p(B|AC) = p(B|C) p(A|BC)
- p(A + B|C) = p(A|C) + p(B|C) - p(AB|C)

That's most of Probability Theory, IMHO far more conceptually elegant and mathematically simple than the mess of statistics most of us were taught. That's not to say that actually solving problems is necessarily easy, but with a sound conceptual basis and simple rules, it's a lot easier to solve them consistently, get numbers that actually make sense, and combine different scientific results to understand their impact on various hypotheses.
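As a quick sanity check, the product and sum rules can be verified on a tiny table of joint probabilities. The numbers below are invented purely for illustration, and the helper name `prob` is my own. Conditional probabilities are computed by renormalizing within the cases where the condition holds, so the product-rule check largely restates that definition; the sum-rule check is the more informative one.

```python
from fractions import Fraction

# Hypothetical joint probabilities p(A, B | C); invented numbers that sum to 1.
joint = {
    (True, True): Fraction(3, 10),
    (True, False): Fraction(2, 10),
    (False, True): Fraction(1, 10),
    (False, False): Fraction(4, 10),
}


def prob(event) -> Fraction:
    """Total probability of all (A, B) cases for which event(a, b) is true."""
    return sum((pr for (a, b), pr in joint.items() if event(a, b)), Fraction(0))


p_a = prob(lambda a, b: a)
p_b = prob(lambda a, b: b)
p_ab = prob(lambda a, b: a and b)
p_a_or_b = prob(lambda a, b: a or b)

# Conditionals: restrict to the cases where the condition holds, then renormalize.
p_b_given_a = p_ab / p_a
p_a_given_b = p_ab / p_b

# Product rule: p(AB|C) = p(A|C) p(B|AC) = p(B|C) p(A|BC)
assert p_ab == p_a * p_b_given_a == p_b * p_a_given_b

# Sum rule: p(A+B|C) = p(A|C) + p(B|C) - p(AB|C)
assert p_a_or_b == p_a + p_b - p_ab

print(p_a, p_b, p_ab, p_a_or_b)
```

The subtraction in the sum rule is there because the case where both A and B are true is counted once in p(A|C) and again in p(B|C).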

This last point is important. We discussed earlier how new information ("data") must be used to update our beliefs. We shouldn't look at two different scientific results and try to pick between them. Rather our belief in a hypothesis derived from the first result must be adjusted when we get the second result. Try figuring out how to do this using standard statistics. The Probability Theory recipe for this follows trivially from the product rule. Let's rewrite the second equality in the product rule as follows:

- p(H|DI) p(D|I) = p(D|HI) p(H|I)

Here H is the hypothesis, D the data, and I the background information, and the four terms are:

- p(H|DI) : The probability that our hypothesis is true, given the observed data AND background information. This is called the posterior, and is the key quantity for scientific inference and decision-making.
- p(D|I) : The probability that we would have observed the data given the background information independent of the hypothesis, called the evidence.
- p(D|HI) : The probability that we would have observed the data given both the hypothesis AND the background information, called the likelihood.
- p(H|I) : The probability that the hypothesis is true given only the background information, denoted the prior.

Dividing both sides by p(D|I) gives:

- p(H|DI) = p(D|HI) p(H|I) / p(D|I)

This formula goes by the name of Bayes' Theorem, so named for the Rev. Thomas Bayes, who originally derived a form published in a posthumous paper in 1763. The version shown above was actually published by Laplace in 1774, so we see these ideas have been around for a while. The power of Bayes' Theorem is hopefully clear: given some prior probability, i.e. our degree of belief in a hypothesis, we know how to update the probability when new data is observed, independent of how we arrived at our prior probability. So no matter what experiments I did (or even if no experiments have been done) to arrive at p(H|I), I can simply update that belief given my new data. Note that the term usually reported in scientific results is the likelihood, which is only part of the story.
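Here is a minimal sketch of that updating process for the coin. The two-hypothesis setup (a fair coin versus one that lands heads 90% of the time), the even prior, and the flip sequence are all assumptions made up for illustration; `bayes_update` and `likelihood_of` are illustrative names.

```python
def bayes_update(prior: dict[str, float], likelihood: dict[str, float]) -> dict[str, float]:
    """Return the posterior p(H|DI) for each hypothesis from p(H|I) and p(D|HI)."""
    evidence = sum(prior[h] * likelihood[h] for h in prior)  # p(D|I)
    return {h: likelihood[h] * prior[h] / evidence for h in prior}


# Hypothetical setup: the coin is either fair or lands heads 90% of the time.
p_heads = {"fair": 0.5, "biased": 0.9}
prior = {"fair": 0.5, "biased": 0.5}


def likelihood_of(flip: str) -> dict[str, float]:
    """p(flip | H) for a single flip, 'H' for heads or 'T' for tails."""
    return {h: (p if flip == "H" else 1.0 - p) for h, p in p_heads.items()}


# Update one flip at a time, with each posterior becoming the next prior...
belief = prior
for flip in "HHT":
    belief = bayes_update(belief, likelihood_of(flip))

# ...or all at once, using the likelihood of the whole sequence.
batch_likelihood = {h: p * p * (1.0 - p) for h, p in p_heads.items()}
batch = bayes_update(prior, batch_likelihood)

# A dogmatic prior of exactly zero never moves, whatever the data say.
dogmatic = bayes_update({"fair": 1.0, "biased": 0.0}, likelihood_of("H"))

print(belief)
print(batch)
print(dogmatic)
```

The flip-by-flip and all-at-once routes agree up to rounding, which is the consistency the theory demands. The dogmatic case is the "zero times anything is zero" problem in action.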

If you've ever looked at a "meta-analysis", where somebody tries to combine results from many different experiments, you may have noted that it involves a lot of statistical pain, and often includes cutting out some results (e.g. favoring clinical over epidemiological studies), which violates the whole idea of using all available information. This sort of combination would be straightforward using Probability Theory, presuming all of the original results to be combined were also derived with Probability Theory. No reason to leave out some of the results due to "lack of control". A proper Probability Theory treatment would, for example, quantitatively account for the large number of "uncontrolled variables" (which really implies a lack of information connecting cause and effect) in a population study and adjust the probabilities accordingly.

Now, the few of you who have actually made it this far may be wondering why, if Probability Theory is so much better than standard statistics, it is not widely applied. As with many such situations in science, the answer is complicated, and at least partly tied up with human psychology and sociology. You can read about it more in the references, but I'll hit a few high points. It is interesting to note that Probability Theory was accepted and used prior to the mid-19th century or so. Laplace, for example, used it to estimate the mass of Saturn with considerable accuracy, so much so that an additional 150 years of data improved the result by only 0.63%. Despite this, there were some technical problems. One is that the mathematical equations arising from application of Probability Theory can be difficult or impossible to solve via pencil and paper. This is largely alleviated by using computers to do the calculations, but 19th century scientists did not have that option.

There were also philosophical issues. Nineteenth-century thinkers were pushing toward the idea that there existed some sort of objective scientific truth independent of human thought. The idea that probabilities represented degrees of belief was apparently too squishy and subjective, so they adopted the idea "let the data speak for themselves", and that probabilities reflected the relative frequencies of different measured outcomes in the limit of an infinite number of observations. So if you flip a fair coin infinitely many times, exactly half of the outcomes (50%) would be heads. At the core of the philosophical disagreement lay a couple of technical difficulties. First, there was no known reason to accept the sum and product rules as "right" within the context of Probability Theory (one could propose other rules), yet they arose naturally from the frequency interpretation (Cox later showed the rules could be uniquely determined assuming the basic concepts of Probability Theory). Second, Bayes' Theorem tells us how to update our beliefs given data. But if you "peel the onion" so to speak, going back to the point before any data had been collected, how do you assign the prior probability p(H|I)?

This proved to be a sticky problem. Special cases could be solved, e.g. it's clear that for the coin-flip problem with no other information you should assign 50%/50%. But for more complicated problems where one had partial information, no general method existed for calculating a unique prior. It wasn't until the 1950's that physicist Edwin Jaynes successfully addressed this issue, borrowing ideas from information theory and statistical physics. Jaynes introduced the idea of Maximum Entropy, which basically told you to assign probabilities such that they were consistent with the information you had, while adding no new information (information theory tells you how to measure information in terms of probabilities; entropy is just a measure of your lack of information). The underlying arguments are deep, stemming from the idea of Concept 3.3 that equivalent states of knowledge represent a symmetry, and that your probability assignments must reflect that symmetry. To do otherwise would be adding information without justification. But the horse was out of the barn at that point. The frequency approach had been used in practice for decades, and even in the 50's the computing technology required for practical widespread use of Probability Theory did not exist.
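For the simplest case, a coin with no other information, the Maximum Entropy idea can be checked numerically. The sketch below is only a brute-force illustration over a grid of candidate probabilities, not Jaynes's general method for problems with constraints, and the helper name `entropy_bits` is my own.

```python
import math


def entropy_bits(probs: list[float]) -> float:
    """Shannon entropy in bits; outcomes with zero probability contribute nothing."""
    return -sum(p * math.log2(p) for p in probs if p > 0)


# Candidate values for p(heads), from 0.00 to 1.00 in steps of 0.01.
candidates = [i / 100 for i in range(101)]
best = max(candidates, key=lambda p: entropy_bits([p, 1.0 - p]))

print(f"p(heads) with maximum entropy: {best}")
print(f"entropy there: {entropy_bits([best, 1.0 - best])} bits")
print(f"entropy of a 99/1 coin: {entropy_bits([0.99, 0.01]):.3f} bits")
```

With nothing to distinguish heads from tails, the assignment that claims the least information is the even split, matching the intuitive 50%/50% answer.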

Today, of course, computers are cheap and ubiquitous, and indeed the use of Probability Theory is beginning to increase. But the progress is slow, and as is often the case, widespread change will require the next generation of scientists to really grab the idea and run with it while the current generation fades away.

Whew, that was quite the marathon post. I've hardly done the topic justice, but hopefully you at least got some ideas about what's wrong with how scientific inference is presently done, how you can avoid being confused by apparently conflicting results, and where the solution lies. Below are the promised references.

- Probability Theory: The Logic of Science, E. T. Jaynes: The "bible" of Probability Theory. Jaynes was perhaps the central figure in the 20th century to advance Probability Theory as the mathematical framework reflecting the scientific method. This book is jam-packed with "well, duh" moments, followed by the realization that almost everyone in science reasons in ways which range from unduly complex and opaque to mathematically inconsistent. Not an easy book to read, full of some difficult math, but also plenty of conceptual exposition and very clear thinking about difficult topics. Jaynes does tend to rant a bit at times, but usually against determined stupidity. Required reading for all scientists, and anybody who needs to make critical decisions in the face of incomplete information.
- Articles about probability theory: an online collection, including the works of Jaynes. You can download the first three chapters of "Probability Theory" in case you want a taste before plunking down 70 bucks. I particularly like this article detailing the historical development.
- Data Analysis: A Bayesian Tutorial, D. Sivia and J. Skilling: A more pithy presentation aimed at practitioners. Clearly written without being too math-heavy, "Data Analysis" hits the high points and illustrates some key concepts with real-world applications. A good place to get your feet wet before tackling the intellectual Mt. Everest of Jaynes' book.

## 6 comments:

Hi all. Just wanted to share another great quote which came up on Quote of the Day. It's also from Evan Esar: "Statistician: A man who believes figures don't lie, but admits that under analysis some of them won't stand up either."

Hi, Dave. Thanks for the information.

You probably know this but many of your readers may not: Most physicians practicing in the trenches don't know or remember much about statistics and probability theory, despite undergraduate degrees in the sciences. We read science-based articles in medical and other journals, but usually depend on the pre-publication peer-review process and post-publication letters to the editor to ensure reasonableness of the statistical analysis.

Any thoughts on NNT - "number needed to treat" - analysis? Let's assume a statin drug taken daily for five years has been proven to reduce mortality over five years by two percent, compared to those who don't take the drug. Prevention of death is good, right? NNT analysis purports to answer the question, "How many patients need to take the drug to prevent one death over five years?" The "number needed to treat" may be 300, or maybe it's thirty. If the answer is 30, then 29 patients are taking the drug daily without any mortality benefit over five years. That's a helpful piece of the puzzle when I have to decide whether to recommend the drug. I also want to know how much the drug costs, potential adverse effects, impact on quality of life, degree of required monitoring, etc.

Hi Steve. You bring up excellent points. I agree that the correctness of the analysis needs to be checked during peer review. My experience from physics, by the way, is that such checking rarely occurs. In any case, doctors are really at the decision end of things. To be useful in making treatment decisions, research results need to be presented such that the information is useful for decisions. That would generally imply reporting of posterior probabilities, e.g. "the probability of death in five years given symptoms X and treatment with drug Y". NNT is sort of like that, haven't thought about it deeply. I suspect it may be roughly equivalent to a posterior, depending on how the number was reached, if all relevant information was included, how it relates to the proper goal, etc.

The question then becomes how one uses this information to actually make treatment choices. All decisions require the following:

1) A well-defined goal.

2) Choices, now and in the future.

3) Information (probabilities) about uncertain future outcomes.

4) Changes to the value of the goal brought about by those decisions or uncertain outcomes.

As you note, the goal for health-related decisions is difficult to define, with multiple aspects. The monetary facets are simplest: cost of a drug, cost of emergency intervention (e.g. treatment for a heart attack). Quality of life and the monetary value of life are more difficult to quantify. And there's an interesting time aspect, both due to the time-value of money, as well as the idea that while death is inevitable, most people wish to delay it as long as possible.

Doctors have to try and keep all of this stuff in their heads when making treatment decisions, which is very difficult. Ideally we would have computer software which would help with this task, which would allow the doctor to focus on the patient end of things rather than trying to wade through research results, drug literature, etc. But for such a tool to be useful, research results must be presented as discussed, not in terms of whatever statistical test the researcher pulled out of the air. This applies even in the absence of this software tool: I suspect it would be much easier for doctors to evaluate research evidence if it were presented uniformly as probabilities (or something related like odds ratios).

Here's an interesting related article:

http://www.economist.com/science/displaystory.cfm?story_id=12376658

Fascinating article. One question, though - in the formula:

p(A + B|C) = p(A|C) + p(B|C) - p(AB|C)

I'm not clear on why you're subtracting out the last term - does this mean that p(A + B|C) means A xor B (exclusive or)?

@John,

P(A+B|C) is interpreted as OR, not XOR. The subtracted bit falls out of the algebra, using the product rule. See link below for a good derivation:

http://users.ics.tkk.fi/harri/thesis/bayesformulas.html
