OK, now Richard has got me to emerge yet again from my hermitage.
You set out to polish a turd and you find a diamond. What are the chances?
To see what I am talking about you simply must read this
Denise Minger is a blogger and obviously talented freelance writer from Portland, Oregon, who writes a raw food blog. Like Lierre Keith, she is an ex-vegan.
Having always been on the carnivorous end of the spectrum, I couldn't really get through Colin Campbell's "China Study" when I read it. Sort of like the way Richard Dawkins would have trouble paying attention during a lecture on creationism: the thesis is so stupid and implausible on the face of it, he would not get angry, he would just nod off a lot.
Reading The China Study, which asserts that "animal protein" is deadly and causes cancer and heart disease, I just could not muster that much anger. It seemed so totally implausible and full of non sequiturs that I could not take it seriously, any more than one of those "eggs are deadly, eat whole grains" articles on MSN.
Quote overheard from the latest jackass on NPR plugging their health book: "...surprisingly, when we looked at increasing egg consumption, it was not linked to diabetes!"
Bonus points if you can guess which university spawned this genius...
I actually entertained the idea of finding a copy of the monograph that has all the data the China Study book is based on, so I could see for myself how Campbell cherry-picked the stuff he based his assertions on. But then I must have realized once again that time is finite, because I did something else instead. Long-time readers know I am old enough to have a bias against wasting my time.
Fortunately, in addition to being better looking than me, Ms. Minger is younger and less cynical.
Her bias is in a plant-friendly direction, and she took it as a serious matter to see if the propaganda that vegans wave around ad nauseam in support of plants was actually true. So she set out to polish a turd. She expended enormous effort poring through the raw data that Colin Campbell allegedly based his book on to see if it was really true. Do read her whole article, but here are the salient points:
1) There is actually more evidence to link plant protein to cancer than animal protein, the exact opposite of what Campbell claims
when we actually track down the direct correlation between animal protein and cancer, there is no statistically significant positive trend. None. Looking directly at animal protein intake, we have the following correlations with cancers:
Penis cancer: -16
Rectal cancer: -12
Bladder cancer: -9
Colorectal cancer: -8
Cervix cancer: -4
Colon cancer: -3
Liver cancer: -3
Oesophageal cancer: +2
Brain cancer: +5
Breast cancer: +12
But what about plant protein? Since plant protein correlates negatively with plasma cholesterol, does that mean plant protein correlates with lower cancer risk? Let’s take a look at the cancer correlations with “plant protein intake”:
Nasopharyngeal cancer: -40**
Brain cancer: -15
Liver cancer: -14
Penis cancer: -4
Bladder cancer: -3
Breast cancer: +1
Stomach cancer: +10
Rectal cancer: +12
Cervix cancer: +12
Colon cancer: +13
Oesophageal cancer: +18
Colorectal cancer: +19
2) There is no clear evidence for animal protein per se being related to cardiovascular disease
3) The alleged green vegetables benefit is largely confounded by latitude
4) Schistosomiasis confounds the colon cancer data, and Hep B confounds the liver cancer/cholesterol connection
There are many more and her explication of the data is quite sophisticated. Once again, read the whole thing.
But my favorite part is this:
Perhaps more troubling than the distorted facts in “The China Study” are the details Campbell leaves out.
Why does Campbell indict animal foods in cardiovascular disease (correlation of +1 for animal protein and -11 for fish protein), yet fail to mention that wheat flour has a correlation of +67 with heart attacks and coronary heart disease, and plant protein correlates at +25 with these conditions?
Speaking of wheat, why doesn’t Campbell also note the astronomical correlations wheat flour has with various diseases: +46 with cervix cancer, +54 with hypertensive heart disease, +47 with stroke, +41 with diseases of the blood and blood-forming organs, and the aforementioned +67 with myocardial infarction and coronary heart disease? (None of these correlations appear to be tangled with any risk-heightening variables, either.)
If you didn't spot the diamond, read it again. Can you believe it? Campbell makes a lame case indicting "animal protein" that is easily debunked by examining his own data sources. Simultaneously, he has nothing to say about the strongest food association found in the data.
A positive food association.
Positive meaning that food is linked to much higher rates of disease.
An association between wheat and coronary artery disease of +67! (These figures are correlation coefficients multiplied by 100, so that is a correlation of 0.67, not a relative risk.) These numbers are some of the highest in the data set, apparently.
So Ms. Minger has approached the ugly task of fact-checking that obvious self-aggrandizing fraud Campbell, and with pinched nostrils has found us this lovely jewel.
She has perhaps found us more evidence suggesting that one of our three neolithic agents is not good for us.
In epidemiology, these numbers are huge. It is a crime to ignore these numbers on wheat.
Which brings me to my final point, one that Ms Minger did not make.
This is all just epidemiology, and epidemiology is bogus. Now, I don't mean it has absolutely no value. It is good for hypothesis generation. It is almost worthless for finding the truth. It is especially worthless the way it is used by hacks like Campbell who are simply trying to sell people a book that tells them what they want to hear.
What epidemiology is good for is to be used the way Ms. Minger has used it in her investigation: to say, let's look at these data for associations, and from that generate ideas about what causes what. She seems to have done this in as close to a neutral fashion as is possible.
To do as Campbell did, or as almost everyone does when they approach epidemiology, and say, "I suspect animal protein is bad, let's see if I can prove that with epidemiology," is, quite simply, epistemologically fraudulent.
This does not get said often enough or called what it really is.
If you sift through thousands of variables and look for associations, you will find them. It is in fact a mathematical certainty that some of these associations will be "statistically significant" (SS). It is a further certainty that some of them will be statistically significant solely by accident. Some of your SS associations will be real, and some will be spurious. You will not always be able to tell which is which!
Let me give two real world examples.
The simple one is taken from what I recall from a medical statistics course (maybe they don't teach this any more?). If you have a panel of laboratory tests done, say a "Chem 12," and the normal ranges are defined so that 95% of healthy results fall inside them (forget about accuracy, precision, and Bayes' theorem for now), there is a 5% probability of any single lab falling outside the normal range. When you do 12 such independent tests at the same time, the probability that at least one result comes back abnormal is quite high, roughly 46%, even if you have nothing at all wrong with you, simply as a function of doing so many tests. It is kind of like flipping an asymmetrically weighted coin 12 times in a row versus just once.
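The arithmetic behind the lab-panel point is simple enough to check in a few lines (the 5% and 12-test figures are the illustrative numbers from above, not real assay data):

```python
# Probability that at least one of 12 independent lab tests comes back
# "abnormal" when each test's normal range covers 95% of healthy people.
p_abnormal = 0.05   # chance any single test flags a perfectly healthy person
n_tests = 12        # a "Chem 12" panel

p_all_normal = (1 - p_abnormal) ** n_tests   # 0.95 ** 12
p_at_least_one = 1 - p_all_normal

print(f"P(all 12 in range)       = {p_all_normal:.3f}")    # ~0.540
print(f"P(at least one abnormal) = {p_at_least_one:.3f}")  # ~0.460
```

So nearly half of healthy people will have at least one "abnormal" result on a 12-test panel, before anything is actually wrong with them.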
You can check out this book. In the late 90's the Motley Fool website had a whole series of formulaic systems for picking stocks. These systems were all based on logical formulae for evaluating stocks and how they might perform in the future, using earnings growth rates, dividend ratios, etc. They could perform amazingly for long periods of time and sometimes looked foolproof. They were arrived at by back-testing. Back-testing is simple: you make up a bunch of stock-selecting rules, go back to old stock market data from Value Line, and see how the model would have performed from a starting date in the past until now. Then, whichever of the 20 or so models you test performs the best with the lowest volatility, just use that one.
What could be simpler?
Actually, what could be stupider?
If you test enough investing models this way, some of them will work very well entirely by accident. Some models may work for real, but you cannot necessarily tell them from the spurious ones with statistical techniques.
Sidebar: The "Dogs of the Dow" model is the best known, and it turned into a dog itself shortly after being popularized. Sheard's models lasted as long as the tech bubble, and then you would have been wiped out. And of course, what good is beating the broad market when the market has gone less than nowhere in 10 years anyway?
The reason an investing strategy arrived at through back-testing is bogus is the same reason epidemiology is bogus. It may be appropriate to back-test for candidate strategies, but you would then need to prove the winner prospectively over many years, and even then you could not be certain that it was not a fluke about to wipe out your fortune.
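The back-testing trap can be sketched with a toy simulation (all numbers hypothetical): twenty "strategies" that are pure coin-flips with zero real edge, with the winner chosen on the back-test window alone.

```python
import random

random.seed(0)

# Twenty strategies with NO real edge: each day's return is random noise,
# mean 0%, standard deviation 1%. We back-test over 1000 days, crown a
# winner, then watch that same winner over the next 1000 days.
n_strategies, backtest_days, forward_days = 20, 1000, 1000

def simulate_returns(n_days):
    return [random.gauss(0, 0.01) for _ in range(n_days)]

strategies = [simulate_returns(backtest_days + forward_days)
              for _ in range(n_strategies)]

def total_return(rets):
    return sum(rets)  # simple additive return, for illustration only

# Pick the "best" strategy using only the back-test window
best = max(strategies, key=lambda s: total_return(s[:backtest_days]))

print(f"winner, back-test window: {total_return(best[:backtest_days]):+.2%}")
print(f"winner, forward window:   {total_return(best[backtest_days:]):+.2%}")
```

The winner of the back-test looks impressive, yet by construction it has no edge at all, so its forward performance is just more noise. Selecting the best of many random candidates manufactures apparent skill.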
Whenever you read papers that talk about linear regressions, think about these back-tested investing strategies. Multivariate linear regression is exactly the same thing. Some of the significant coefficients will be clinically meaningful, and some will be meaningless accidents. The numbers alone will not tell you which is which.
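To see how easily pure noise yields "significant" associations, here is a minimal sketch using only the standard library (the setup is hypothetical: a thousand random "dietary variables" correlated against one random "disease" outcome across 65 counties, roughly the number of counties in the China Study survey):

```python
import math
import random

random.seed(1)

n_counties, n_variables = 65, 1000
# One "disease" outcome that is pure noise
outcome = [random.gauss(0, 1) for _ in range(n_counties)]

def pearson_r(xs, ys):
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sxx = sum((x - mx) ** 2 for x in xs)
    syy = sum((y - my) ** 2 for y in ys)
    return sxy / math.sqrt(sxx * syy)

# Approximate critical |r| for p < 0.05, two-sided, at this sample size
r_crit = 1.96 / math.sqrt(n_counties)

hits = sum(
    1 for _ in range(n_variables)
    if abs(pearson_r([random.gauss(0, 1) for _ in range(n_counties)],
                     outcome)) > r_crit
)
# At the 5% level we expect roughly 50 spurious "significant" correlations
print(f"'significant' correlations from pure noise: {hits} of {n_variables}")
```

Around one in twenty of these meaningless variables clears the significance bar, and nothing in the numbers themselves distinguishes those accidents from a real effect.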
In the same way as these examples, when Colin Campbell sifts through mountains of data looking for SS associations, and then ignores the ones (like wheat) that don't fit what he is looking for, we should not be surprised when he finds SS associations that support his thesis.
The only way to prove convincingly that animal protein is harmful is to do it prospectively in a controlled trial, or to make a well-reasoned educated guess with laboratory science, evolutionary reasoning, and common sense. You cannot prove a damned thing with a data-mining exercise alone.