The Diet Doctor policy for grading scientific evidence
- Systematic reviews of multiple, high-quality randomized controlled trials:[3] Strong evidence.
- Randomized controlled trials (RCTs): Moderate evidence. If the results are very clear and consistent over multiple repeated trials: Strong evidence.
- Non-randomized controlled trials: Weak evidence. Note that while non-randomized trials are weak for determining the best intervention, they can provide other unique insights.[4]
- Observational/epidemiological data is seriously considered if the hazard ratio (HR) is over 2.[5]
  - With an HR below 2, evidence shows that such associations are often random or false,[6] and they may be more likely to confuse than to inform: Very weak evidence.
  - With an HR > 2 in high-quality prospective cohort studies: Weak evidence.
  - With an HR > 2 in lower-quality observational studies: Very weak evidence.
  - Under rare and exceptional circumstances, with an HR consistently > 5 in several high-quality observational studies, biological plausibility, no other obvious explanation, and general agreement with the classic Bradford Hill criteria:[7] Moderate evidence (e.g. smoking and lung cancer).
- Consistent clinical experience (e.g. case series) from several experienced practitioners is also considered, as long as there’s no high-quality science (i.e. randomized intervention studies) to contradict it: Weak evidence.
- Case reports and anecdotes are evaluated with caution, and not used as evidence unless there’s a lack of higher-quality evidence available: Very weak evidence.
- Animal studies are not considered in fields where human studies already exist. If they are the only evidence available: Very weak evidence.
- Opinions of world-leading experts: No evidence. Opinions are not evidence, no matter who has the opinion. To be evidence-based, everyone has to support his or her opinions and theories with believable facts, i.e. scientific evidence.
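The hazard-ratio rules for observational data above amount to a small decision procedure. The Python sketch below encodes them under stated assumptions: `Grade` and `grade_observational` are hypothetical names, the boolean flags stand in for human judgments (study quality, consistency across studies, the Bradford Hill criteria), the HR is assumed to express an increased risk (HR ≥ 1), and an HR of exactly 2 is treated as not exceeding the threshold, since the list only speaks of HRs “below 2” and “> 2”. It is an illustration, not Diet Doctor’s actual tooling.

```python
from enum import IntEnum


class Grade(IntEnum):
    """Hypothetical encoding of the policy's evidence levels, weakest first."""
    NO_EVIDENCE = 0
    VERY_WEAK = 1
    WEAK = 2
    MODERATE = 3
    STRONG = 4


def grade_observational(hr: float, high_quality: bool,
                        consistent_in_several_studies: bool = False,
                        meets_bradford_hill: bool = False) -> Grade:
    """Grade observational/epidemiological data by its hazard ratio (HR).

    hr: assumed to express an increased risk (hr >= 1).
    high_quality: hypothetical flag for a high-quality prospective cohort study.
    The other flags stand in for judgments about consistency across several
    studies, biological plausibility and the Bradford Hill criteria.
    """
    # Rare exception (e.g. smoking and lung cancer): HR consistently > 5 in
    # several high-quality studies that also meet the Bradford Hill criteria.
    if hr > 5 and high_quality and consistent_in_several_studies and meets_bradford_hill:
        return Grade.MODERATE
    # HR of 2 or less: too likely to be a random or false association.
    # (The policy says "below 2" and "> 2"; exactly 2 is treated as not above.)
    if hr <= 2:
        return Grade.VERY_WEAK
    # HR > 2: weak in high-quality prospective cohorts, very weak otherwise.
    return Grade.WEAK if high_quality else Grade.VERY_WEAK


print(grade_observational(1.4, high_quality=True).name)   # VERY_WEAK
print(grade_observational(2.6, high_quality=True).name)   # WEAK
print(grade_observational(2.6, high_quality=False).name)  # VERY_WEAK
```

Note that a large HR alone never reaches Moderate in this sketch: the exceptional case requires all of the listed conditions, mirroring the “rare circumstances” wording.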
Non-systematic review papers, or comprehensive summaries, usually compile anywhere from a few to dozens of individual studies at different levels of evidence. Unless such a review is done systematically to answer a specific question (see #1 above), it’s usually hard or inappropriate to assign it one specific level of evidence. These articles are instead marked like this: Overview article.
Other ungraded articles
Some articles found in nutrition and medical journals don’t fall into one of the above categories. These are some of the other types of articles you may see referenced in Diet Doctor guides.
Cell studies and mechanism studies; studies or reviews of biochemical, genetic and physiological mechanisms: These studies may be experimental, or they may discuss well-known mechanisms but apply them to a novel explanation of a disease or intervention. Whether these mechanisms can be applied to humans in the way discussed is often unknown, even if the mechanism itself is well-established. As a result, these articles are ungraded, like this: Mechanism study.
Technical articles: These articles typically describe a new biomedical technique, procedure, or intervention, or the modification of an existing one. These articles are descriptive rather than investigative in nature and are therefore ungraded, like this: Technical article.
As mentioned above under “case reports and anecdotes”, success stories are considered very weak evidence when it comes to determining whether a particular lifestyle change is beneficial. There are many reasons for this, perhaps primarily that these stories come from a selected population of successful people. We don’t know how many people may have attempted the same lifestyle change and been unhappy with the results, or had only moderate success.
However, these stories can add value when it comes to more deeply understanding the subjective experience and feelings of a selected group. This can potentially add another level of insight and inspiration that numbers and statistics alone can’t provide.[8]
Evidence from studies at high risk of financial bias, e.g. studies of a drug funded and conducted by the company selling the drug, is discounted by one level. Studies suggest that such financial bias may strongly influence the reported outcomes.[9]
Nutritional studies funded by companies with a financial interest in the outcome are likewise discounted.[10]
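The one-level discount can be sketched the same way. In this hypothetical Python sketch, grades are an ordered enum and a study flagged as being at high risk of financial bias drops one step. The floor at “No evidence” is an assumption, since the policy does not say what happens to evidence that is already very weak.

```python
from enum import IntEnum


class Grade(IntEnum):
    """Hypothetical encoding of the policy's evidence levels, weakest first."""
    NO_EVIDENCE = 0
    VERY_WEAK = 1
    WEAK = 2
    MODERATE = 3
    STRONG = 4


def discount_for_financial_bias(grade: Grade, high_risk_of_financial_bias: bool) -> Grade:
    """Lower a grade by one level if the study is at high risk of financial bias.

    Assumption: grades never drop below NO_EVIDENCE.
    """
    if not high_risk_of_financial_bias:
        return grade
    # grade - 1 is a plain int; clamp it and convert back to a Grade.
    return Grade(max(grade - 1, Grade.NO_EVIDENCE))


# A drug RCT funded and run by the drug's manufacturer: Moderate -> Weak.
print(discount_for_financial_bias(Grade.MODERATE, high_risk_of_financial_bias=True).name)  # WEAK
```

Using an ordered enum keeps “one level” well defined, so the same helper would apply unchanged to industry-funded nutritional studies.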
Consistency with what is evolutionarily probable can strengthen evidence, and inconsistency can weaken evidence.
For example, humans and their ancestors have been eating natural saturated fats for millions of years.[11] It’s evolutionarily unlikely that eating them in amounts roughly similar to before is the cause of a new epidemic of chronic disease.
On the other hand, eating refined sugar in large quantities is a phenomenon of the last 150 years. It’s evolutionarily possible that it could have negative health consequences, as humans may not yet be fully adapted to it.
While these examples do not prove cause and effect, they do add context to the scientific evidence and can therefore help elevate or decrease the level of evidence somewhat.
This line of reasoning might also be applied to long-term treatment with drugs that reduce the function of an enzyme or receptor in the body. If reducing that function were a good idea, all things considered, it could have been relatively easy for evolution to remove or reduce it (or never develop it in the first place). This suggests that the positive effect of such a drug might be outweighed by side effects under most circumstances for most people. Thus, if long-term treatment with such a drug is a good idea at all, it may be so primarily under specific circumstances and/or for people with specific diseases.[12]
From this perspective it can also be argued that interventions based on eating certain natural foods, or on avoiding food for a short time (e.g. for one day), are things the human body should, to a large degree, be evolutionarily adapted to. These interventions may thus be significantly safer than new drugs or surgical interventions.
The evolutionary lens as a way to strengthen or weaken evidence has to be used with a large dose of caution, as it’s not always apparent what environmental factors our ancestors were exposed to, and many of these factors have varied widely over time and geography.
Do you have comments on, objections to or suggested changes or additions to our guidelines? Feel free to email your suggestions to firstname.lastname@example.org.
For example, our Diet Doctor policy is similar to the Oxford Centre for Evidence-based Medicine – Levels of Evidence document. ↩
Consider these guidelines a “baseline” scenario, with studies that are well done and without obvious major bias. If a study has major flaws in its design or execution, especially in combination with major financial bias, it may need to be downgraded. ↩
The weakness is primarily related to the unfair start: without randomization, the groups being compared may differ from the outset, so it’s very hard to know which of the tested interventions was the best.
However, if an intervention has an effect size that is far larger than what is normally expected, that is still a good indication of what results can be expected for a self-selected, real world population.
A non-randomized intervention study may add complementary data on expected effect size that can’t be found in RCTs of non-self-selected participants, especially when the RCT uses intention-to-treat analysis. The RCT will likely give significantly lower numbers for the average effect size compared to what can be expected for a motivated patient who manages to follow a prescribed lifestyle intervention.
A non-randomized trial may give the best estimation of the average effect size for someone who is managing to follow a specified lifestyle treatment. ↩
An HR > 2 means that a (lifestyle) factor is associated with more than twice (double) the risk of something.
Furthermore, there is occasionally a “black swan” element to these stories. For example, if a condition is generally considered chronic and progressive, like type 2 diabetes, and hundreds of people share stories about reversal of the disease using nothing more than a lifestyle intervention, this raises a legitimate question: Is the common view true, occasionally not true, or simply false?
This question will then have to be answered using higher-quality scientific evidence such as controlled intervention studies. ↩
PLOS ONE 2016: Relationship between research outcomes and risk of bias, study sponsorship, and author financial conflicts of interest in reviews of the effects of artificially sweetened beverages on weight outcomes: A systematic review of reviews
There are also other sources of bias in nutritional studies that may or may not be disclosed and that can influence the results. One example of such bias may be strongly held dietary preferences.
To some degree this potential bias may have to be taken into account when interpreting the strength of evidence of a study, although it may be harder to quantify than financial bias.
We require our authors and reviewers of evidence-based guides to disclose any clear potential source of bias, not just financial bias. ↩
The counterargument to this is that our environment may have changed so much that what was evolutionarily beneficial thousands of years ago (or more) might occasionally be neutral or even harmful today. For example, a genetic adaptation that protected against infections but had other side effects might be negative today, in an era of antibiotics.
Sometimes our environment may have changed so much that long-term treatment with a drug that reduces a natural function could potentially be a good idea.
However, the safest course of action might often be to first try to more closely emulate the environment that was common before, at least when it comes to safe things like eating natural old-fashioned foods, exercising, getting adequate sunlight, sleeping enough, etc. For example, this might reduce the need for long-term treatment with drugs for components of the metabolic syndrome, like type 2 diabetes. ↩