The Diet Doctor policy for grading scientific evidence

As we base our guides on scientific evidence, it is important to have a clear policy for how to grade the strength of different kinds of evidence. Our policy is in many ways similar to other documents of its kind.1 We focus on how scientific evidence applies to human clinical outcomes, and we define the levels of evidence as follows.2

  1. Systematic reviews or meta-analyses of multiple, high-quality randomized controlled trials:3 Strong evidence.

    Systematic reviews or meta-analyses of observational studies with a hazard ratio (HR) under 2:4 Very weak evidence.

    Systematic reviews or meta-analyses of observational studies with HR > 2: Weak evidence.

    Systematic reviews or meta-analyses of observational studies with HR>5 and generally following the Bradford Hill Criteria: Moderate evidence.

  2. Randomized controlled trials (RCTs): Moderate evidence. If multiple trials show very clear and consistent results, the evidence may be upgraded to: Strong evidence.
    Non-randomized or uncontrolled trials: Weak evidence. Note that while non-randomized trials are weak for determining the best intervention, they can provide other unique insights.5
  3. Observational/epidemiological data with a hazard ratio (HR) over 2 is upgraded in strength.6 With a HR below 2, evidence shows that the correlation is often misleading or false, and it may be more likely to confuse than to inform:7 Very weak evidence.

    With a HR > 2 in high-quality prospective cohort studies, or on occasion with a very large population sample and consistent findings across studies even with HR < 2: Weak evidence.

    Or, if an observational study assesses the prevalence of a condition, but does not attempt to correlate observations with a health outcome, that is Weak evidence.8

    With a HR > 2 in lower-quality observational studies: Very weak evidence.

    Under exceptional and rare circumstances with HR consistently > 5 in several high-quality observational studies, with biological plausibility, no other obvious explanation and generally following the classic Bradford Hill criteria:9 Moderate evidence (e.g. smoking and lung cancer).

  4. Note: If a study uses an odds ratio (OR), we consider that equivalent to a hazard ratio or relative risk if the prevalence of the measured condition is low. However, if the prevalence is high and an adjusted odds ratio is reported, we will defer to the adjusted OR.

  5. Consistent clinical experience (e.g. case series) from several experienced practitioners is also considered, as long as there’s no high-quality science (i.e. randomized intervention studies) to contradict it: Weak evidence.10
  6. Case reports and anecdotes are evaluated with caution and not used as evidence unless there’s a lack of higher-quality evidence available. We will not grade individual anecdotes as evidence, but may grade a collection of anecdotes or published case reports as Very weak evidence.
  7. Animal studies are not considered in fields where studies already exist on humans. If it’s the only evidence available: Very weak evidence.
  8. Mechanistic studies and cell studies are considered extremely weak evidence, lower than even animal studies. These studies may be experimental or may discuss well-known mechanisms but apply them to a novel explanation of a disease or intervention. Whether these mechanisms can be applied to humans in the way discussed is often unknown, even if the mechanism itself is well-established. If clinical data on the subject exist, we will usually refrain from citing mechanistic studies. If it is the only evidence available: Extremely weak evidence.
  9. Opinions of world-leading experts: No evidence. Opinions are not evidence, no matter who has the opinion. To be evidence-based, everyone has to support his or her opinions and theories with believable facts, i.e. scientific evidence.
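
The baseline rules in the list above can be sketched as a small Python function. This is only an illustration under stated assumptions, not a tool used by Diet Doctor: every name here (`Grade`, `grade`, `grade_observational`, the study-type strings) is hypothetical, a ratio of exactly 2 is treated as not exceeding the threshold because the list does not say how to handle it, and the same thresholds are reused for systematic reviews of observational studies. The helper `odds_ratio_to_relative_risk` uses one standard conversion (Zhang and Yu, JAMA 1998) purely to show why an odds ratio is close to a relative risk only when the condition is rare; the policy itself does not prescribe any conversion.

```python
from enum import Enum


class Grade(Enum):
    STRONG = "Strong evidence"
    MODERATE = "Moderate evidence"
    WEAK = "Weak evidence"
    VERY_WEAK = "Very weak evidence"
    EXTREMELY_WEAK = "Extremely weak evidence"
    NONE = "No evidence"


# Study types whose baseline grade does not depend on a hazard ratio.
# The string keys are hypothetical labels chosen for this sketch.
FIXED_GRADES = {
    "meta_analysis_rct": Grade.STRONG,
    "rct": Grade.MODERATE,
    "non_randomized_trial": Grade.WEAK,
    "clinical_experience": Grade.WEAK,
    "case_report_collection": Grade.VERY_WEAK,
    "animal": Grade.VERY_WEAK,          # only if no human data exist
    "mechanistic": Grade.EXTREMELY_WEAK,
    "expert_opinion": Grade.NONE,
}


def odds_ratio_to_relative_risk(odds_ratio, baseline_risk):
    """Approximate the relative risk from an odds ratio.

    baseline_risk is the outcome risk in the unexposed group (0 to 1).
    Illustration only; not part of the grading policy.
    """
    return odds_ratio / (1 - baseline_risk + baseline_risk * odds_ratio)


def grade_observational(hr=None, high_quality=True, bradford_hill=False,
                        prevalence_only=False, large_and_consistent=False):
    """Grade observational data from its hazard ratio and study quality."""
    if prevalence_only:
        # Prevalence studies that do not correlate with an outcome.
        return Grade.WEAK
    if hr is None:
        raise ValueError("hr is required unless prevalence_only is True")
    if hr > 5 and high_quality and bradford_hill:
        return Grade.MODERATE
    if hr > 2 and high_quality:
        return Grade.WEAK
    if hr <= 2 and large_and_consistent:
        # Occasional upgrade for very large, consistent data sets.
        return Grade.WEAK
    return Grade.VERY_WEAK


def grade(study_type, consistent_across_trials=False, **observational):
    """Return the baseline grade for one piece of evidence."""
    if study_type == "rct" and consistent_across_trials:
        return Grade.STRONG
    if study_type in FIXED_GRADES:
        return FIXED_GRADES[study_type]
    if study_type in ("observational", "meta_analysis_observational"):
        return grade_observational(**observational)
    raise ValueError(f"unknown study type: {study_type}")


print(grade("rct").value)                     # Moderate evidence
print(grade("observational", hr=1.5).value)   # Very weak evidence
print(grade("observational", hr=6, bradford_hill=True).value)  # Moderate evidence
```

With an odds ratio of 2.5, the conversion gives a relative risk of about 2.46 when the baseline risk is 1%, but only about 1.56 when the baseline risk is 40%. This shows why an OR stands in for a relative risk only when the condition is rare. As footnote 2 says, these grades are a baseline and may be downgraded for major design flaws or financial bias, which the sketch does not model.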


 

Overview articles

Non-systematic review papers, or comprehensive summaries, usually compile anywhere from a few to dozens of individual studies of different levels of evidence. Unless the review is done systematically to answer a specific question (see #1 above), it’s usually hard or inappropriate to assign one specific level of evidence. These articles are instead marked like this: Overview article.

To assign a specific level of evidence we need to point to the specific studies supporting it.


 

Other ungraded articles

Some articles found in nutrition and medical journals don’t fall into one of the above categories. These are some of the other types of articles you may see referenced in Diet Doctor guides.

Review articles: These articles summarize a topic and may or may not include a balanced review of the literature. They may be heavily influenced by bias without much control over evidence quality. As a result, these articles are ungraded, like this: Review article.

Technical articles: These articles typically describe a new biomedical technique, procedure, or intervention, or the modification of an existing one. These articles are descriptive rather than investigative in nature and are therefore ungraded, like this: Technical article.


 

Success stories

As mentioned above under “case reports and anecdotes,” success stories are considered very weak evidence when it comes to determining if a particular lifestyle change is beneficial. There are several reasons for this, but perhaps the most important is that these stories come from a selected population of successful people. We don’t know how many people attempted the same lifestyle change and were unhappy with the results, or had only moderate success.

However, these stories can add value when it comes to more deeply understanding the subjective experience and feelings of a selected group. This can potentially add another level of insight and inspiration that numbers and statistics alone can’t provide.11
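
The selection problem described above can be made concrete with a few lines of Python. The numbers are invented purely for illustration and do not come from any study, and the rule that only people who lose at least 15 kg share a story is likewise an assumption of this sketch.

```python
from statistics import mean

# Hypothetical weight changes in kg for ten people who tried the same
# lifestyle change (negative = weight lost). Invented for illustration.
weight_change_kg = [-22.0, -18.0, -15.0, -6.0, -4.0, -2.0, 0.0, 1.0, 3.0, 3.0]

# Assumption: only people who lost at least 15 kg submit a success story.
STORY_THRESHOLD_KG = -15.0
stories = [c for c in weight_change_kg if c <= STORY_THRESHOLD_KG]

mean_all = mean(weight_change_kg)   # average over everyone who tried
mean_stories = mean(stories)        # average over published stories only

print(f"Everyone who tried: {mean_all:.1f} kg")
print(f"Success stories only: {mean_stories:.1f} kg")
```

In this made-up group, the three published stories average a loss of about 18 kg, while the whole group averages a loss of 6 kg. A reader who sees only the stories cannot recover the second number.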


 

Financial bias

When citing evidence from studies at high risk of financial bias (e.g. studies about a drug funded and conducted by the company selling the drug), we note any obvious conflicts of interest and include that along with the grade.12

When nutrition studies are funded by companies/industries with a financial interest in the outcome, we note that bias as well.13


 

Evolutionary considerations

Consistency with what is evolutionarily probable can strengthen evidence, and inconsistency can weaken evidence.

For example, humans and their ancestors have been eating natural saturated fats for millions of years.14 It’s evolutionarily unlikely that eating them in amounts roughly similar to before is a main cause of a new epidemic of chronic disease.

On the other hand, refined pure sugar in large quantities is a phenomenon of the last 150 years. It’s evolutionarily possible that it could have negative health consequences, as humans don’t appear to have adapted to it.

While these examples do not prove cause and effect, they do add context to the scientific evidence.

From this perspective it can also be argued that interventions based on eating certain natural foods — or avoiding foods for a short time (e.g. for one day) — are things that the human body should be evolutionarily adapted to, to a large degree. These interventions may thus be significantly safer, compared to new drugs or surgical interventions.

The evolutionary lens as a way to strengthen or weaken evidence has to be used with a large dose of caution, as it’s not always apparent what environmental factors our ancestors were exposed to, and many of these factors have varied widely over time and geography.


 

Core references

 
Advances in Nutrition 2018: Limiting dependence on nonrandomized studies and improving randomized trials in human nutrition research: why and how

JAMA 2018: The challenge of reforming nutritional epidemiologic research

PLoS Medicine 2005: Why most published research findings are false

The Lancet 2019: Real-world studies no substitute for RCTs in establishing efficacy


 

Comments

Do you have comments on, objections to or suggested changes or additions to our guidelines? Feel free to email your suggestions to andreas@dietdoctor.com.

  1. For example, our Diet Doctor policy is similar to the Oxford Centre for Evidence-based Medicine – Levels of Evidence document.

  2. Consider these guidelines a “baseline” scenario, assuming studies are well done and without obvious major bias. If a study has major flaws in its design or execution, especially in combination with major financial bias, it may need to be downgraded.

  3. Wikipedia: Systematic review

    Wikipedia: Meta-analysis

    Wikipedia: Randomized controlled trial

  4. We believe that combining very weak nutritional epidemiology studies does not increase the strength of evidence when the hazard ratio is very low, below 2. The original data are typically of such low quality and questionable significance that more of it does not improve the strength of the finding.

  5. The weakness is primarily related to the unfair start: without randomization, the groups being compared may differ from the outset, so it’s very hard to know which of the tested interventions was the best.

    However, if an intervention has an effect size that is far larger than what is normally expected, that is still a good indication of what results can be expected for a self-selected, real world population.

    A non-randomized intervention study may add complementary data on expected effect size that can’t be found in RCTs of non-self-selected participants, especially when the RCT is using intention-to-treat analysis. The RCT will likely give significantly lower numbers for the average effect size, compared to what might be expected for a motivated patient who manages to follow a prescribed lifestyle intervention.

  6. A HR > 2 means that a (lifestyle) factor is associated with more than twice (double) the risk of something.

    Wikipedia: Observational study

    Wikipedia: Hazard ratio

    Guide to observational vs. experimental studies

  7. While a cutoff value of 2 is not universally agreed upon, it is included as part of the GRADE criteria for analyzing study quality.

    In addition, the concept of needing a large enough difference to denote a meaningful finding is also supported by the following references:

    Advances in Nutrition 2018: Limiting dependence on nonrandomized studies and improving randomized trials in human nutrition research: why and how

    JAMA 2018: The challenge of reforming nutritional epidemiologic research

    PLoS Medicine 2005: Why most published research findings are false

    The Lancet 2019: Real-world studies no substitute for RCTs in establishing efficacy

    Frontiers in Nutrition 2018: The failure to measure dietary intake engendered a fictional discourse on diet-disease relations

  8. Observational studies that don’t attempt to correlate with outcomes do not have the same risk of introduced bias that other observational studies have, and therefore are upgraded from very weak to weak quality evidence.

  9. Wikipedia: Bradford Hill criteria

    Proceedings of the Royal Society of Medicine 1965: The environment and disease: association or causation? By Sir Austin Bradford Hill

  10. We aim to confirm or question that something is consistent clinical experience, using our low-carb expert panel.

  11. Furthermore, there is occasionally a “black swan” element to these stories. For example, if a condition is generally considered chronic and progressive, like type 2 diabetes, and hundreds of people share stories about putting the disease into remission using nothing more than a lifestyle intervention, this raises a legitimate question: is the common view true, occasionally not true, or simply false?

    This question will then have to be answered using higher-quality scientific evidence such as controlled intervention studies.

  12. Evidence suggests that this financial bias may strongly influence the reported outcome.

    BMJ 2017: Financial ties of principal investigators and randomized controlled trial outcomes: cross sectional study

    PLOS ONE 2016: Relationship between research outcomes and risk of bias, study sponsorship, and author financial conflicts of interest in reviews of the effects of artificially sweetened beverages on weight outcomes: A systematic review of reviews

    BMJ 2003: Pharmaceutical industry sponsorship and research outcome and quality: systematic review

  13. There are also other sources of bias in nutritional studies that may or may not be disclosed. An example of such bias would be the authors’ strongly held dietary preferences:

    JAMA 2018: Disclosures in nutrition research. Why it is different

    To some degree, this potential bias should be taken into account when interpreting the strength of evidence of a study, although it may be harder to quantify than financial bias.

    We require our authors and reviewers of evidence-based guides to disclose any clear potential source of bias, not just financial bias. It should be noted that most – though not all – of our authors and reviewers have the strongly held belief that low carb is an effective, and often the most effective, dietary strategy.

  14. Nature Education Knowledge: Evidence for meat-eating by early humans