Guide to observational vs. experimental studies
Although findings from the latest nutrition studies often make news headlines and are shared widely on social media, many aren’t based on strong scientific evidence.
You’ve no doubt noticed that there are conflicting reports about whether a food is good or bad for you. One day headlines will say drinking coffee is overwhelmingly beneficial, but the following day new headlines shout that coffee increases risk of heart attacks.
Indeed, most foods and diets have been both positively and negatively associated with health in various studies. According to a 2012 article, eggs, carrots, cheese and several other foods have been shown to both increase and decrease risk of cancer, depending on the study!1
To say that this can be confusing and frustrating is an understatement. Many of us do our best to make food choices that will improve our health and quality of life. How can we know if the latest research being reported is reliable?
Generally speaking, the media fail to evaluate the evidence; instead, studies with “exciting” conclusions are turned into click-worthy headlines, no matter how weak the evidence is.
Most of the studies used to connect foods to chronic diseases are observational studies, not experimental ones. While these are far cheaper and easier to do, they can very rarely prove anything at all.2
In this guide, we discuss the differences between observational and experimental studies, the advantages and disadvantages of each, and why in nearly all cases observational research shouldn’t be used when making decisions about your diet. After reading this guide, you may be able to identify media reports about nutritional science that you can safely ignore, i.e. most of them.
In our evidence-based guides at Diet Doctor, we make it simple by using a color code to show how strong the evidence from a study is: strong, moderate, weak or very weak.3 After reading this guide, you’ll understand much more about what that means.
What is an observational study?
In an observational study (also known as an epidemiological study), researchers observe a group of people to see what happens to them over time. Although study participants may answer questions and fill out questionnaires, researchers don’t conduct any experiments and have no control over the participants.
An observational study is basically an exercise in statistics. Researchers try to find correlations between certain behaviors and certain outcomes. For example, do people who eat more vegetables have a larger or smaller risk of developing a certain disease?
Although the statistics from observational studies can show associations between certain behaviors and the development of a disease or condition, these associations may or may not be cause-effect relationships.4 In most cases, an observational study alone is not enough to tell which it is. An observational study can often provide only very weak evidence.5 A different kind of study, usually an experimental one, is needed to prove that something causes something else, for example that drinking coffee can make people lose weight.
There are good reasons for the famous quote stating that “there are three kinds of lies: lies, damned lies, and statistics.”
Observational studies – i.e. statistics – can in most cases just provide weak clues, and they can rarely prove anything. Of course, the study is not a lie – it’s just data. But the way these studies are often used in the media, to “prove” a thing one day and the exact opposite the next day, can bring that quote to mind.
What is an experimental study?
In a nutrition-related experimental study (also known as a clinical trial or interventional study), researchers provide participants with a diet, nutrition education, or other kind of intervention and evaluate its effects.
Experimental evidence is considered stronger than observational evidence. Randomized, controlled trials (RCTs) are often referred to as the “gold standard” for evidence. They are designed to test an intervention against a different intervention (e.g. low carb vs. low fat), or against a control group that does not change its behaviors (e.g. low carb vs. standard American diet), under tightly monitored conditions.
Assigning participants randomly to either the experimental or the control group helps to ensure that both groups are similar in ways that are not being tested (such as income, education, level of exercise, etc.). In the best case, this makes these studies a fair comparison, and makes the evidence they provide far stronger: often moderately strong evidence.
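To see why random assignment tends to produce comparable groups, here is a minimal Python sketch. The participants and their weekly exercise hours are simulated, and the function name `randomize` and the seeds are illustrative assumptions, not taken from any study discussed in this guide.

```python
import random
from statistics import mean

def randomize(participants, seed=0):
    """Shuffle participants and split them into two equally sized groups."""
    rng = random.Random(seed)      # fixed seed so the sketch is repeatable
    shuffled = list(participants)  # copy so the caller's list is untouched
    rng.shuffle(shuffled)
    half = len(shuffled) // 2
    return shuffled[:half], shuffled[half:]

# Simulated (made-up) weekly exercise hours for 10,000 people.
data_rng = random.Random(42)
exercise_hours = [data_rng.uniform(0, 10) for _ in range(10_000)]

intervention, control = randomize(exercise_hours)

# Exercise was never used to assign the groups, yet the averages end up close.
difference = abs(mean(intervention) - mean(control))
print(f"Difference in mean exercise hours: {difference:.3f}")
```

With large groups, the same balancing tends to happen for factors nobody measured, which is something an observational design cannot offer.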
The best RCTs use the actual development of the disease being studied or death of the participant as the outcome being measured. Because medical conditions may take many years to develop, decades-long RCTs are very expensive, making them impractical in most cases. Therefore, many RCTs are much shorter, and instead of measuring health outcomes, they measure changes in health markers that reflect disease risk, such as changes in blood sugar, insulin, or inflammation levels.
Unfortunately, this approach assumes that changes in such a surrogate marker reflect a real positive or negative impact on one’s health. As we have seen in many studies, this may not always be the case.
The ultimate solution in the future – for very important questions regarding human diet and health – may be funding far fewer but much larger experimental studies: studies that are large and long enough to measure real health outcomes. For example, such studies could test a strict low-carb diet vs. a conventional diet in hundreds or thousands of people with type 2 diabetes for 5-10 years, and measure outcomes like disease and death. A high-quality study like this could easily cost hundreds of millions of dollars, yet the results could be worth orders of magnitude more for humanity.6
Systematic reviews and meta-analyses
A single study on its own is often not enough to provide clear answers about the relationship between food and health. Systematic reviews and meta-analyses are both ways of putting together multiple studies in an attempt to clarify what the evidence says.
A systematic review is a detailed, standardized process of gathering, assessing and synthesizing a collection of relevant studies on a particular topic.
A meta-analysis is a statistical procedure for combining data from the studies used in a systematic review.
Systematic reviews and meta-analyses may consist of observational research, experimental research, or a combination of both. They have historically been considered the strongest type of evidence; however, this is not always the case.
Systematic reviews and meta-analyses are sometimes seen as ways to “strengthen” the weak findings of observational studies. The thinking is that if a number of observational studies show the same effect, this must indicate a cause-effect relationship even if the effect is very small in all cases. But systematic reviews and meta-analyses made up of observational studies cannot override the fundamental principle that association is not causation. If you took a placebo pill that had no effect on a condition you wanted to treat, it wouldn’t work better if you took more of them! In the same way, weak observational studies do not become rigorous just because many of them are combined.
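The point about combining weak studies can be sketched numerically. The example below uses standard inverse-variance (fixed-effect) pooling of log relative risks, one common way to combine studies, and makes a deliberately simple assumption: the true relative risk is 1.0, but all 20 hypothetical studies share the same confounding and each reports 1.2. All numbers and the function name `pool_fixed_effect` are invented for illustration.

```python
import math

def pool_fixed_effect(log_rrs, std_errors):
    """Inverse-variance (fixed-effect) pooling of log relative risks."""
    weights = [1 / se ** 2 for se in std_errors]
    pooled = sum(w * y for w, y in zip(weights, log_rrs)) / sum(weights)
    pooled_se = math.sqrt(1 / sum(weights))
    return pooled, pooled_se

# Assumption for illustration: no real effect exists (true RR = 1.0),
# but every study carries the same bias and reports RR = 1.2.
single_log_rr, single_se = math.log(1.2), 0.10
pooled_log_rr, pooled_se = pool_fixed_effect([single_log_rr] * 20,
                                             [single_se] * 20)

pooled_rr = math.exp(pooled_log_rr)
low = math.exp(pooled_log_rr - 1.96 * pooled_se)
high = math.exp(pooled_log_rr + 1.96 * pooled_se)
print(f"Pooled RR {pooled_rr:.2f}, 95% interval {low:.2f} to {high:.2f}")
```

Under these assumptions, pooling shrinks the uncertainty interval but leaves the estimate at the biased value of 1.2. The combined result looks more precise without being any closer to the truth.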
Systematic reviews and meta-analyses can be flawed in other ways as well, including cherry-picking studies for inclusion that support a researcher’s own position and excluding those that do not.7
Only RCTs (experimental studies) can come close to establishing that a certain food or way of eating causes a particular outcome. Systematic reviews and meta-analyses based on experimental studies have a much greater chance of providing good evidence on which to base decisions about your own health. We grade these as strong evidence.
Why many observational studies don’t provide convincing evidence
Observational studies can only give us information about how certain behaviors and diseases are associated or correlated. An association must be very strong in order to indicate a potential cause-effect relationship, and even strong associations do not necessarily show this. For example, skirt-wearing is strongly associated with the likelihood of developing breast cancer (since skirts are mostly worn by women!), but it would be silly to suggest that wearing a skirt causes breast cancer.
Typically, the strength of associations in observational studies about nutrition and chronic diseases is small, as reflected by the low relative risks that are found. A relative risk of 1.0 means there is no association. In most observational studies about nutrition, the relative risk is close to 1.0, with a range of 0.8 to 1.5, indicating a weak association.8 Weak associations are likely to be due to other factors such as random chance or confounding variables, and not likely to be a cause-effect relationship.
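As a rough illustration of how a relative risk is calculated, the sketch below divides the risk in one group by the risk in another. The counts are hypothetical and chosen only to produce round numbers; they do not come from any study cited here.

```python
def relative_risk(exposed_cases, exposed_total, unexposed_cases, unexposed_total):
    """Risk in the exposed group divided by risk in the unexposed group."""
    risk_exposed = exposed_cases / exposed_total
    risk_unexposed = unexposed_cases / unexposed_total
    return risk_exposed / risk_unexposed

# Hypothetical: 60 of 1,000 people who eat a food develop a disease,
# compared with 50 of 1,000 people who do not eat it.
rr_weak = relative_risk(60, 1000, 50, 1000)  # 6% vs. 5% risk

# Hypothetical: the same risk in both groups means no association.
rr_none = relative_risk(50, 1000, 50, 1000)

print(f"Weak association: RR = {rr_weak:.2f}")
print(f"No association:   RR = {rr_none:.2f}")
```

Here a relative risk of about 1.2 corresponds to a difference of one percentage point in absolute terms, which is small enough that chance or an unmeasured factor could plausibly account for it.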
The reasons for such weak associations are often built into the design of observational studies. Because scientists are only observing a selected population, they cannot take into account all the possible factors that might affect how diet appears to be related to a disease.
For example, people who are concerned about their health are likely to choose foods they think will help prevent disease. But they are also more likely to do many other things they think will promote and protect their health, such as exercising regularly, avoiding smoking, and taking a multivitamin. It is hard to know which of these factors are responsible for outcomes found in an observational study.
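The following sketch shows how this kind of hidden factor can create an association out of nothing. In the made-up data, vegetable intake has no effect on disease within either group, but health-conscious people both eat more vegetables and get sick less often. The group names and counts are assumptions invented for illustration.

```python
def relative_risk(exposed_cases, exposed_total, unexposed_cases, unexposed_total):
    """Risk in the exposed group divided by risk in the unexposed group."""
    return (exposed_cases / exposed_total) / (unexposed_cases / unexposed_total)

# Hypothetical (cases, people) for high and low vegetable intake.
strata = {
    "health-conscious":      {"high_veg": (40, 800), "low_veg": (10, 200)},
    "less health-conscious": {"high_veg": (20, 200), "low_veg": (80, 800)},
}

# Within each group, vegetable intake makes no difference to disease risk.
stratum_rrs = {
    name: relative_risk(*groups["high_veg"], *groups["low_veg"])
    for name, groups in strata.items()
}

# Lumping everyone together, as a naive analysis would, hides the grouping.
high_cases = sum(g["high_veg"][0] for g in strata.values())
high_total = sum(g["high_veg"][1] for g in strata.values())
low_cases = sum(g["low_veg"][0] for g in strata.values())
low_total = sum(g["low_veg"][1] for g in strata.values())
crude_rr = relative_risk(high_cases, high_total, low_cases, low_total)

print("Within-group relative risks:", stratum_rrs)
print(f"Crude relative risk: {crude_rr:.2f}")
```

In this invented example the crude relative risk is about 0.67, suggesting that vegetables are protective, even though the within-group relative risks are both 1.0. Researchers can adjust for factors they have measured, but not for the ones they have not.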
Professor John Ioannidis is a highly regarded expert in meta-research, the study of research practices and how to improve them. In September 2018, he wrote an opinion piece for the Journal of the American Medical Association stating that nutrition observational studies are hopelessly flawed and in need of “radical reform.”9 In the article, he points out that hidden factors that may bias the outcomes of an observational study are not accounted for (for instance, people who eat a lot of meat may also drink a lot of beer and get little exercise) and that findings are routinely influenced by researcher bias.
He also points out the absurdity of claiming that certain foods will increase lifespan for a specific length of time. As an example, various studies show that consuming hazelnuts, coffee, oranges, and other foods and beverages on a daily basis may each help extend life by several years.
“If you were to gain all the benefit speculated by each one of these studies, we would be able to live for 5,000 years,” says Ioannidis.
When can the results of observational research be trusted?
Findings from observational studies usually need to be confirmed by higher-quality research, such as an experimental study, to be considered reliable. Unfortunately, this often does not happen. Instead, experimental trials in nutrition are likely to contradict the findings of earlier observational studies.10
In other words, findings from observational studies can usually not be trusted on their own.
However, there are occasional exceptions. Under exceptional and rare circumstances with very, very strong and plausible correlations (e.g. smoking and lung cancer), observational data can provide moderate evidence.11
Pros and cons of observational studies
Pros:
- Are much less expensive than clinical trials
- Can last for several years or even decades
- Can include tens of thousands of study participants
- Can look at development of a disease or death as an outcome
Cons:
- Rely on self-reported data that often can’t be confirmed
- Have no control group for comparison
- Can’t take into account all of the factors that can influence the results
- Show associations only, and can’t establish cause-effect relationships
Pros and cons of experimental studies
Pros:
- Are tightly controlled and monitored
- Compare outcomes between those who receive an intervention and those who don’t
- Can use randomization to deal with unknown factors that might influence outcomes
Cons:
- Are expensive and time consuming
- Often use health markers rather than development of disease or death as endpoints
- Are typically smaller than most observational studies
- Are usually shorter than most observational studies
Take home points
Observational research usually produces unreliable results, and these results are often given more attention in the media than they deserve.
Before changing your diet based on the most recent news story, find out a few things about the study being discussed. Is the study observational or experimental? Are the findings consistent with previous research, especially with higher-quality studies like experimental ones? If the study is observational, how strong were the associations between the outcome and the behavior, food, or diet being studied?
Most importantly, remember that observational studies usually can’t show that a specific food, diet or lifestyle caused a particular outcome. This normally requires an experimental study.
The bottom line is that most observational studies, and all the media headlines generated by them, can safely be ignored.
Adele Hite, RD
Adele came to rhetoric and communication from a Ph.D. program in nutritional epidemiology and a background in nutrition, dietetics, and public health. She’s animated by questions and concerns, many of which boil down to this: Why is nutrition [science, policy, discourse] the way it is?
For the full details about our evidence-grading policy, see this page:
A confounding variable is a factor related to both the behavior being studied and the health outcome that is not taken into consideration in the study. Confounding variables can introduce bias and indicate a relationship between a food or diet and a health outcome when there isn’t one.
Though there are exceptions:
There is some discussion about what is considered a “weak” versus a “strong” association and how strong an association must be to potentially indicate a cause-effect relationship.
A helpful comparison is that relative risks found in associations between smoking and lung cancer were around 10.0 for moderate smokers and 20.0 for heavy smokers. This level of relative risk was strong enough for experts to argue for a cause-effect relationship.
In the past few decades, there have been many instances where the results of observational nutrition studies have been contradicted in RCTs.
For us to use this evidence grade, the hazard ratio (HR) needs to be consistently > 5 in several high-quality observational studies, with biological plausibility, no other obvious explanation, and general agreement with the classic Bradford Hill criteria.