Infographic on supplements, placebo comparison, and statistical significance, showing why small differences may not mean real benefits.

Manipulating the Conclusion in Supplement Research: “Does It Work, or Not”? A General Public Guide.

Truth Full Health

In an earlier post, we covered the basics of how to read scientific studies to understand whether a supplement—or the ingredients inside it—actually work.

Today, we take a closer look at one of the most commonly misunderstood parts of research: statistical significance and p-values.

These concepts help us understand whether a reported finding is likely to be real or simply due to chance.

Understanding this will help us cut through the noise of many p-values and focus on what truly matters when deciding whether a supplement has meaningful evidence behind it.

 

What Are “Statistical Significance” and “P-values”?

Statistical significance is a technical way of saying that when we compare two or more groups, the results likely reflect a real difference rather than random chance.

To determine this, researchers conduct statistical tests, which produce p-values.

A p-value is a number that we compare to a cutoff to figure out whether the findings might simply be due to chance (Table 1).

 

Table 1. Interpreting P-values in Scientific Research

| Scenario | Interpretation | How Studies May Report |
|---|---|---|
| p-value above the cutoff | No statistically significant difference between the groups was found. | “Not statistically significant”; “Not significantly different”; “No significant difference”; “No benefit / harm” |
| p-value at or below the cutoff | A statistically significant difference between the groups was found. | “Statistically significant”; “Significantly different”; “Significant differences”; “Benefit / harm observed” |

 

In many fields of health and nutrition research, a p-value less than 0.05 is widely used as the cutoff for statistical significance.[1]

This roughly means:

  • If the supplement truly made no difference, benefits (or harm) at least as large as those found would turn up by random chance less than five percent (5%) of the time, and
  • The smaller the p-value, the harder it is to explain the findings by random chance alone
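To make "due to random chance" concrete, here is a minimal Python sketch. It uses a made-up scenario, not a study discussed in this post: 20 people take a supplement that does nothing, so each person is assumed to be equally likely to feel better or worse. The function name `chance_of_at_least` and all the numbers are illustrative assumptions. The sketch asks how often 15 or more of the 20 would feel better by luck alone.

```python
from math import comb

def chance_of_at_least(successes, n, prob=0.5):
    """Probability of `successes` or more out of `n` when each person
    independently has chance `prob` of feeling better (hypothetical setup)."""
    return sum(
        comb(n, k) * prob**k * (1 - prob) ** (n - k)
        for k in range(successes, n + 1)
    )

# If the supplement does nothing, how often do 15+ of 20 people feel better?
p_15 = chance_of_at_least(15, 20)
print(f"15 or more of 20: {p_15:.4f}")  # 0.0207, below the 0.05 cutoff

# A weaker result: 12 or more of 20 feeling better.
p_12 = chance_of_at_least(12, 20)
print(f"12 or more of 20: {p_12:.4f}")  # 0.2517, above the 0.05 cutoff
```

In this toy setup, 15 out of 20 would clear the 0.05 cutoff, while 12 out of 20 would not, even though 12 is still more than half. Real studies use more elaborate tests, but the question being asked is the same.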

We might see this cutoff reported as:

  • p < 0.05
  • p < .05
  • p < 5%
  • An exact p-value (such as p=0.031)
  • Sometimes studies report smaller p-values, such as p < 0.01 or p < 0.001. These still mean the p-value is below the usual cutoff of 0.05.
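Because the same cutoff can be written in several ways, it can help to convert each form to a plain number before comparing. The sketch below is illustrative only: `parse_p` and `meets_cutoff` are hypothetical helper names, the pattern handles just the notations listed above, and it treats "at or below the cutoff" as statistically significant, following Table 1.

```python
import re

# Hypothetical pattern for strings such as "p < 0.05", "p < .05",
# "p < 5%" or "p=0.031". It is not taken from any library.
P_PATTERN = re.compile(r"\bp\s*(<=|<|>|=)\s*(\d*\.?\d+)\s*(%?)", re.IGNORECASE)

def parse_p(text):
    """Return (operator, value), with the value expressed as a proportion."""
    match = P_PATTERN.search(text)
    if match is None:
        raise ValueError(f"no p-value found in {text!r}")
    operator, number, percent = match.groups()
    value = float(number)
    if percent:
        value /= 100  # "5%" is the same cutoff as 0.05
    return operator, value

def meets_cutoff(text, cutoff=0.05):
    """True if the reported p-value is at or below the cutoff, False if it
    is above, None if the report is too vague to tell (e.g. "p < 0.10")."""
    operator, value = parse_p(text)
    if operator == ">":
        # "p > 0.05" is above the cutoff; "p > 0.01" could be either side.
        return False if value >= cutoff else None
    if value <= cutoff:
        return True
    # An exact value above the cutoff is not significant;
    # an upper bound above the cutoff (e.g. "p < 0.10") is undetermined.
    return False if operator == "=" else None

for report in ["p < 0.05", "p < .05", "p < 5%", "p=0.031", "p < 0.001", "p = 0.26"]:
    print(report, "->", meets_cutoff(report))
```

The first five reports all come out as meeting the cutoff, and `p = 0.26` does not.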

 

Which Comparisons Actually Matter?

Understanding p-values is only half of the story.

The more important step for consumers is knowing which comparisons are meaningful for deciding whether a supplement works.

In supplement studies, researchers often report many p-values, because each one comes from a different comparison. These may involve:

  • Comparing results between different groups (for example, supplement vs. placebo or one supplement vs. another)
  • Comparing changes within the same group (for example, before vs. after taking a supplement)

In gold-standard human studies like randomized controlled trials (RCTs), what matters most is the comparison between groups.

This is because comparing between groups helps account for other things that can happen while people are taking the supplement or placebo—like changes in diet, lifestyle, placebo effects, and more.

A common form of data manipulation (intentional or not) is when studies report:

  • The supplement group showed a statistically significant improvement on its own.
  • But the improvement was no greater than in the comparison group.

These results can seem confusing, which makes them easy for people to misunderstand or misinterpret.

When this happens, the correct conclusion is that the supplement did not do any better than the comparison group.
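The pattern described above can be reproduced with a small Python sketch. The weight changes below are made-up numbers, not data from any study, and the two test functions are simple exact permutation tests chosen for illustration rather than the methods a particular paper used. The names `within_group_p` and `between_group_p` are hypothetical.

```python
from itertools import combinations, product

# Made-up weight changes in kg (negative = weight lost). NOT real study data.
supplement = [-4, -3, -5, -2, -4, -3]
comparison = [-3, -2, -4, -3, -2, -4]

def within_group_p(changes):
    """Exact sign-flip test for one group: if nothing really changed, each
    person's change would be equally likely to be a gain or a loss."""
    observed = abs(sum(changes))
    patterns = list(product([1, -1], repeat=len(changes)))
    extreme = sum(
        abs(sum(sign * c for sign, c in zip(signs, changes))) >= observed
        for signs in patterns
    )
    return extreme / len(patterns)

def between_group_p(a, b):
    """Exact permutation test for two groups: if the group label did not
    matter, every split of the pooled values would be equally likely."""
    pooled = a + b
    total = sum(pooled)
    observed = abs(sum(a) / len(a) - sum(b) / len(b))
    splits = list(combinations(range(len(pooled)), len(a)))
    extreme = 0
    for idx in splits:
        sum_a = sum(pooled[i] for i in idx)
        diff = abs(sum_a / len(a) - (total - sum_a) / len(b))
        extreme += diff >= observed - 1e-12  # tolerance for float rounding
    return extreme / len(splits)

print("supplement, before vs. after:", within_group_p(supplement))  # 2/64, about 0.03
print("comparison, before vs. after:", within_group_p(comparison))  # 2/64, about 0.03
print("supplement vs. comparison:   ", round(between_group_p(supplement, comparison), 2))  # about 0.56
```

Both made-up groups show a statistically significant change on their own, yet the between-group p-value is far above 0.05. Reporting only the first line would make the supplement look effective when it did no better than the comparison.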

Let’s walk through two real examples to illustrate this.

 

Real-life Example 1: Energy-Reduced Diet Study (Isomaltulose vs. Sucrose)[2]

Who and what was studied:

  • 50 healthy, overweight or obese adults followed a reduced-calorie diet for 12 weeks
  • One group consumed a diet containing isomaltulose (ISO); the other consumed a diet containing sucrose
  • Key markers included changes in weight and fat mass

What the authors claimed:

  • Body weight changed in both groups, and those changes were statistically significant.
    • A “more pronounced” weight change in the ISO group compared to the sucrose group
  • Fat mass changed significantly within the ISO group
    • No mention of comparisons between the ISO and sucrose group in the abstract
  • Some metabolic measures favored ISO

What the p-values actually showed:

1. Body weight

  • Indeed, body weight changed in both groups, and the change within each group was statistically significant (p < 0.001)
  • However, the p-value comparing the weight change between the two groups was 0.26, well above the 0.05 cutoff. Roughly speaking, a gap at least this large would be expected about 26 percent (26%) of the time even if ISO and sucrose had the same effect
  • The correct conclusion: no difference in weight change between groups

2. Fat mass

  • Indeed, fat mass changed significantly within the ISO group (p = 0.005)
  • However, when comparing between the ISO and sucrose groups, there is no difference in fat mass (p = 0.169).
  • The correct conclusion: ISO did not lead to more fat-mass change than sucrose.

3. Metabolic measures

  • Two metabolism-related measures were different between the ISO and sucrose groups, and these differences were statistically significant (p < 0.05)[3]
  • The authors’ claims about these metabolic measures are accurate
  • However, these results only apply to those two tests – not to weight or fat-mass change

4. Problematic conclusion

The authors suggested that ISO “may support” better weight management than sucrose.

However, the two key outcomes for weight management (weight change and fat-mass change) did not differ between groups.

Highlighting non-significant findings as if they support a benefit can mislead readers.

Without statistically significant differences between groups, the study does not support ISO as superior for weight or fat loss.
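The reasoning in this example can be written as a short checklist in Python. This is only a sketch: the p-values are the ones quoted in this post, "p < 0.001" is stored as its upper bound of 0.001, and `between_group_significant` is a hypothetical helper name.

```python
CUTOFF = 0.05

# (outcome, comparison, p-value) as quoted in this post for Example 1.
# "p < 0.001" is stored as its upper bound, 0.001.
results = [
    ("body weight", "within each group", 0.001),
    ("body weight", "between groups", 0.26),
    ("fat mass", "within ISO group", 0.005),
    ("fat mass", "between groups", 0.169),
]

def between_group_significant(outcome, rows=results, cutoff=CUTOFF):
    """Look only at the between-group comparison for an outcome.
    Returns None if no between-group p-value was reported."""
    for name, comparison, p_value in rows:
        if name == outcome and comparison == "between groups":
            return p_value <= cutoff
    return None

for outcome in ("body weight", "fat mass"):
    print(outcome, "-> ISO and sucrose differ?", between_group_significant(outcome))
# body weight -> ISO and sucrose differ? False
# fat mass -> ISO and sucrose differ? False
```

The within-group rows are deliberately ignored by the helper: they show that something changed over 12 weeks, not that ISO did better than sucrose.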

 

What if the study only looked at the ISO group?

If this study had only looked at the ISO group by itself, the significant change within that group might have suggested that ISO helps with healthy weight management.

But without comparing ISO to another group, we wouldn’t know whether the same changes would have happened anyway — or whether another option (like sucrose) would work just as well.

 

Real-life Example 2: Probiotic Supplement Study in Type 2 Diabetes[4]

Who and what was studied:

  • Adults with type 2 diabetes not on medication received either a probiotic supplement or placebo for 13 weeks
  • Multiple metabolic markers were evaluated

 

Findings:

1. Endotoxin levels[5]

  • No statistically significant difference between groups (p = 0.15)
  • This means the supplement did not demonstrate a measurable effect on endotoxin levels compared to placebo

2. Other metabolic markers

  • On some metabolism-related tests, the supplement group did better than the placebo group
  • For example, waist-to-hip ratio (WHR) and HOMA-IR[6] showed statistically significant differences between the groups

 

Why this study’s interpretation was appropriate:

  • The authors emphasized the between-group differences
  • They did not overstate the non-significant outcomes
  • Their conclusions were aligned with the actual statistical results

 

Final Takeaways

✔ Favor evidence from gold-standard human studies (such as RCTs)

  • These studies are stronger than single-group studies because they reduce bias from other changes that can happen over time—such as changes in diet, lifestyle, or expectations—that might otherwise be mistaken for a supplement effect

✔ Focus on between-group comparisons

  • Within-group results can be interesting, but they cannot tell you whether a supplement works better than another option or a placebo

✔ The p = 0.05 cutoff can serve as a quick reference point when reviewing study results, helping you sort through information more efficiently

  • While not perfect, it is a long-standing standard in clinical research for determining whether an observed effect is likely to be real.
  • Be cautious of claims that downplay or dismiss the 0.05 standard, which has been used and supported by clinical and statistical experts for decades.

✔ Beware of “noise”

  • Selective reporting, emphasizing only within-group results, or overstating non-significant differences can create misleading impressions.

✔ If a between-group comparison is statistically significant, it suggests a real difference—whether helpful or harmful—within the context of that study

 

Our goal is to help empower you to recognize when results truly support a supplement’s effect and when the data are being stretched beyond what they can actually show.

Good luck, and stay informed!

 

Sincerely,

Derek Tang, PhD, MS, BSPharm

Truth Full Health

Your Trusted Supplement Partner

 

*Disclaimer: all blogged content is for informational purposes only and does not replace professional medical advice. The statements made regarding dietary supplements (vitamins and supplements) have not been evaluated by the Food and Drug Administration (FDA). These products are not intended to diagnose, treat, cure, or prevent any disease. Always consult with a qualified healthcare provider before beginning any new supplement, diet, or health regimen. Any references to specific products or studies are for illustrative purposes and do not constitute an endorsement, approval, or support, nor do they imply disapproval or rejection.

 

[1] Title: Before p < 0.05 to Beyond p < 0.05: Using History to Contextualize p-Values and Significance Testing. First author: L. Kennedy-Shaffer. Journal: The American Statistician. Year of publication: 2019.

[2] Title: Changes in Weight and Substrate Oxidation in Overweight Adults Following Isomaltulose Intake During a 12-Week Weight Loss Intervention: A Randomized, Double-Blind, Controlled Trial. First author: H. Lightowler. Journal: Nutrients. Year of publication: 2019.

[3] The two measures were postprandial respiratory quotient (how much the body burns carbs vs. fats after a meal) and energy intake (the calories eaten during the test meal).

[4] Title: Effects of a multi-strain probiotic supplement for 12 weeks in circulating endotoxin levels and cardiometabolic profiles of medication naïve T2DM patients: a randomized clinical trial. First author: S. Sabico. Journal: Journal of Translational Medicine. Year of publication: 2017.

[5] A measure of harmful substances from certain bacteria in the gut that leak into the bloodstream.

[6] WHR: A measurement comparing your waist size to your hip size to estimate where you store fat. HOMA-IR: A score that shows how well your body responds to insulin — basically, how insulin-resistant you are.
