Understanding how the vast collection of organisms within us (the ‘microbiome’) is linked to human (and ecosystem) health is one of the most exciting scientific topics today. It genuinely has the potential to improve our lives considerably, though it is often over-hyped (see the link below). However, I’ve recently been reading quite a few microbiome papers (it was our journal club’s topic of the month) and have been struck by the poor study design and lack of understanding of the statistical methodology. Talking to colleagues in the microbiome field, these problems may be more widespread and could be hindering our progress in understanding this important component of the ecosystem within us.
Of course, microbiome research is simply microbial community ecology, but the way some microbiome practitioners use and report community ecology statistics is problematic and sometimes outright deceptive. This includes people publishing in the highest-profile scientific journals. I won’t pick on any particular paper, but here are a few general observations (sorry for the technical detail).
- Effect sizes are often neither reported nor visualised using ordination techniques. A significant P value tells you an effect exists, but how do you know whether it is biologically relevant? My guess is that the effect sizes are often small, as is frequently the case in free-living communities.
- Little detail is given about how a particular test was performed. A typical example: “We did a PERMANOVA to test for XX”. Quite apart from PERMANOVA having some general issues (see the Warton et al. paper below), no information is given about the test itself – e.g., was it a two-way crossed design? Were Type III sums of squares used? Was homogeneity of multivariate dispersion tested with PERMDISP or similar? That is one of the test’s only assumptions, yet I haven’t read a single microbiome paper that has checked it. If it hasn’t been checked, we can’t trust the results. Have the authors read the original paper by Marti Anderson? Some cite it, at least….
- I haven’t found any PCA or PCoA plot that reports the % of variance explained. This is annoying – the axes shown may explain only a small amount of the variance in the community, so the pretty little clusters of points may be largely artificial.
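To make the first two points concrete, here is a minimal one-way PERMANOVA sketch in Python (numpy only). It is purely illustrative – not a replacement for vegan’s adonis2, PERMANOVA+, or whatever your pipeline wraps – but it shows that the effect size (R², the proportion of community variation explained by the grouping) falls straight out of the same sums of squares as the pseudo-F, so there is no excuse for reporting a P value alone. The function name and interface are mine, not from any library.

```python
import numpy as np

def permanova(dist, groups, n_perm=999, rng=None):
    """One-way PERMANOVA sketch (after Anderson 2001) on a square
    distance matrix. Returns pseudo-F, R^2 (effect size) and a
    permutation P value. Illustrative only."""
    d2 = np.asarray(dist, dtype=float) ** 2
    groups = np.asarray(groups)
    n = len(groups)
    rng = np.random.default_rng(rng)

    def pseudo_f(g):
        # Total SS from pairwise distances (the full matrix counts each
        # pair twice, hence the 2 in the denominator).
        ss_total = d2.sum() / (2 * n)
        ss_within = 0.0
        for level in np.unique(g):
            idx = np.flatnonzero(g == level)
            ss_within += d2[np.ix_(idx, idx)].sum() / (2 * len(idx))
        ss_among = ss_total - ss_within
        a = len(np.unique(g))
        f = (ss_among / (a - 1)) / (ss_within / (n - a))
        return f, ss_among / ss_total  # pseudo-F and R^2

    f_obs, r2 = pseudo_f(groups)
    f_perm = [pseudo_f(rng.permutation(groups))[0] for _ in range(n_perm)]
    p = (1 + sum(f >= f_obs for f in f_perm)) / (1 + n_perm)
    return f_obs, r2, p
```

Reporting R² alongside P makes the “significant but tiny” case visible at a glance. Note that this sketch does not check the dispersion assumption – that still needs a PERMDISP-style test before the result can be trusted.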
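And on the last point: the % of variance explained per axis is a one-liner once you have the PCoA eigenvalues, so there is no reason to omit it from the axis labels. A classical-PCoA sketch (my own function, assuming a symmetric distance matrix; negative eigenvalues from non-Euclidean distances are simply dropped here, which is one common convention among several):

```python
import numpy as np

def pcoa_percent_variance(dist):
    """Classical PCoA (principal coordinates analysis) on a square
    distance matrix; returns % variance explained per axis, in
    descending order. Negative eigenvalues are dropped."""
    d2 = np.asarray(dist, dtype=float) ** 2
    n = d2.shape[0]
    centre = np.eye(n) - np.ones((n, n)) / n   # Gower centring matrix
    b = -0.5 * centre @ d2 @ centre            # double-centred matrix
    eigvals = np.linalg.eigvalsh(b)[::-1]      # descending order
    positive = eigvals[eigvals > eigvals.max() * 1e-9]
    return 100 * positive / positive.sum()
```

Labelling an axis as, say, ‘PCoA1 (12% of variance)’ (a made-up figure) immediately tells the reader how much of the community those pretty clusters actually represent.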
I’ll stop ranting. These issues really do impair interpretation of the results and make the science difficult to replicate. It makes you ask, “how do these papers get through the gates?” I’m guessing that a significant proportion of authors, reviewers and editors have little experience in community biostatistics and don’t really understand what the tests are doing. They rely on analytical pipelines such as QIIME, which claims to provide ‘publication quality graphics and statistics’, and don’t think much more about it. More microbiome researchers need to go beyond these pipelines and keep up to date with community methods more broadly. The quality of the research would clearly improve.
Marti Anderson’s paper: http://onlinelibrary.wiley.com/doi/10.1111/j.1442-9993.2001.01070.pp.x/full