Are ecologists using fancy statistics in situations where good old t tests, ANOVAs or regressions would do? Is this statistical ‘machismo’ (sensu Brian McGill) making ecologists worse at designing experiments and observational studies? These questions are topical at the moment, with a few interesting ecology blog articles (and associated comments – see the links below). Now don’t get me wrong, I have no problem at all with frequentist t tests and the like when the data fit the assumptions of the tests. However, even in my experimental work the data generated don’t come close to meeting the assumptions of these ‘simple’ tests. I wasn’t expecting them to either – I’m a community ecologist, and species abundance data are always going to be zero inflated, overdispersed, and so on.
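To make that concrete, here is a minimal sketch (with invented numbers, not data from any real study) of why zero-inflated abundance counts violate the normality assumption behind a plain t test:

```python
# Hypothetical sketch: simulate zero-inflated species counts and show
# they fail a formal normality check (Shapiro-Wilk).
import numpy as np
from scipy import stats

rng = np.random.default_rng(42)
n = 200
# Zero-inflated Poisson: ~60% structural zeros, Poisson(mean 4) otherwise
present = rng.random(n) < 0.4
counts = np.where(present, rng.poisson(4, size=n), 0)

print(f"proportion of zeros: {np.mean(counts == 0):.2f}")
stat, p = stats.shapiro(counts)
print(f"Shapiro-Wilk p-value: {p:.2e}")  # tiny p => normality firmly rejected
```

A t test on data like these can badly misstate both the effect size and its uncertainty, which is exactly why count models (GLMs, zero-inflated models) exist.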

Moreover, even if I could apply these simple tests, they alone may miss some really interesting parts of the ecological story. For example, the fact that my multiple response variables (species) are correlated with each other is itself really interesting: it helps us understand how an experimental community assembles, and it forms the basis of co-occurrence theory. Similarly, observational studies are often unavoidably hierarchical and suffer from a degree of spatial autocorrelation that should be accounted for. Statistics has come a long way since Fisher in 1918, and ecologists and evolutionary biologists have been among the leaders of the field for good reason. With modern statistical modelling we can now ask ecological questions that were impossible to answer even 20 years ago. Evolutionary biology has likewise seen an incredible rise in the use of more complex evolutionary models – but this is enabling us to understand evolution in a more realistic way. From a disease perspective, these advances in evolutionary models have, for example, enabled me to understand virus spread at pretty high resolution. Ideally, we should be celebrating these statistical advances rather than lamenting them.
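The correlated-responses point can be sketched in a few lines (again with invented numbers): two species driven by a shared environmental gradient end up strongly correlated, and running a separate univariate test per species would never surface that structure:

```python
# Hypothetical sketch: two species whose abundances respond (oppositely)
# to the same environmental gradient, so their log-abundances correlate.
import numpy as np

rng = np.random.default_rng(0)
env = rng.normal(size=500)                          # shared gradient
sp1 = np.exp(1.0 + 0.8 * env + rng.normal(0, 0.3, 500))   # increases with env
sp2 = np.exp(0.5 - 0.6 * env + rng.normal(0, 0.3, 500))   # decreases with env

r = np.corrcoef(np.log(sp1), np.log(sp2))[0, 1]
print(f"log-abundance correlation: {r:.2f}")  # strongly negative
```

Multivariate methods (ordination, joint species distribution models) are built to pick up exactly this kind of structure, which is why they earn their extra complexity.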

Nonetheless, I admit there is a problem with papers becoming more difficult to read, and this is linked to the rise of statistical ‘machismo’. I envisage a two-pronged approach to dealing with this in practice. As mentioned in the comments sections of both the articles below, better statistical education for ecologists would help. I only really got an introduction to frequentist statistics in the last year of my undergraduate degree, and had limited formal training during my PhD. Statistics should be incorporated into the curriculum much earlier. In an ideal world, basic frequentist statistics would be mandatory in the first year of an ecology degree, GLMMs and multivariate analysis introduced in the second year, and Bayesian and machine learning methods in the final years. Stating the obvious, statistical education should be more strongly linked to the statistics most commonly employed by ecologists.

Secondly, papers using complicated methods should be strongly encouraged to provide at least a few sentences in the methods outlining why the chosen method is justifiable and, in basic terms, how it works (and what its limitations are). This would hopefully discourage people from using unnecessarily complex designs. In my experience, when I have employed complex methods, reviewers have rightly demanded this of me. People applying a technique without understanding what is actually going on ‘under the hood’, and what to look out for, is a separate problem – there is really no excuse for this. Email the person who created the R package if you need assistance. These folks are usually pretty obliging and can sometimes catch inappropriate usage. Obviously it is in their best interest for the package to be used widely and used appropriately.

Blog posts are here: https://dynamicecology.wordpress.com/2017/11/07/ask-us-anything-are-statistics-in-ecology-papers-becoming-too-difficult-for-students-and-readers-to-understand/

https://scientistseessquirrel.wordpress.com/2017/11/06/statistics-in-excel-and-when-a-results-section-is-too-short/