Integral projection models and infectious disease

Population and evolutionary ecologists are increasingly turning to integral projection models (IPMs) to understand how changes in individual performance (e.g., growth) influence population dynamics, but this type of modelling is rarely applied to host-parasite feedbacks. After being introduced to the approach by Tim Coulson and Shelly Lachish, I’ve been thinking about how IPMs could be applied to disease ecology. I’m not the first to do so, and there is a great review by Metcalf et al on the topic (see link below). The technique appeals to me because it’s a quantitative, data-driven approach to understanding host-pathogen dynamics that can account for variation at the within-host, individual and population scales. Recently, Bayesian IPMs have been developed and these offer further advantages, but may be more time-consuming to construct.
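For anyone who hasn't met an IPM before, here is a minimal sketch in Python of how one is typically built: a continuous kernel (survival × growth + reproduction) is discretised on a mesh with the midpoint rule, and the dominant eigenvalue of the resulting matrix gives the long-run population growth rate. Everything in it is an assumption made for illustration: the trait `z` (think body size or log parasite load), the function names and every parameter value are invented, not taken from Metcalf et al or any real system.

```python
import numpy as np


def normal_pdf(x, mean, sd):
    """Density of a normal distribution, used for growth and offspring size."""
    return np.exp(-0.5 * ((x - mean) / sd) ** 2) / (sd * np.sqrt(2.0 * np.pi))


def build_kernel(n_mesh=100, lower=0.0, upper=10.0, surv_intercept=-1.0):
    """Discretise a toy IPM kernel with the midpoint rule.

    All vital-rate functions and parameter values are hypothetical.
    Individuals that grow beyond the trait limits are lost (eviction),
    which a real analysis would need to handle.
    """
    h = (upper - lower) / n_mesh                 # width of each mesh cell
    z = lower + h * (np.arange(n_mesh) + 0.5)    # cell midpoints

    # Survival probability: logistic function of the current trait value.
    survival = 1.0 / (1.0 + np.exp(-(surv_intercept + 0.5 * z)))

    # Growth: next trait value z' ~ Normal(1.0 + 0.8 * z, sd = 0.7).
    # Rows index z' (next year), columns index z (this year).
    growth = normal_pdf(z[:, None], 1.0 + 0.8 * z[None, :], 0.7)

    # Reproduction: offspring number rises with z; offspring trait ~ Normal(2, 0.5).
    n_offspring = 0.3 * z
    offspring_trait = normal_pdf(z[:, None], 2.0, 0.5)

    kernel = h * (growth * survival[None, :] + offspring_trait * n_offspring[None, :])
    return z, kernel


def growth_rate(kernel):
    """Dominant eigenvalue (lambda) of the discretised kernel."""
    return float(np.max(np.linalg.eigvals(kernel).real))


z, kernel = build_kernel()
print("lambda (baseline survival):", round(growth_rate(kernel), 3))

# Lower survival, e.g. as a crude stand-in for an infected host class.
_, kernel_low = build_kernel(surv_intercept=-2.0)
print("lambda (reduced survival):", round(growth_rate(kernel_low), 3))
```

Even this toy version needs seven or eight numbers before it will run, and each of them would have to be estimated from individual-level data in a real system.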

It’s not surprising, however, that these models haven’t really taken off in the field yet. One obvious reason could be the high number of parameters needed to run the model, although you can also use IPMs in a purely theoretical context. For most wildlife systems, detailed individual longitudinal data on things such as parasite load over the course of an infection are nearly impossible to get. I wonder if new molecular tools (e.g., measuring viral load using RT-PCR from feces) may help fill this data gap? Currently it looks like you need extensive lab experiments before you can really use this approach (see Wilber et al below). Anyway, I’m looking forward to learning more the next time we meet with Tim and Shelly.

Metcalf et al (2015):

Wilber et al:


From Metcalf et al (2015).


Our eco-phylogenetic review is out now!

Curious about how phylogenetic community ecology can be applied to understand infectious disease? Our review is out now in Biological Reviews:

This all started thanks to a small grant from the University of Minnesota Institute on the Environment, which enabled me to get a great group of people together to talk disease ecology and phylogenetics. It’s great to finally see it out there! It was also great working with a graphic designer to make the figures that bit more appealing. I can highly recommend Elissa; I learned a lot from her about refining the visual aspects of figures. Figure 1 is below.

Figure 1 review

Fig. 1. Conceptual schema illustrating how an eco-phylogenetic framework can be applied to understand infectious disease dynamics. The example system used is the Ngorongoro Crater (Tanzania), across scales: (A) within host; (B) among hosts of the same species; (C) multi-host complex; and (D) landscape scale. Colour-coded and lettered symbols below each panel indicate what data (squares) and statistical tools (circles) could be used to address each challenge (see Section I.2 for model and other tool details). White ovals contain hypothetical parasite communities within a host and different parasite colours and shapes (nematodes or viruses) represent different parasite species or genotypes. PGLMM: phylogenetic generalised linear mixed model.

Defending (to a degree) statistical ‘machismo’

Are ecologists using fancy statistics in situations where good old t tests, ANOVAs or regressions would do? Is this statistical ‘machismo’ (sensu Brian McGill) making ecologists worse at designing experiments and observational studies? These questions have been topical lately, with a few interesting ecology blog articles (and associated comments; see the links below). Now don’t get me wrong, I have no problem at all with frequentist t tests etc. when the data fit the assumptions of the tests. However, even in my experimental work the data generated don’t come close to meeting the assumptions of these ‘simple’ tests. I wasn’t expecting them to either: I’m a community ecologist, and species abundance data are always going to be zero-inflated etc.
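To make the zero-inflation point concrete, here is a quick sketch in Python. It compares the observed proportion of zeros in a set of counts with the proportion a Poisson distribution with the same mean would predict. The function name and the abundance numbers are made up for illustration, and this is only a rough diagnostic, not a formal test.

```python
import math


def zero_excess(counts):
    """Return (observed, expected) proportion of zeros.

    'expected' is exp(-mean), the zero probability of a Poisson
    distribution with the same mean as the data. An observed value far
    above it hints at zero inflation.
    """
    counts = list(counts)
    mean = sum(counts) / len(counts)
    observed = sum(1 for c in counts if c == 0) / len(counts)
    expected = math.exp(-mean)
    return observed, expected


# Hypothetical abundances of one species across ten plots (invented numbers).
abundance = [0, 0, 0, 0, 0, 0, 12, 7, 0, 21]
observed, expected = zero_excess(abundance)
print(f"observed zeros: {observed:.2f}, Poisson expectation: {expected:.3f}")
```

When the two numbers are far apart, as they are for counts like these, a t test on the raw abundances is hard to defend.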

Moreover, even if I could apply these simple tests, on their own they may miss some really interesting parts of the ecological story. For example, the fact that my multiple response variables (species) are correlated with each other is actually really interesting: it helps us understand how this experimental community assembles, and it forms the basis of co-occurrence theory. Similarly, observational studies are often unavoidably hierarchical and suffer from a degree of spatial autocorrelation that should be accounted for. Statistics has come a long way since Fisher in 1918, and ecologists and evolutionary biologists have been amongst the leaders of the field for good reasons. With modern statistical modelling we can now ask ecological questions that were impossible to answer even 20 years ago. Evolutionary biology has likewise seen an incredible rise in the use of more complex evolutionary models, and this is enabling us to understand evolution in a more realistic way. From a disease perspective, for example, these advances in evolutionary models have enabled me to understand virus spread at pretty high resolution. Ideally, we should be celebrating these statistical advances rather than lamenting them.

Nonetheless, I admit that there is a problem with papers becoming more difficult to read, and this is linked to the rise of statistical ‘machismo’. I envisage a two-pronged approach to dealing with this in practice. First, as mentioned in the comments sections of both the articles below, statistical education for ecologists would help. I only really got an introduction to frequentist stats in the last year of my undergraduate degree, and had limited formal training during my PhD. Ideally, statistics should be incorporated into the curriculum much earlier. In an ideal world, basic frequentist statistics would be mandatory in the first year of an ecology degree, GLMMs and multivariate analysis would be introduced in the second year, and Bayesian and machine learning methods in the final years. Stating the obvious, statistical education should be more strongly linked to the statistics most commonly employed by ecologists.

Secondly, papers using complicated methods should be strongly encouraged to provide at least a few sentences in the methods outlining why the method employed is justified and, in basic terms, how the method works and what its limitations are. This would hopefully discourage people from using unnecessarily complex designs. In my experience, when I have employed complex methods reviewers have rightly demanded this of me. People applying a technique without understanding what is actually going on ‘under the hood’, and what to look out for, is a separate problem, and there is really no excuse for it. Email the person who created the R package if you need some assistance. These folks are usually pretty obliging and can sometimes catch inappropriate usage. Obviously it is in their best interest for the package to be used, and used appropriately.

Blog posts are here:

Integrating networks and phylogenies

Considering the broad similarities between networks and phylogenies, it is amazing that they have, until recently, been very separate approaches. In the world of epidemiology, transmission trees have been gaining momentum over the last 5 years (see the excellent review by Hall et al), as they turn phylogenies into something that more or less equates to transmission. Now it appears that ecologists are doing the same thing, with a really interesting paper just out in Methods in Ecology and Evolution (see link below). The package attached to Schliep et al looks really cool, and I can imagine it will be of use to a broad array of disciplines. I’m looking forward to trying it out myself.

Here is the link:

FIV in the Serengeti lions: our paper is out now in JAE.

Pathogen subtype really does matter: different subtypes of FIV (feline immunodeficiency virus, the cat equivalent of HIV) get around the Serengeti lions in remarkably different ways. This was the general conclusion of our paper just out in the Journal of Animal Ecology (see link below). After many years of work I’m thrilled that this paper is out. It hopefully highlights some of the ways in which cool community phylogenetic methods (coupled with phylodynamic approaches) can help us understand disease transmission in a wild population.

Here is a link:


Parasite meta-communities

Studies applying metacommunity concepts to understand parasite and symbiont communities are still pretty rare. That’s what makes a new paper by Mihaljevic et al in the Journal of Animal Ecology that much more exciting. Aside from the impressive data sets assembled, I particularly like how they use multi-species occupancy models with detection error estimates built in. I agree with the authors that this is particularly useful for parasites. I also like how they estimated how well their models predicted out-of-sample data; I’ve never seen this in occupancy models before. I did think they interchanged ‘symbiont’ and ‘parasite’ in a way that was confusing (to me at least), but that’s just a minor quibble. Overall, it was interesting that host richness and identity were important in explaining parasite composition; this is logical but rarely (if ever?) demonstrated. I think these approaches really are of value for disease ecology and will hopefully be used more broadly in the future.

Here is the link:


Great resource for implementing different substitution models in BEAST

Don’t you just hate it when you run jModelTest (or similar software) to find the best-fitting substitution model for a given set of sequences, and the best model turns out to be something obscure that often can’t be directly implemented in phylogenetics platforms like BEAST?

I stumbled across this excellent post by Justin Bagley that provides really useful information on how to put all sorts of substitution models in BEAST:

This is definitely a valuable resource and makes incorporating jModelTest results much easier.

Genomics of Wildlife Disease Workshop 2017

A couple of weeks ago I had the pleasure of attending the disease genomics boot camp (more formally, the Genomics of Wildlife Disease Workshop) at Colorado State with a great bunch of people, including members of the Craft lab (see below). It was the first time the workshop has been held, but overall it was a success and I can highly recommend it to others interested in the topic.

It really was a broad (and nearly overwhelming) overview of the entire next-gen sequencing process, from getting sequences from Illumina runs to a variety of downstream analytical approaches. There was also a lot of material incorporating the host genome, which I thought was particularly useful. As our NSF project was responsible for the workshop, I was an ‘auditor’ and assisted with the BEAST afternoon. Not only were the course material and the lecturers good, but the guest speakers were excellent and helped to frame things really well. It was also a great opportunity to network with like-minded researchers, and it was nice to chat with the other NSF postdocs and PhD students about all things puma (and bobcat) disease.

Now to get ready for EEID (Ecology and Evolution of Infectious Disease) 2017 in Santa Barbara…..


Misunderstanding the Microbiome: misuse of community ecology tools to understand microbial communities

Understanding how the vast collection of organisms within us (the ‘microbiome’) is linked to human (and ecosystem) health is one of the most exciting scientific topics today. It really does have the potential to improve our lives considerably, though it is often over-hyped (see the link below). However, I’ve recently been reading quite a few microbiome papers (it was our journal club’s topic of the month) and have been struck by the poor study design and lack of understanding of the statistical methodology. From talking to colleagues in the microbiome field, these problems may be more widespread and could be hindering our progress in understanding this important component of the ecosystem within us.

Of course, microbiome research is simply microbial community ecology, but the way some microbiome practitioners use and report community ecology statistics is problematic and sometimes outright deceptive. This includes people publishing in the highest-profile scientific journals. I won’t pick on any particular paper, but here are a few general observations (sorry for the technical detail).

  1. Effect sizes are often not reported, nor visualized using ordination techniques. The authors have a significant P value, but how do you know how biologically relevant the effect is? My guess is that effect sizes are small, as is often the case with free-living communities.
  2. Little detail is given about how the particular test was performed. The usual example: “We did a PERMANOVA to test for XX”. Leaving aside the fact that PERMANOVA has some general issues (see the Warton et al paper below), no information is given about the test itself, e.g., was it a two-way crossed design, did they use Type III sums of squares, etc.? Did they test for multivariate dispersion using PERMDISP or similar? Homogeneity of dispersion is literally one of the only assumptions of the test, but I haven’t read a microbiome paper that has checked it. If they haven’t, we can’t trust the results. Have they read the original paper by Marti Anderson? Some cite it at least.
  3. I haven’t found any PCA or PCoA plot showing the % of variance explained by each axis. This is annoying: the axes shown may explain only a small amount of the variance in the community, so the pretty little clusters of points may be pretty artificial.
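None of this is hard to report. As a rough sketch, assuming nothing more than a square distance matrix and a vector of group labels, the Python below computes the proportion of variance explained by each PCoA axis, plus a one-way PERMANOVA that returns an effect size (R²) alongside the pseudo-F and permutation P value. The function names and the toy data are mine and purely illustrative. It handles only a single-factor design and does not check multivariate dispersion (PERMDISP), which you would still need to do separately.

```python
import numpy as np


def pcoa_variance_explained(dist):
    """Proportion of variance explained by each PCoA axis.

    Uses Gower double-centring of the squared distances. Only positive
    eigenvalues are kept, which is one common convention when a
    non-Euclidean distance (e.g. Bray-Curtis) yields negative ones.
    """
    d2 = np.asarray(dist, dtype=float) ** 2
    n = d2.shape[0]
    centring = np.eye(n) - np.ones((n, n)) / n
    gower = -0.5 * centring @ d2 @ centring
    eigvals = np.linalg.eigvalsh(gower)[::-1]          # largest first
    positive = eigvals[eigvals > 1e-9 * eigvals.max()]
    return positive / positive.sum()


def permanova(dist, groups, n_perm=999, seed=0):
    """One-way PERMANOVA: returns (pseudo-F, R2, permutation P value)."""
    d2 = np.asarray(dist, dtype=float) ** 2
    groups = np.asarray(groups)
    n = len(groups)
    ss_total = d2[np.triu_indices(n, k=1)].sum() / n

    def ss_within(labels):
        total = 0.0
        for g in np.unique(labels):
            idx = np.flatnonzero(labels == g)
            sub = d2[np.ix_(idx, idx)]
            total += sub[np.triu_indices(len(idx), k=1)].sum() / len(idx)
        return total

    def pseudo_f(labels):
        a = len(np.unique(labels))
        ssw = ss_within(labels)
        return ((ss_total - ssw) / (a - 1)) / (ssw / (n - a))

    f_obs = pseudo_f(groups)
    r2 = 1.0 - ss_within(groups) / ss_total            # effect size
    rng = np.random.default_rng(seed)
    perm_f = [pseudo_f(rng.permutation(groups)) for _ in range(n_perm)]
    p_value = (1 + sum(f >= f_obs for f in perm_f)) / (n_perm + 1)
    return f_obs, r2, p_value


# Toy example: six hypothetical samples in two well-separated groups.
points = np.array([[0, 0], [1, 0], [0, 1], [10, 10], [11, 10], [10, 11]], dtype=float)
dist = np.linalg.norm(points[:, None, :] - points[None, :, :], axis=-1)
groups = np.array([0, 0, 0, 1, 1, 1])

explained = pcoa_variance_explained(dist)
f_obs, r2, p_value = permanova(dist, groups)
print("variance explained per axis:", np.round(explained, 3))
print(f"pseudo-F = {f_obs:.1f}, R2 = {r2:.3f}, P = {p_value:.3f}")
```

With only six samples the permutation P value cannot get very small however large the effect is, which is one more reason to report R² and the axis percentages alongside it.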

I’ll stop ranting. These issues really impair interpretation of the results and make the science difficult to replicate. It makes you ask “how do these papers get through the gates?” I’m guessing that a significant proportion of authors, reviewers and editors have little experience in community biostats and don’t really understand what the tests are doing. They are relying on analytical pipelines such as QIIME that claim to provide ‘publication quality graphics and statistics’ and not thinking much more about it. More microbiome researchers need to go beyond these pipelines and keep up to date with community methods more broadly. The quality of the research would clearly improve.


Microbiome over-hype:

Warton et al:

Marti Anderson’s paper: