Co-occurrence modelling and parasites

It’s increasingly recognized that multiparasitism (being infected by multiple parasites at the same time) is commonplace, and that the particular set of parasites you are infected with can have direct implications for health (and is interesting in its own right). However, quantifying the complex interactions between co-occurring parasites is tricky. For example, is the co-occurrence of particular parasites just related to age, i.e. as you get older you simply accrue more infections? Are the parasites (via the immune system) facilitating (or inhibiting) invasion by others? Or is there another reason entirely? Answering these questions is important, but choosing the appropriate analytical approach is a little daunting. Species co-occurrence patterns have been studied in other organisms for a long time, so there are many approaches to choose from.

So what are the options? Broadly, I recognize three distinct approaches: 1. Network-based models, 2. Null and probabilistic models, and 3. Joint species distribution models. I will talk a little about each and briefly point out some pros and cons. See the resources below for links to some of the methods and to papers that use them.

Network-based models.

Co-occurrence networks are networks of pathogens connected by edges (the connecting lines), where an edge indicates that two particular infections were sampled together. These methods look at the network structure by, for example, examining how connected certain pathogens are (i.e. degree) or by assessing which pathogens in the network cluster together more often than expected by chance (i.e. how modular the network is). Pros: relatively straightforward to analyze, a good way to view co-infection patterns (igraph in R is great), and not restricted to assessing just pairs of pathogens. Cons: difficult to overlay potentially confounding factors (e.g., age, but see the new and exciting MRFcov package from Nick Clark), hard to test for associations between pathogens across scales, and difficult to incorporate trait or phylogenetic information.
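To make the edge and degree ideas concrete, here is a minimal sketch in plain Python (not igraph, and not taken from any of the papers). The host data and parasite labels are made up for illustration, and the function names are my own.

```python
from collections import defaultdict
from itertools import combinations

# Hypothetical data: the set of parasites detected in each sampled host.
hosts = [
    {"A", "B"},
    {"A", "B", "C"},
    {"B", "C"},
    {"A"},
    {"C", "D"},
]

def cooccurrence_edges(samples):
    """Count, for every pair of parasites, how many hosts carried both."""
    weights = defaultdict(int)
    for parasites in samples:
        # sorted() gives each pair a single canonical key, e.g. ("A", "B")
        for a, b in combinations(sorted(parasites), 2):
            weights[(a, b)] += 1
    return dict(weights)

def degree(edges):
    """Number of distinct parasites each parasite was found with at least once."""
    neighbours = defaultdict(set)
    for a, b in edges:
        neighbours[a].add(b)
        neighbours[b].add(a)
    return {node: len(partners) for node, partners in neighbours.items()}

edges = cooccurrence_edges(hosts)
degrees = degree(edges)
print(edges)
print(degrees)
```

The edge weights are raw co-detection counts, so they say nothing yet about whether a pair co-occurs more than expected by chance. That is what the modularity and null-model steps are for.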

Null and probabilistic models 

Basically, these methods ask whether two species co-occur more or less often than expected by chance. There are a large number of methods in this category and much debate about how robust they are (see Gotelli 2000), but the Veech 2013 method is my favorite as it’s distribution-free. Pros: easy to interpret, fast to run, with low error rates. Cons: can only assess pairs (exception: the screening approach of Vaumourin, but you have to have < 10 pathogens), can’t control for confounding effects or test for associations between pathogens across scales, and null models can have extreme Type I errors (see Harris, 2016).
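As a rough illustration of the probabilistic idea, the sketch below computes how likely a given number of co-occurrences is if two parasites were distributed independently among hosts. It uses the standard hypergeometric form, which as far as I understand is equivalent to the combinatorial expression in Veech (2013). The function name, the argument names and the example numbers are my own assumptions, and I take the two tail probabilities to include the observed value.

```python
from math import comb

def cooccurrence_probabilities(n_hosts, n1, n2, observed):
    """Chance of seeing `observed` joint infections under independence.

    n_hosts  : total number of hosts sampled
    n1, n2   : number of hosts infected with parasite 1 and parasite 2
    observed : number of hosts infected with both
    """
    lowest = max(0, n1 + n2 - n_hosts)   # fewest co-occurrences possible
    highest = min(n1, n2)                # most co-occurrences possible
    total = comb(n_hosts, n2)

    def pmf(j):
        # Ways to place parasite 2 so that exactly j of its hosts also carry parasite 1
        return comb(n1, j) * comb(n_hosts - n1, n2 - j) / total

    return {
        "expected": n1 * n2 / n_hosts,
        # Assumption: both tails include the observed count itself
        "p_lt": sum(pmf(j) for j in range(lowest, observed + 1)),
        "p_gt": sum(pmf(j) for j in range(observed, highest + 1)),
    }

# Hypothetical example: 10 hosts, each parasite in 4 hosts, always together.
result = cooccurrence_probabilities(n_hosts=10, n1=4, n2=4, observed=4)
print(result)
```

A small `p_gt` suggests the pair co-occurs more often than expected by chance, and a small `p_lt` suggests less often. Note that nothing here accounts for age or any other confounder.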

Joint distribution modeling

The last category, and the one I have used the most! Basically, this method relates the distribution of each parasite in your data to environmental (and host) variables using Bayesian hierarchical mixed modeling, and then explores between-parasite relationships in the residual variation. There are nice packages in R to help you apply this approach (BORAL and HMSC are my favorites). Pros: enables you to assess co-occurrence patterns after controlling for confounding factors and to assess these patterns easily across scales; flexible and can deal with parasite abundance data (i.e. more than just presence/absence of a parasite); you get useful niche models as a bonus; and you can easily incorporate parasite phylogenetic and functional trait data. Cons: can only assess pairs, and it doesn’t provide coefficients for the strength of the co-occurrence patterns (just whether they are significantly different from zero).
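The core intuition (fit each parasite to the covariates, then look for associations in what is left over) can be shown with a much cruder stand-in. The sketch below is not a joint species distribution model and is not how BORAL or HMSC work internally. It simply fits an ordinary least-squares line of presence/absence on host age for each parasite and then correlates the residuals. The data and function names are hypothetical.

```python
from statistics import mean

def ols_residuals(y, x):
    """Residuals from a simple least-squares line y ~ intercept + slope * x."""
    mx, my = mean(x), mean(y)
    sxx = sum((xi - mx) ** 2 for xi in x)
    slope = sum((xi - mx) * (yi - my) for xi, yi in zip(x, y)) / sxx
    intercept = my - slope * mx
    return [yi - (intercept + slope * xi) for xi, yi in zip(x, y)]

def pearson(u, v):
    """Pearson correlation between two equal-length sequences."""
    mu, mv = mean(u), mean(v)
    cov = sum((a - mu) * (b - mv) for a, b in zip(u, v))
    var_u = sum((a - mu) ** 2 for a in u)
    var_v = sum((b - mv) ** 2 for b in v)
    return cov / (var_u * var_v) ** 0.5

# Hypothetical hosts: both parasites become more common with age.
age        = [1, 2, 3, 4, 5, 6, 7, 8]
parasite_a = [0, 0, 0, 0, 1, 1, 1, 1]
parasite_b = [0, 0, 0, 1, 0, 1, 1, 1]

# Association before and after removing the shared age trend
raw_r = pearson(parasite_a, parasite_b)
residual_r = pearson(ols_residuals(parasite_a, age),
                     ols_residuals(parasite_b, age))
print(f"raw correlation:      {raw_r:.2f}")
print(f"residual correlation: {residual_r:.2f}")
```

In this toy data set the raw correlation is positive, but it disappears (and even turns negative) once age is accounted for, which is exactly the kind of confounding the joint modeling approach is meant to handle. A real analysis would use an appropriate link function and quantify uncertainty rather than relying on a linear fit to 0/1 data.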


Elise Vaumourin has a nice review article:

Network approaches – Modularity algorithm: igraph:

Interesting paper:

The Nick Clark MRFcov approach:

Harris, 2016 for Markov networks:

Probabilistic models – The Veech paper: Vaumourin et al (2014)

Joint distribution modeling



Cool papers using the approach: