Statistical Network Models

Recently I had the pleasure of hanging out with Matt Silk from the University of Exeter:

It was great chatting about badgers, puma and network models over a few beers. His work summarizing statistical network models in Methods in Ecology and Evolution is particularly useful (see below).

After reading this you’ll know what ERGMs, TERGMs, REMs & SAOMs are and how they can answer network/disease questions. The only potentially useful addition I can see is generalized dissimilarity models (GDMs) as a robust way to test for covariate effects on network structure (as I did in my JAE 2017 paper). Anyway, this paper is certainly a good starting point for entering the world of statistical network analysis.



Methods in Ecology and Evolution goes microbial

Methods in Ecology and Evolution has put together a really exciting special issue on microbiome methods:

I’ve read a few of these papers already and there are certainly some really useful ideas and methods here. At a glance, Creer et al.’s field ecologist’s guide to microbial ecology seems particularly useful – looking forward to reading it in more depth!

NEON & insights from my first ESA

This year I was lucky enough to be awarded a NEON-ESA early career scholar award to help fund my first trip to ESA. I’ve been to large ecology conferences before, but I was particularly excited to expand my understanding of NEON (the National Science Foundation’s National Ecological Observatory Network), meet some great ecologists and learn some new analytical tools. I was still recovering from jet lag (I had got to steamy New Orleans after 35 hours of travelling from Tasmania).

I was thrust into it at 8 am Sunday morning with a workshop on how to use generalized joint attribute modelling (GJAM) with Jim Clark. The flexibility of this tool and the robust way it deals with messy community data make it something I want to use on the microbiome data I’ve got coming in. For those interested, the vignette is super useful too:

Immediately following the GJAM workshop, we started a NEON-focussed workshop on how to access and use NEON data. I was super impressed with just how integrated NEON is with R and how well documented the data are. I felt like you could get to know a particular location and precisely what data were collected there. From a disease ecology perspective, it is really exciting to have disease/microbiome data matched with extensive environmental data. The opportunities to ask continental-scale questions with fine-resolution data are enormous. It was great to continue the discussion at a restaurant afterwards – NEON people are my type of people! Monday was another NEON-orientated day where we got to see what people have been doing with NEON data. I also got to meet Mike Kaspari, which was great – I’ve been admiring his work for years.

The rest of my time at ESA was a haze of presenting my work on puma disease dynamics and going to as many disease ecology talks as possible. Two (and sometimes three) parallel disease ecology sessions were pretty neat. Our NSF puma project had quite a few people presenting too – it was great to see all of this population genomic/disease ecology work coming together. Overall, it was a huge week, but one that I hope will lead to exciting future collaborations!

Time-series modelling for ecologists

Recently I have been working on massive long-term group-group networks for both the Serengeti lions and Yellowstone wolves. We have tracked territory size, average pack/pride size, and the number (and strength) of between-pack/pride contacts every year from 1971 until today. Basically, we have a set of time series and we want to know which ones depend on which. Not being particularly familiar with time series analysis, I didn’t know where to start.

After doing a heap of reading I decided that vector autoregression was the way to go. Vector autoregression (VAR) models are stochastic process models that capture linear dependencies among multiple time series. Mostly used for economic forecasting, the method seems pretty robust and is quite straightforward to implement in the R package ‘vars’. However, finding out all of the steps/assumptions required to run the model was tricky, so here is my adapted code to fill the gap:

rm(list = ls())

library(vars)      # VARselect(), VAR(), serial.test(), arch.test(), roots()
library(forecast)  # forecast() for fitted VAR models

############ import data #######################

data1 <- read.csv("Data.csv", header = TRUE)

############ detrend with regression #######################

# 'model' is the column in data1 holding the series to detrend
m1 <- lm(model ~ 0 + Year, data = data1) # lm with no intercept
m1resid <- residuals(m1)

# make a data frame again
# (repeat the detrending for each series and cbind all of the residuals; VAR() needs at least two series)

dataResid <- cbind(m1resid)

############ Vector Autoregression #######################

# make a ts object – frequency here is how many obs per year.
ts.obj <- ts(dataResid, frequency = 1, start = 1997, end = 2016); str(ts.obj)

# test for the most appropriate lag for your data (e.g., does a 2 year time lag best predict the next year's connectivity?)

VARselect(ts.obj, lag.max = 3, type = "const")$selection

# 'p' below is the lag order to fit.

varLag1 <- VAR(ts.obj, p = 1, type = "const")

# test for serial correlation in the residuals (Portmanteau test; has to be 'insignificant' at alpha 0.05 to trust the results)

serial.test(varLag1, type = "PT.asymptotic")

arch.test(varLag1) # test for heteroskedasticity. Error terms are fine if p > 0.05

roots(varLag1) # all have to be under 1 (a stable model) to trust model results.

# extensive list of summary results.

summary(varLag1)

# links nicely to the forecast package to predict the future

fcstL1 <- forecast(varLag1)
plot(fcstL1, xlab = "Year")
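
For anyone wondering what a lag-1 VAR is actually estimating, here is a minimal base R sketch on simulated data. It is not how the 'vars' package is implemented and it is not the lion/wolf analysis: the series names, the coefficient matrix and the number of years are all made up for illustration. It fits each series on the previous year's values of both series by ordinary least squares, then checks stability from the eigenvalues of the estimated coefficient matrix.

```r
# Illustrative only: simulate two linked yearly series and fit a VAR(1) by OLS.
set.seed(42)
n_years <- 200
A_true <- matrix(c(0.5, 0.2,
                   0.0, 0.3), nrow = 2, byrow = TRUE)  # hypothetical coefficients
y <- matrix(0, nrow = n_years, ncol = 2,
            dimnames = list(NULL, c("contacts", "group_size")))
for (i in 2:n_years) {
  # this year's values depend linearly on last year's values plus noise
  y[i, ] <- A_true %*% y[i - 1, ] + rnorm(2)
}

# Regress each series at time t on both series at time t - 1.
current <- y[-1, ]
lagged  <- y[-n_years, ]
fit <- lm(current ~ lagged)

# Drop the intercept row; transpose so rows = response, columns = lagged predictor.
A_hat <- t(coef(fit)[-1, ])

# Stability check: all eigenvalue moduli should be below 1.
moduli <- Mod(eigen(A_hat)$values)

print(round(A_hat, 2))
print(moduli)
```

The off-diagonal entries of A_hat are the interesting part for the "which series depends on which" question: a non-zero entry in row "contacts", column "group_size" means last year's group size helps predict this year's contacts.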

My animal ecology blog post is out now

It was a great privilege to be highly commended for the Journal of Animal Ecology Elton Prize for outstanding papers by early career researchers. It also gave me an opportunity to write a blog about said paper which you can find here:

PhyloPic – great resource for animal silhouettes.

Adding animal silhouettes to figures seems to be increasingly on trend in ecology. I have no empirical evidence to back up this claim, but it seems like every article in a high impact journal has at least one figure that incorporates silhouettes of species. I too am guilty of adding them – I find them a useful visual tool, but in the past, I’ve had to create them myself using Photoshop. No more! PhyloPic provides an easy-to-search collection of reusable silhouette images of organisms from beetles to dinosaurs.

Resources like this are truly great!

Co-occurrence modelling and parasites

It’s increasingly recognized that multiparasitism (being infected by multiple parasites at the same time) is commonplace, and the particular set of parasites you are infected with can have direct implications for health (and is interesting in its own right). However, quantifying the complex interactions between co-occurring parasites is tricky. For example, is the co-occurrence of particular parasites just related to age, i.e. as you get older you simply accrue more infections? Or are the parasites (via the immune system) facilitating (or prohibiting) the invasion of others, or is it another reason entirely? Answering these questions is important, but choosing the appropriate analytical solution is a little daunting. Species co-occurrence patterns have been studied in other organisms for a long time, so there are many approaches.

So what are the options? Broadly, I recognize three distinct approaches: 1. Network-based models. 2. Probabilistic models and 3. Joint species distribution models. I will talk a little bit about each and briefly point out some pros and cons. See the resources below for links to some of the methods and papers that use them.

Network-based models.

Co-occurrence networks are networks of pathogens connected by edges (the connecting lines), which represent when those particular infections were sampled together. These methods look at the network structure by, for example, examining how connected certain pathogens are (i.e. degree) or by assessing which pathogens in the network cluster together more often than expected by chance (i.e. how modular the network is). Pros: relatively straightforward to analyze, a good way to view co-infection patterns (igraph in R is great), not restricted to assessing just pairs of pathogens. Cons: difficult to overlay potentially confounding factors (e.g., age, but see the new and exciting MRFcov package from Nick Clark), hard to test for associations between pathogens across scales & difficult to incorporate trait or phylogenetic information.
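
As a rough illustration of how such a network is built, the base R sketch below turns a host-by-pathogen presence/absence table into a weighted co-occurrence matrix and computes degree for each pathogen. The hosts, pathogen names and infections are invented for the example.

```r
# Hypothetical presence/absence data: rows are hosts, columns are pathogens.
infections <- matrix(c(1, 1, 0, 0,
                       1, 1, 0, 0,
                       1, 0, 1, 0,
                       0, 1, 1, 0,
                       1, 1, 1, 0,
                       0, 0, 0, 1),
                     ncol = 4, byrow = TRUE,
                     dimnames = list(paste0("host", 1:6),
                                     c("virusA", "virusB", "bacteriumC", "protozoanD")))

# Edge weights: number of hosts in which each pair of pathogens was found together.
cooccur <- crossprod(infections)
diag(cooccur) <- 0  # a pathogen is not linked to itself

# Degree: how many other pathogens each one co-occurs with at least once.
degree <- rowSums(cooccur > 0)

# Strength: total number of co-occurrence events involving each pathogen.
strength <- rowSums(cooccur)

print(cooccur)
print(degree)
print(strength)
```

If igraph is installed, a matrix like this can be handed to igraph::graph_from_adjacency_matrix(cooccur, mode = "undirected", weighted = TRUE) for plotting and modularity analysis.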

Null and probabilistic models 

Basically, these methods ask whether two species co-occur more or less often than expected by chance. There is a large number of methods in this category and much debate as to how robust these methods are (see Gotelli 2000), but the Veech 2013 method is my favorite as it’s distribution-free. Pros: Easy to interpret, fast to run with low error rates. Cons: Can only assess pairs (exception: the screening approach of Vaumourin, but you have to have < 10 pathogens) and can’t control for confounding effects or test for associations between pathogens across scales; null models can have extreme Type I errors (see Harris, 2016).
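
To give a feel for the pairwise probabilistic idea, here is a minimal base R sketch. It assumes the number of hosts carrying both parasites follows a hypergeometric distribution when the two parasites are independent, which is in the spirit of the Veech approach but is not a reimplementation of any published package. The function name pair_cooccurrence and the counts are hypothetical.

```r
# Hypothetical helper: N hosts sampled, n1 with parasite 1, n2 with parasite 2,
# 'observed' hosts carrying both.
pair_cooccurrence <- function(N, n1, n2, observed) {
  # Expected number of co-infected hosts if the parasites are independent.
  expected <- n1 * n2 / N
  # Probability of seeing this many co-occurrences or fewer by chance.
  p_lt <- phyper(observed, n1, N - n1, n2)
  # Probability of seeing this many co-occurrences or more by chance.
  p_gt <- phyper(observed - 1, n1, N - n1, n2, lower.tail = FALSE)
  c(expected = expected, p_lt = p_lt, p_gt = p_gt)
}

res <- pair_cooccurrence(N = 50, n1 = 20, n2 = 15, observed = 12)
print(res)
```

A small p_gt suggests the pair is found together more often than chance would predict, and a small p_lt suggests they are found together less often. Neither says anything about why, which is where the confounding problem comes in.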

Joint distribution modeling

The last category and one I have used the most! Basically, this method relates the distribution of each parasite in your data to environmental (and host) variables using Bayesian hierarchical mixed modeling and then explores between-parasite relationships in the residual variation. There are nice packages in R to help you apply this approach (BORAL and HMSC are my favorites). Pros: Enables you to assess co-occurrence patterns after controlling for confounding factors and to assess these patterns easily across scales; they are flexible and can deal with parasite abundance data (i.e. more than just presence/absence of a parasite) & you get useful niche models as a bonus. Also can easily incorporate parasite phylogenetic and functional trait data. Cons: can only assess pairs, and it doesn’t provide coefficients for the strength of the co-occurrence patterns (just whether they are significantly different from zero).
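
The "look at what is left after accounting for covariates" logic can be illustrated with a deliberately crude base R sketch. This is not a joint species distribution model and it is not what BORAL or HMSC do: it fits a separate logistic regression per parasite and then correlates the Pearson residuals, with no Bayesian hierarchy or latent variables. The data are simulated, and the variable names and effect sizes are assumptions made for the example.

```r
set.seed(1)
n_hosts <- 2000
age <- runif(n_hosts, 0, 10)   # hypothetical host covariate
latent <- rnorm(n_hosts)       # unmeasured factor shared by parasites 1 and 2

# Simulated presence/absence: all three parasites become more likely with age;
# only parasites 1 and 2 also share the unmeasured factor.
p1 <- rbinom(n_hosts, 1, plogis(-2 + 0.3 * age + 1.5 * latent))
p2 <- rbinom(n_hosts, 1, plogis(-2 + 0.3 * age + 1.5 * latent))
p3 <- rbinom(n_hosts, 1, plogis(-2 + 0.3 * age))
parasites <- cbind(p1 = p1, p2 = p2, p3 = p3)

# Raw correlations mix the shared age effect with any real association.
raw_cor <- cor(parasites)

# Fit one logistic regression per parasite and keep the Pearson residuals.
resid_mat <- apply(parasites, 2, function(y) {
  residuals(glm(y ~ age, family = binomial), type = "pearson")
})

# Residual correlations: association left over once age is accounted for.
resid_cor <- cor(resid_mat)

print(round(raw_cor, 2))
print(round(resid_cor, 2))
```

In this simulation the p1-p2 association should survive the age correction while the associations with p3 should shrink towards zero, which is the pattern a proper joint model is designed to pick out with appropriate uncertainty.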


Elise Vaumourin has a nice review article:

Network approaches– Modularity algorithm: Igraph:

Interesting paper:

The Nick Clark MRFcov approach:

Harris, 2016 for Markov networks:

Probabilistic models – The Veech paper: Vaumourin et al (2014)

Joint distribution modeling



Cool papers using the approach:


Exciting Animal Ecology issue

The new Journal of Animal Ecology special issue, which focuses on animal host-microbe interactions (often in a disease context), looks like a must-read. All the articles look interesting but there are a few which particularly stand out. Most I’ve seen in preprint form but it is nice to see them all together. In no particular order:

Mihaljevic et al on parasite metacommunities – this looks like an interesting technique!

Keiser et al on queen presence and disease – ants are always interesting.

Raulo et al on social behaviour and gut microbiota.

Becker et al on resource provisioning and host traits in determining host-parasite interactions.

Looking forward to reading these articles and the others in more detail!


Social systems and disease: Canine distemper in Yellowstone and the Serengeti

I recently got back from a really interesting meeting to link the disease work that Craig Packer, Meggan Craft and I have been doing as part of the Serengeti Lion Project to the Yellowstone Wolf Project. The meeting this time was in Yellowstone and it was a brilliant opportunity to see wolves in the wild (and the park in winter, see moose image below – my wolf photos weren’t great). I now know much more about wolf biology and the effort taken to understand this charismatic species. Really a tremendous experience.

These two systems represent the most intensively studied social carnivore systems in the world, and the opportunity to compare and contrast the disease ecology across systems is exciting. You wouldn’t necessarily think that the diseases infecting canids and felids would be similar, but in both systems one of the important pathogens is canine distemper virus (CDV). CDV can lead to serious reductions in numbers of both species but, perhaps due to their social organization, lions and wolves recover numerically quite quickly. CDV exposure, particularly for lions, is nasty and individuals can experience severe neurological symptoms. It is unknown, however, if prides/packs impacted by CDV alter their interactions with other groups. This in turn could alter how diseases move around both landscapes by reducing intergroup interactions in the years following an outbreak. The hypothesis is that groups weakened by the disease are less likely to fight over territory and thus become more timid for a period post-epidemic. Our initial results show that this may be the case for lions at least, with the number of pride-pride contacts diminishing in the 4-year period after the epidemic. The lion population had largely recovered by then, but the effects of CDV epidemics look like they linger. The plan is to now see if the same pattern can be found with the wolves, and this could be the first compelling case for the power of epidemics to cause social disruption.

This collaboration was only really possible due to Pete Hudson, and I’m amazed that he was able to turn a conversation we had when he visited UMN last year into a collaboration that has the potential to understand sociality and disease in a new light.


Tip Dating in BEAST

Understanding how robust the molecular clock is is a critical step for many evolutionary analyses. Usually when I get given a set of aligned sequences I turn to TempEst to test the ‘clocklikeness’ of the data. However, after the release of ‘TIPDATINGBEAST’ (I’ll call it TDBEAST for short) I may turn to TempEst much less often. Whilst TempEst is useful for getting some qualitative idea of the temporal signal in the data, I find it annoying that it’s hard to drill down to find which sequences are leading to bias. Furthermore, TempEst is sensitive to the input tree, which can lead to problems, and I have had issues with guessDates as well, which is slightly irritating. TDBEAST solves lots of these issues and more, and provides a robust method to test how ‘clock-like’ your sequences are. The TDBEAST R package does 2 things using BEAST log files and .xml files:

1. Uses a date randomization test to measure temporal signal and provides a nice visualization to check results.

2. Uses leave-one-out cross-validation to work out the likely culprits that could be skewing results. These sequences, for example, could have the wrong date assigned to them by mistake – this is a real problem when working on sequences from the field.

Whilst I’m having a few teething issues with the code, overall the ‘how to’ guide is excellent and the code seems straightforward. TDBEAST definitely seems like a valuable addition to my phylogenetic toolbox.

Here are the links: