NEON & insights from my first ESA

This year I was lucky enough to be awarded a NEON-ESA early career scholar award to help fund my first trip to ESA. I’ve been to large ecology conferences before, but I was particularly excited to expand my understanding of NEON (the National Science Foundation’s National Ecological Observatory Network), meet some great ecologists and learn some new analytical tools. I arrived still recovering from jet lag, having got to steamy New Orleans after 35 hours of travelling from Tasmania.

I was thrust into it at 8 am on Sunday morning with a workshop on how to use generalized joint attribute modelling (GJAM) with Jim Clark. The flexibility of this tool and the robust way it deals with messy community data make it something I want to use on the microbiome data I’ve got coming in. For those interested, the vignette is super useful too:

Immediately following the GJAM workshop, we started a NEON-focused workshop on how to access and use NEON data. I was super impressed with just how integrated NEON is with R and how well documented the data is. I felt like you could get to know a particular location and precisely what data was collected there. From a disease ecology perspective, it is really exciting to have disease/microbiome data matched with extensive environmental data. The opportunities to ask continental-scale questions with fine-resolution data are enormous. It was great to continue the discussion at a restaurant afterwards – NEON people are my type of people! Monday was another NEON-orientated day where we got to see what people have been doing with NEON data. I also got to meet Mike Kaspari, which was great – I’ve been admiring his work for years.

The rest of my time at ESA was a haze of presenting my work on puma disease dynamics and going to as many disease ecology talks as possible. Two (and sometimes three) parallel disease ecology sessions were pretty neat. Our NSF puma project also had quite a few people presenting – it was great to see all of this population genomic/disease ecology work coming together. Overall, it was a huge week, but one that I hope will lead to exciting future collaborations!

Time-series modelling for ecologists

Recently I have been working on massive long-term group-group networks for both the Serengeti lions and Yellowstone wolves. We have tracked territory size, average pack/pride size, and the number (and strength) of between-pack/pride contacts every year from 1971 until today. Basically, we have a series of time series and want to know which one depends on which. Not being particularly familiar with time series analysis, I didn’t know where to start.

After doing a heap of reading I decided that vector autoregression was the way to go. Vector autoregression (VAR) models are stochastic process models that capture linear dependencies among multiple time series. Mostly used for economic forecasting, the method seems pretty robust and is quite straightforward to implement with the R package ‘vars’. However, finding out all of the steps/assumptions required to run the model was tricky, so here is my adapted code to fill the gap:

rm(list = ls())

library(vars)      # VARselect(), VAR(), serial.test(), arch.test(), roots()
library(forecast)  # forecast() for fitted VAR models

############import data#######################

data1 <- read.csv("Data.csv", header = TRUE)

############detrend with regression#######################

m1 <- lm(model ~ 0 + Year, data = data1) # lm with no intercept
m1resid <- residuals(m1)

# make a matrix of residuals again. cbind() the residuals of every detrended
# series here: VAR() needs at least two columns.

dataResid <- cbind(m1resid)

############Vector Autoregression#######################

# make a ts object; frequency here is how many obs per year.
ts.obj <- ts(dataResid, frequency = 1, start = 1997, end = 2016); str(ts.obj)

# test for the most appropriate lag for your data (e.g., does a 2 year time lag
# best predict the next year's connectivity?)

VARselect(ts.obj, lag.max = 3, type = "const")$selection

# 'p' below is the lag order to fit.

varLag1 <- VAR(ts.obj, p = 1, type = "const")

# test for serial correlation in the residuals (has to be 'insignificant' at
# alpha 0.05 to trust the results)

serial.test(varLag1, type = "PT.asymptotic")

arch.test(varLag1) # test for heteroskedasticity. Error terms are fine if p>0.05

roots(varLag1) # moduli have to be under 1 to trust model results.

# extensive list of summary results.

summary(varLag1)

# links nicely to the forecast package to predict the future

fcstL1 <- forecast(varLag1)
plot(fcstL1, xlab = "Year")
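
If it helps to see what a lag-1 VAR is actually estimating, here is a minimal base R sketch (no packages needed). It simulates two made-up series and recovers the lag-1 coefficients by regressing each series on the previous values of both. The series names, coefficients and seed are invented for illustration; this is not the lion/wolf data and not a replacement for the ‘vars’ workflow above.

```r
# Simulate two series where 'packsize' depends on last year's 'contacts'
# (all numbers are invented for illustration).
set.seed(42)
n <- 500
A <- matrix(c(0.5, 0.0,
              0.4, 0.3), nrow = 2, byrow = TRUE) # row i: effect of lagged series on series i
y <- matrix(0, nrow = n, ncol = 2,
            dimnames = list(NULL, c("contacts", "packsize")))
for (t in 2:n) {
  y[t, ] <- A %*% y[t - 1, ] + rnorm(2) # this year = A x last year + noise
}

# A VAR(1) is one linear regression per series on the lagged values of all series.
now  <- y[-1, ] # years 2..n
prev <- y[-n, ] # years 1..(n-1)
fit_contacts <- lm(now[, "contacts"] ~ prev)
fit_packsize <- lm(now[, "packsize"] ~ prev)

# Stack the slopes (dropping the intercepts) to compare with A.
A_hat <- rbind(contacts = coef(fit_contacts)[-1],
               packsize = coef(fit_packsize)[-1])
print(round(A_hat, 2))
```

The off-diagonal entries are the interesting ones for the "which series depends on which" question: a clearly non-zero slope of one series on the lag of another is the kind of dependency the VAR is designed to pick up.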

My animal ecology blog post is out now

It was a great privilege to be highly commended for the Journal of Animal Ecology Elton Prize for outstanding papers by early career researchers. It also gave me an opportunity to write a blog about said paper which you can find here:

PhyloPic – great resource for animal silhouettes.

Adding animal silhouettes to figures seems to be increasingly on trend in ecology. I have no empirical evidence to back up this claim, but it seems like every article in a high impact journal has at least one figure that incorporates silhouettes of species. I too am guilty of adding them – I find them a useful visual tool, but in the past, I’ve had to create them myself using Photoshop. No more! PhyloPic provides an easy-to-search collection of reusable silhouette images of organisms from beetles to dinosaurs.

Resources like this are truly great!

Co-occurrence modelling and parasites

It’s increasingly recognized that multiparasitism (being infected by multiple parasites at the same time) is commonplace and that the particular set of parasites you are infected with can have direct implications for health (and is interesting in its own right). However, quantifying the complex interactions between co-occurring parasites is tricky. For example, is the co-occurrence of particular parasites just related to age, i.e. as you get older you simply accrue more infections? Or are the parasites (via the immune system) facilitating (or prohibiting) the invasion of others, or is it another reason entirely? Answering these questions is important but choosing the appropriate analytical solution is a little daunting. Species co-occurrence patterns have been studied in other organisms for a long time so there are many approaches.

So what are the options? Broadly, I recognize three distinct approaches: 1. Network-based models, 2. Probabilistic models and 3. Joint species distribution models. I will talk a little about each and briefly point out some pros and cons. See the resources below for links to the methods and to papers that use them.

Network-based models.

Co-occurrence networks are networks of pathogens connected by edges (the connecting lines) which represent when those particular infections were sampled together. These methods look at the network structure by, for example, examining how connected certain pathogens are (i.e. degree) or by assessing which pathogens in the network cluster together more often than expected by chance (i.e. how modular the network is). Pros: relatively straightforward to analyze, a good way to view co-infection patterns (igraph in R is great), not restricted to assessing just pairs of pathogens. Cons: difficult to overlay potentially confounding factors (e.g., age, but see the new and exciting MRFcov package from Nick Clark), hard to test for associations between pathogens across scales & difficult to incorporate trait or phylogenetic information.
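
As a rough illustration of what goes into a co-occurrence network, here is a minimal base R sketch that turns a host-by-pathogen presence/absence matrix into a weighted co-occurrence matrix and computes each pathogen's degree. The hosts, pathogen names and infection data are all invented.

```r
# Rows = hosts, columns = pathogens (1 = infected). Values are invented.
infections <- matrix(c(1, 1, 0, 0,
                       1, 1, 1, 0,
                       0, 1, 1, 0,
                       1, 0, 0, 1,
                       1, 1, 0, 0),
                     ncol = 4, byrow = TRUE,
                     dimnames = list(paste0("host", 1:5),
                                     c("virusA", "virusB", "nematodeC", "protozoanD")))

# t(X) %*% X: entry [i, j] = number of hosts carrying both pathogen i and j.
cooc <- crossprod(infections)
diag(cooc) <- 0 # ignore a pathogen "co-occurring" with itself

# Unweighted network: an edge exists if two pathogens were ever sampled together.
adjacency <- (cooc > 0) * 1
degree <- rowSums(adjacency) # how many other pathogens each one co-occurs with

print(cooc)
print(degree)
```

A matrix like `cooc` is the sort of object you could hand to igraph (for example via its graph_from_adjacency_matrix() function) for plotting and modularity analysis; I have left that step out so the sketch runs without any packages.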

Null and probabilistic models 

Basically, these methods ask whether two species co-occur more or less often than expected by chance. There are a large number of methods in this category and much debate about how robust they are (see Gotelli 2000), but the Veech 2013 method is my favorite as it’s distribution-free. Pros: easy to interpret, fast to run, with low error rates. Cons: can only assess pairs (exception: the screening approach of Vaumourin, but you have to have < 10 pathogens) and can’t control for confounding effects or test for associations between pathogens across scales; null models can also have extreme Type I errors (see Harris, 2016).
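
To show the flavour of the probabilistic approach, here is a minimal base R sketch for a single pair of parasites. My understanding is that the pairwise probabilities in this family of methods can be written as a hypergeometric calculation, so I use phyper() from base R; treat that as my assumption rather than the exact published algorithm. The function name and the counts are invented.

```r
# Probability of seeing this many co-infections if the two parasites were
# distributed among hosts independently of each other.
# n_hosts: hosts sampled; n_a, n_b: hosts with parasite A / B;
# n_both: hosts with both. (Hypothetical helper, invented counts below.)
pair_cooccurrence <- function(n_hosts, n_a, n_b, n_both) {
  expected <- n_a * n_b / n_hosts
  # P(co-infections <= observed): small values suggest a negative association
  p_lt <- phyper(n_both, n_a, n_hosts - n_a, n_b)
  # P(co-infections >= observed): small values suggest a positive association
  p_gt <- phyper(n_both - 1, n_a, n_hosts - n_a, n_b, lower.tail = FALSE)
  c(expected = expected, p_lt = p_lt, p_gt = p_gt)
}

res <- pair_cooccurrence(n_hosts = 50, n_a = 20, n_b = 15, n_both = 11)
print(round(res, 4))
```

With more than two parasites you would loop this over every pair, which is exactly why the approach is limited to pairwise patterns and why multiple testing needs some thought.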

Joint distribution modeling

The last category and the one I have used the most! Basically, this method relates the distribution of each parasite in your data to environmental (and host) variables using Bayesian hierarchical mixed modeling and then explores between-parasite relationships in the residual variation. There are nice packages in R to help you apply this approach (BORAL and HMSC are my favorites). Pros: enables you to assess co-occurrence patterns after controlling for confounding factors and to assess these patterns easily across scales; the models are flexible and can deal with parasite abundance data (i.e., more than just presence/absence of a parasite), and you get useful niche models as a bonus. You can also easily incorporate parasite phylogenetic and functional trait data. Cons: can only assess pairs, and it doesn’t provide coefficients for the strength of the co-occurrence patterns (just whether they are significantly different from zero).


Elise Vaumourin has a nice review article:

Network approaches– Modularity algorithm: Igraph:

Interesting paper:

The Nick Clark MRFcov approach:

Harris, 2016 for Markov networks:

Probabilistic models – The Veech paper: Vaumourin et al (2014)

Joint distribution modeling



Cool papers using the approach:


Exciting Animal Ecology issue

The new Journal of Animal Ecology special issue, which focuses on animal host-microbe interactions (often in a disease context), looks like a must-read. All the articles look interesting but there are a few which particularly stand out. Most I’ve seen in preprint form but it is nice to see them all together. In no particular order:

Mihaljevic et al on parasite metacommunities – this looks like an interesting technique!

Keiser et al on queen presence and disease – ants are always interesting.

Raulo et al on social behaviour and gut microbiota.

Becker et al on resource provisioning and host traits in determining host-parasite interactions.

Looking forward to reading these articles and the others in more detail!


Social systems and disease: Canine distemper in Yellowstone and the Serengeti

I recently got back from a really interesting meeting to link the disease work that Craig Packer, Meggan Craft and I have been doing as part of the Serengeti Lion Project to the Yellowstone Wolf Project. The meeting this time was in Yellowstone and it was a brilliant opportunity to see wolves in the wild (and the park in winter, see moose image below – my wolf photos weren’t great). I now know much more about wolf biology and the effort taken to understand this charismatic species. Really a tremendous experience.

These two systems represent the most intensively studied social carnivore systems in the world, and the opportunities to compare and contrast disease ecology across them are exciting. You wouldn’t necessarily think that the diseases infecting canids and felids would be similar, but in both systems one of the important pathogens is canine distemper virus (CDV). CDV can lead to serious reductions in numbers of both species but, perhaps due to their social organization, lions and wolves recover numerically quickly. CDV exposure, particularly for lions, is nasty and individuals can experience severe neurological symptoms. It is unknown, however, whether prides/packs impacted by CDV alter their interactions with other groups. This in turn could alter how diseases move around both landscapes by reducing intergroup interactions in the years following an outbreak. The hypothesis is that groups weakened by the disease are less likely to fight over territory and thus become more timid for a period post-epidemic. Our initial results show that this may be the case for lions at least, with the number of pride-pride contacts diminishing in the 4-year period after the epidemic. The lion population had largely recovered by then, but the effects of CDV epidemics look like they linger. The plan is now to see if the same pattern can be found with the wolves; this could be the first compelling case for the power of epidemics to cause social disruption.

This collaboration was only really possible due to Pete Hudson, and I’m amazed that he was able to turn a conversation we had when he visited UMN last year into a collaboration that has the potential to help us understand sociality and disease in a new light.


Tip Dating in BEAST

Understanding how robust the molecular clock is is a critical step for many evolutionary analyses. Usually when I am given a set of aligned sequences I turn to TempEst to test the ‘clocklikeness’ of the data. However, after the release of ‘TIPDATINGBEAST’ (I’ll call it TDBEAST for short) I may turn to TempEst much less often. Whilst TempEst is useful for getting some qualitative idea of the temporal signal in the data, I find it annoying that it’s hard to drill down to find which sequences are leading to bias. Furthermore, TempEst is sensitive to the input tree, which can lead to problems, and I have had issues with guessDates as well, which is slightly irritating. TDBEAST solves lots of these issues and more, and provides a robust method to test how ‘clock-like’ your sequences are. The TDBEAST R package does 2 things using BEAST log files and .xml files:

1. Uses a date randomization test to measure temporal signal and provides a nice visualization to check results.

2. Uses a leave-one-out cross-validation to work out the likely culprits that could be skewing results. These sequences, for example, could have the wrong date assigned to them by mistake – this is a real problem when working on sequences from the field.
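
The logic of a date randomization test can be sketched without BEAST at all. The base R toy below is not what TDBEAST does (that works on BEAST runs and compares rate estimates across date-shuffled datasets); instead it swaps in a simple root-to-tip regression, in the spirit of TempEst, and asks whether the slope with the real sampling dates is larger than the slopes obtained after shuffling the dates. All data are simulated and every value is invented.

```r
# Simulated sequences: root-to-tip distance grows with sampling year.
set.seed(1)
n_tips <- 40
dates <- sample(1990:2015, n_tips, replace = TRUE) # sampling years
true_rate <- 1e-3                                  # substitutions/site/year
root_to_tip <- true_rate * (dates - 1980) + rnorm(n_tips, sd = 2e-3)

# Slope of the root-to-tip regression = crude estimate of the clock rate.
slope <- function(d) unname(coef(lm(root_to_tip ~ d))[2])

observed <- slope(dates)

# Shuffle the dates among tips to destroy any real temporal signal.
randomised <- replicate(200, slope(sample(dates)))

# Fraction of shuffled datasets with a slope at least as steep as the real one.
p_value <- mean(randomised >= observed)
cat("observed rate:", signif(observed, 3), " p:", p_value, "\n")
```

If the real dates carry temporal signal, the observed slope should sit well outside the cloud of randomised slopes. A sequence with a wrongly assigned date is exactly the sort of thing that weakens that separation, which is where the leave-one-out step comes in.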

Whilst I’m having a few teething issues with the code, overall the ‘how to’ guide is excellent and the code seems straightforward. TDBEAST definitely seems like a valuable addition to my phylogenetic toolbox.

Here are the links:





Integral projection models and infectious disease

Population evolutionary ecologists are increasingly turning to integral projection models (IPMs) to understand how changes in performance (e.g., growth) influence population dynamics, but this type of modeling is rarely applied to understand host-parasite feedbacks. After being introduced to this modeling approach by Tim Coulson and Shelly Lachish, I’ve been thinking about how it could be applied to disease ecology. I’m not the first one to do so and there is a great review by Metcalf et al (see link below) on the topic. The technique appeals to me because it’s a quantitative, data-driven approach to understanding host-pathogen dynamics that can account for variation at within-host, individual and population scales. Recently, Bayesian IPMs have been developed and these offer further advantages, but may be more time-consuming to construct.

It’s not surprising, however, that these models haven’t really taken off in the field yet. One obvious reason could be the high number of parameters necessary to run the model, although you can also use IPMs in a theoretical context. For most wildlife systems, detailed individual longitudinal data on things such as parasite load over the term of infection are near impossible to get. I wonder if new molecular tools (e.g., measuring viral load using RT PCR from feces) may help fill this data gap? Currently, it looks like you need extensive lab experiments before you can really use this approach (see Wilber et al below). Anyway, I’m looking forward to learning more the next time we meet with Tim and Shelly.
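
To give a feel for what an IPM needs, here is a minimal theoretical sketch in base R where the continuous state is an infected host's (log) parasite load. Every function and parameter value is invented for illustration and is not taken from Metcalf et al or Wilber et al; the numerical approach is the standard midpoint-rule discretisation of the kernel.

```r
# Vital rates as functions of current load z (all values invented).
survival  <- function(z) plogis(2 - 0.5 * z)       # heavier load, lower survival
growth    <- function(z_new, z) dnorm(z_new, mean = 0.5 + 0.8 * z, sd = 0.5) # load next step
fecundity <- function(z) 0.8 * plogis(1 - 0.5 * z) # heavier load, fewer offspring
offspring <- function(z_new) dnorm(z_new, mean = 0, sd = 0.5) # load of new hosts

# Discretise the load axis into mesh cells (midpoint rule).
n_mesh <- 100
lower <- -3
upper <- 8
h <- (upper - lower) / n_mesh
z <- lower + (seq_len(n_mesh) - 0.5) * h # cell midpoints

# Kernel: contribution of a host with load z_old to loads z_new next step.
kernel <- function(z_new, z_old) {
  survival(z_old) * growth(z_new, z_old) + fecundity(z_old) * offspring(z_new)
}
K <- h * outer(z, z, kernel) # rows = new load, columns = old load

# The dominant eigenvalue is the asymptotic growth rate of the host population;
# the matching eigenvector is the stable distribution of parasite loads.
eig <- eigen(K)
lambda <- Re(eig$values[1])
stable_load <- Re(eig$vectors[, 1])
stable_load <- stable_load / sum(stable_load)

cat("lambda:", round(lambda, 3), "\n")
```

Even this toy needs four fitted functions, each of which would have to be estimated from individual-level longitudinal data in a real system, which I think illustrates the data problem nicely.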

Metcalf et al (2015):

Wilber et al:


From Metcalf et al (2015).

Our eco-phylogenetic review is out now!

Curious about how phylogenetic community ecology can be applied to understand infectious disease? Our review is out now in Biological Reviews:

This all started thanks to a small grant from the University of Minnesota Institute on the Environment enabling me to get a great group of people together to talk disease ecology and phylogenetics. It’s now great to see it out there! It was also great working with a graphic designer to make the figures that bit more appealing. I can highly recommend Elissa – I learned a lot from her about getting the visual aspects of figures more refined. Figure 1 is below.

Figure 1 review

Fig. 1. Conceptual schema illustrating how an eco-phylogenetic framework can be applied to understand infectious disease dynamics. The example system used is the Ngorongoro Crater (Tanzania), across scales: (A) within host; (B) among hosts of the same species; (C) multi-host complex; and (D) landscape scale. Colour-coded and lettered symbols below each panel indicate what data (squares) and statistical tools (circles) could be used to address each challenge (see Section I.2 for model and other tool details). White ovals contain hypothetical parasite communities within a host and different parasite colours and shapes (nematodes or viruses) represent different parasite species or genotypes. PGLMM: phylogenetic generalised linear mixed model.