Graphical network models and moose microbes

Recently, I wrote an article, now out on the Journal of Animal Ecology blog, on the use of graphical network models to pick apart the microbiome. In essence, graphical models are a merger between probability theory and graph theory and are a valuable addition to the community ecologist’s and microbial ecologist’s tool belt. In the article, we introduce these methods and give a behind-the-scenes take on our moose microbiome paper, which utilises a spatially explicit graphical network model that Nick Clark developed.

Here is the link: https://animalecologyinfocus.com/2020/08/10/untangling-community-dynamics-using-spatially-explicit-bayesian-networks/

Nick Fountain-Jones

An introduction to phylodynamics and COVID-19

Over the last month or so I’ve been working with a very talented artist (Jai Sutton-Bassett) to create a comic that briefly explains phylodynamics and how it helps in the fight against COVID-19. It was a really great experience, and I hope it shows the importance of sequencing and understanding the evolution of pathogens.

The links between COVID-19 epidemiology and evolution

Understanding how a virus evolves and mutates is not only crucial for the development of effective vaccines and treatments but also offers important insights into a pandemic. For example, geographic patterns of virus spread, periods of epidemic growth and control efforts imprint themselves on the virus’s genetic code. Advances in statistics have given us the ability to unravel these complex patterns. Since SARS-CoV-2 (the virus that causes COVID-19) spilt over from wildlife to humans sometime in late 2019, it has spread across the world and its genome has accrued mutations and evolved. This is not surprising; it is what viruses such as coronaviruses do. As RNA viruses such as SARS-CoV-2 replicate inside a host, errors occur, leading to mutations in the new viruses that can then be transmitted onwards. Some mutations can give the virus a selective edge (e.g. allow the virus to be more easily transmitted), but most will be of no consequence or a hindrance to the virus and will be rapidly lost. Importantly, the more infections there are, the more mutations are likely to occur. Studying the patterns of mutations can therefore allow us to estimate how much virus was circulating at a given time (using some clever math). Mutations also generate new ‘strains’ or lineages of the virus, and one lineage may have advantages over the others (we will use ‘lineage’ as it is a less contentious term). Whether there are distinctive lineages across the world with different epidemiological characteristics is the topic of much debate. For example, is one lineage expanding while another is decreasing? Are there any distinctive lineages at all? It had only been ~6 months since SARS-CoV-2 emerged (at the time of writing)…
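To make the ‘clever math’ a little more concrete, here is a minimal Python sketch of one classical way to turn mutation counts into a population size estimate: Watterson’s estimator. This is not the approach used in our paper; it is a much simpler stand-in, swapped in purely for illustration. It assumes a haploid population at constant size, and the toy alignment and the mutation rate are made up.

```python
def segregating_sites(sequences):
    """Count alignment columns where at least two sequences differ."""
    length = len(sequences[0])
    if any(len(s) != length for s in sequences):
        raise ValueError("sequences must be aligned to the same length")
    return sum(1 for column in zip(*sequences) if len(set(column)) > 1)


def watterson_theta(sequences):
    """Watterson's estimator: theta = S / a_n, where a_n = sum of 1/i for i = 1..n-1."""
    n = len(sequences)
    if n < 2:
        raise ValueError("need at least two sequences")
    a_n = sum(1.0 / i for i in range(1, n))
    return segregating_sites(sequences) / a_n


def effective_population_size(sequences, mutation_rate_per_genome):
    """For a haploid population theta = 2 * Ne * mu, so Ne = theta / (2 * mu)."""
    return watterson_theta(sequences) / (2.0 * mutation_rate_per_genome)


# Toy alignment, made up for illustration (not real SARS-CoV-2 data).
toy_alignment = [
    "ACGTACGTAC",
    "ACGTACGTAT",
    "ACGAACGTAC",
    "ACGTACGTAC",
]

# Hypothetical mutation rate per genome per generation, chosen only for the example.
HYPOTHETICAL_MU = 0.01

theta = watterson_theta(toy_alignment)
ne = effective_population_size(toy_alignment, HYPOTHETICAL_MU)
print(f"segregating sites: {segregating_sites(toy_alignment)}")
print(f"theta: {theta:.3f}, effective population size: {ne:.1f}")
```

The intuition is the same as in the text: the more variable sites we see among the sampled genomes, the larger the population of virus that must have been circulating to generate them. Real analyses also use the shape and timing of the tree, which lets the estimate change through time.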

Fig. 1
Two trees (calculated with different methods) showing the evolutionary relationships of the virus. Branches of the tree are coloured by the lineage they belong to. Both of the methods we used identify three lineages (A, B & C). MRCA: most recent common ancestor, i.e. when did the virus emerge? Based on our analysis of genetic data, it is likely that the virus emerged in November/December 2019. The bar plots on the right-hand side show the proportion of sequences sampled on each continent for each lineage.

This is where our study fits in. In our paper, we explore this idea of different lineages by applying some recently developed statistical approaches to around 779 SARS-CoV-2 genomes that we downloaded in late March from GISAID (an open access repository for virus sequence data). We found that three different lineages are likely being transmitted around the world. Two lineages (‘A’ and ‘B’) likely emerged from China in November/December 2019, with another more recent lineage (‘C’) diverging in February 2020. Lineage A is likely the lineage that made the switch from bats into humans (via another species), and is thus likely to be the ancestral lineage. We found a rollercoaster-like pattern of growth and decline in the genetic diversity of each lineage. This rollercoaster closely followed China’s experience with the virus and then the expansion of COVID-19 across the world. Initially, Lineage A experienced lots of growth in genetic diversity as the epidemic intensified in Wuhan, followed by a decline as the virus was controlled in the region. We have some evidence that Lineage B started to spread at the height of the outbreak in China and then went into decline when the virus was controlled. Based on our estimates, both lineages then entered a period of growth from late February into early March, when the outbreak turned into a pandemic and the virus spread into Europe in particular. At this point, Lineage C came onto the scene, having picked up some important mutations along the way. This lineage likely originated in Europe and, since our study, has become the most common lineage around. The mutations that make Lineage C distinctive may have made the virus more transmissible (more easily transmitted from person to person), but more research is required to support this hypothesis. There is no evidence that infection by one lineage makes you sicker than infection by another.

This study not only offers some interesting insights into the pandemic, it also demonstrates that our approach is sensitive enough to pick up when, for example, control measures are (or aren’t) working. These estimates are particularly useful as they are not affected by asymptomatic cases going unrecorded, or by limited testing in countries where widespread testing is out of reach. Furthermore, if we go through another period of growth in genetic diversity we may find yet another lineage diverging. Tracking this will be ever more important in the months to come. Before we started this study we weren’t sure if there was enough diversity to pick up these patterns at all. Since we downloaded these sequences in March, another ~29,000 SARS-CoV-2 genome sequences have been added (truly a wondrous human achievement!), so there is much more genetic sleuthing to do.

Fig. 2
Growth in genetic diversity (a.k.a. the effective population size of the virus) through time.

Submissions are open: Novel methods and concepts in disease ecology special issue

My fellow special issue editors and I are really excited that submissions are open for our research topic on novel methods and concepts in disease ecology in Frontiers in Ecology and Evolution. Understanding disease dynamics has never been so important, and we are after papers that cross disciplines to provide new insights into disease emergence, transmission, spread and endemization. Submit abstracts by the 25th of May; the manuscript deadline is the 25th of September.

Find the full details here: https://www.frontiersin.org/research-topics/13402/disease-ecology-novel-concepts-and-methods-to-track-and-forecast-disease-emergence-transmission-spre

Making the most out of machine learning models

Statistical machine learning is becoming much more commonplace in ecology and evolution. As a rough estimate, I’d say 15% of the posters and talks I’ve seen at ESA/Evolution conferences recently have featured some form of machine learning approach. This is not surprising given the power of these methods to form robust predictive models from increasingly common big, complex datasets. However, there is still lots of confusion about what machine learning approaches are and how they can be applied and interpreted. This is particularly the case in the sub-discipline I’m most familiar with: disease ecology. I feel the confusion mostly stems from a lack of experience with the methods, but also from the assumption that these methods are just black boxes for big data and don’t have a probabilistic basis. These assumptions are demonstrably false, and recent advances in computer science have revolutionized the power and interpretability of these models. In particular, these methods allow us to find new insights in, and tackle new questions with, complex and messy disease ecology datasets.

This is what encouraged our team to create and test a machine learning pipeline that incorporates the latest advances in computer science to understand disease risk in populations. The associated paper just came out in the Journal of Animal Ecology here: https://besjournals.onlinelibrary.wiley.com/doi/full/10.1111/1365-2656.13076 which I’m very excited about! The pipeline is really flexible and could be applied in the same way to almost any other classification or regression problem. In essence, it brings together code from the R packages caret and iml, as well as some preprocessing packages such as missForest, in a user-friendly pipeline. Where we identified gaps we put in our own functions to make the process as smooth as possible.
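For readers who have not seen model-agnostic interpretation before, here is a minimal sketch of one such idea, permutation feature importance, which is among the tools that packages like iml offer. The pipeline itself is written in R; this is a plain Python illustration rather than our code, and the nearest-centroid classifier, the simulated data and all function names are assumptions made up for the example.

```python
import random


def fit_nearest_centroid(X, y):
    """Return one mean vector (centroid) per class label."""
    centroids = {}
    for label in set(y):
        rows = [x for x, lab in zip(X, y) if lab == label]
        centroids[label] = [sum(col) / len(rows) for col in zip(*rows)]
    return centroids


def predict(centroids, x):
    """Assign x to the class with the closest centroid (squared Euclidean distance)."""
    def sq_dist(c):
        return sum((a - b) ** 2 for a, b in zip(x, c))
    return min(centroids, key=lambda label: sq_dist(centroids[label]))


def accuracy(centroids, X, y):
    """Fraction of rows whose predicted label matches the true label."""
    return sum(predict(centroids, x) == lab for x, lab in zip(X, y)) / len(y)


def permutation_importance(centroids, X, y, n_repeats=20, seed=1):
    """Mean drop in accuracy when each feature column is shuffled in turn."""
    rng = random.Random(seed)
    baseline = accuracy(centroids, X, y)
    importances = []
    for j in range(len(X[0])):
        drops = []
        for _ in range(n_repeats):
            column = [x[j] for x in X]
            rng.shuffle(column)  # break the link between feature j and the outcome
            X_perm = [x[:j] + [v] + x[j + 1:] for x, v in zip(X, column)]
            drops.append(baseline - accuracy(centroids, X_perm, y))
        importances.append(sum(drops) / n_repeats)
    return importances


# Simulated data (hypothetical): feature 0 drives infection status, feature 1 is pure noise.
data_rng = random.Random(0)
X = [[data_rng.random(), data_rng.random()] for _ in range(200)]
y = [1 if x[0] > 0.5 else 0 for x in X]

model = fit_nearest_centroid(X, y)
importances = permutation_importance(model, X, y)
print("importance of feature 0 (signal):", round(importances[0], 3))
print("importance of feature 1 (noise): ", round(importances[1], 3))
```

Because the importance is computed only from the model’s predictions, the same recipe works for any classifier or regression model. For brevity the sketch scores importance on the training data; in practice a held-out set or cross-validation would be the safer choice.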

We tested the pipeline by examining disease risk in the Serengeti lions and found that our models can not only provide powerful predictions, but also unique insights into the mechanistic drivers of disease in this system.

This is just the tip of the iceberg in terms of how this pipeline could be used, though, and given the pace of the machine learning world, it is likely to be outdated soon. But hopefully, at least, it provides a basis for ecologists to build and compare their own robust machine learning models and interpret them in powerful new ways.

Links to packages:

caret (https://topepo.github.io/caret/index.html)

iml (https://christophm.github.io/interpretable-ml-book/intro.html)

missForest (https://stat.ethz.ch/education/semesters/ss2012/ams/paper/missForest_1.2.pdf)

The official Molecular Ecology Blog is live!

My apologies for a lack of blog articles in the last couple of months. I’ve been busy with the social media team and editorial board getting an official blog for Molecular Ecology and Molecular Ecology Resources off the ground. The aims of the blog are to highlight some of the fantastic papers published by both journals and provide ‘behind the paper’ insights as well as useful updates from the journals too.

It has been a monster effort by lots of great people, and we are really excited to get this out there. Here is the link: http://www.molecularecologyblog.com

Endemic infection can shape epidemic exposure: using breakthroughs in statistical ecology to better understand co-infection patterns

Throughout our lives, we are exposed to and infected by a diverse community of pathogens, from viruses and bacteria to parasitic worms. In humans, the combination of pathogens you are infected by matters, as these organisms can interact with each other in remarkable ways that can alter the outcome of an infection. For example, people co-infected by HIV (human immunodeficiency virus) and tuberculosis (TB, a disease caused by Mycobacterium bacteria) experience heightened symptoms of each pathogen and are at a much higher risk of dying compared to people infected by just one of these pathogens. HIV interferes with the immune system, which not only allows TB to grow faster but also increases the chances of that individual transmitting the bacteria. This is an example of a positive or ‘facilitative’ interaction between pathogens in ecological speak. In contrast, pathogens can compete as well (a negative interaction), and in some cases this can protect us from disease. For example, co-infection with certain parasitic worms can actually be protective against malaria (see Nacher, 2011 below). Further, we know it is possible that interactions between pathogens can depend on the order of infection (see Hoverman et al. for more on this). But how do we test for these specific interactions, particularly in wildlife? Humans and wildlife are exposed to and infected by a diverse range of organisms; how could we work out which ones to test? It is unfeasible to test every combination in the lab, and even then, how would we know which combinations actually occur in the wild?

In this paper, we harnessed recent advances in ecological statistics and network theory to quantify associations between pathogens in a wild population of lions in the Serengeti in Tanzania. We label them associations as we can’t be 100% sure that they actually represent real interactions between pathogens (you’d need to do lab experiments for that, which are difficult to do for wildlife). Based on over 10 years of exposure and infection data from a wide variety of pathogens that infect lions, we were able to establish which pathogens were positively or negatively associated with others. As we have been monitoring these lions, often since birth, we were able to deduce the likely order of infection or exposure and work out if a pathogen that a lion was exposed to early in life could affect which pathogens it was exposed to as an adult. These statistical methods are also useful as they can start to untangle whether these associations could simply be due to environmental factors (e.g. the lion got co-infected by two pathogens because both pathogens share an ecological preference) rather than a potential biological mechanism.
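As a rough illustration of what a positive or negative association means in presence/absence data, here is a minimal Python sketch that computes the phi coefficient between pairs of pathogens. This is a deliberately simple pairwise measure, not the method used in the paper: unlike the models described above, it ignores environmental factors and the order of infection. The pathogen names and the data are hypothetical.

```python
import math


def phi_coefficient(a, b):
    """Phi coefficient between two equal-length lists of 0/1 infection records.

    Positive values mean the two pathogens co-occur more often than expected
    by chance; negative values mean they co-occur less often.
    """
    if len(a) != len(b):
        raise ValueError("both pathogens need one record per host")
    n11 = sum(1 for x, y in zip(a, b) if x == 1 and y == 1)  # both present
    n10 = sum(1 for x, y in zip(a, b) if x == 1 and y == 0)  # only first present
    n01 = sum(1 for x, y in zip(a, b) if x == 0 and y == 1)  # only second present
    n00 = sum(1 for x, y in zip(a, b) if x == 0 and y == 0)  # both absent
    denom = math.sqrt((n11 + n10) * (n01 + n00) * (n11 + n01) * (n10 + n00))
    if denom == 0:
        return 0.0  # undefined when a pathogen is always present or always absent
    return (n11 * n00 - n10 * n01) / denom


# Hypothetical presence (1) / absence (0) records for eight hosts.
records = {
    "pathogen_A": [1, 1, 1, 1, 0, 0, 0, 0],
    "pathogen_B": [1, 1, 1, 0, 0, 0, 0, 1],
    "pathogen_C": [0, 0, 0, 0, 1, 1, 1, 1],
}

names = sorted(records)
for i, first in enumerate(names):
    for second in names[i + 1:]:
        phi = phi_coefficient(records[first], records[second])
        print(f"{first} vs {second}: phi = {phi:+.2f}")
```

In this toy example pathogen_A and pathogen_B tend to turn up in the same hosts (a positive association), while pathogen_A and pathogen_C never do (a negative association). Whether either pattern reflects a real interaction inside the host is exactly what a measure like this cannot tell you.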

The associations we found using these methods were often surprising but reflected what has been established in human lab-based studies, which is promising. For example, we found a strong negative association between Rift Valley Fever (RVF, a mosquito-borne virus that infects lions as well as cattle and sheep, sometimes leading to devastating economic loss) and the felid equivalent of HIV, feline immunodeficiency virus (FIV). FIV infects nearly 100% of lions as cubs, whereas RVF infection is more likely to occur later in life. Interestingly, RVF has similar molecular machinery to a group of viruses that are known to inhibit the growth of HIV, so it is possible that the same mechanism exists in lions as well. Similarly, we found a strong negative association between feline coronavirus (in the same virus family as the virus that causes severe acute respiratory syndrome, or SARS, in humans) and one type of FIV. Coronaviruses are considered possible candidate vaccines for HIV, so again laboratory work from human medicine provided some support for our findings.

We didn’t just find negative associations either; we also detected a strong positive association between the tick-borne Babesia protozoans and canine distemper virus (CDV). This co-infection pattern has been identified previously and is likely the underlying factor that caused this lion population to crash by over 33% in the 1990s. Lions may be able to withstand a CDV epidemic in isolation, but when CDV is combined with Babesia in a co-infection, this can lead to serious population declines for this species (see Munson et al. for some more details). Our study shows that it didn’t matter which species of Babesia was involved either; all of the species we included had these strong positive associations with CDV.

We can’t prove conclusively that these pathogens actually interact within a lion based on these statistical methods alone. However, we can provide a valuable ‘shortlist’ of possible interactions occurring in a wild population that can then be tested using cell-level experiments in a lab (we obviously don’t want to test these hypotheses on lions themselves). Given how common interactions between pathogens are, and how positive or negative their outcomes can be for the host, our approach coupled with lab work can provide important insights into pathogen dynamics in wild populations.

Nacher (2011): https://www.ncbi.nlm.nih.gov/pmc/articles/PMC3192711/
Hoverman et al. (2013): https://www.ncbi.nlm.nih.gov/pubmed/23754306?dopt=Abstract
Munson et al. (2008): https://journals.plos.org/plosone/article?id=10.1371/journal.pone.0002545

A link to the paper here: https://onlinelibrary.wiley.com/doi/full/10.1111/ele.13250

Results are in!

If you were wondering what the results were from our survey asking about journal solicitations from preprint servers the link is below.

Basically, even though our sample was biased towards people reading blogs and/or Twitter, it seems like there is reasonable support for journals to solicit papers from preprint servers. Unsurprisingly, this was particularly true for early career folks.

https://www.molecularecologist.com/2019/01/survey-results-journal-solicitations-from-preprint-servers/

Molecular tools and community ecology: Great special issue in Molecular Ecology

How can we use molecular tools to better understand community dynamics? This is but one of the questions that the recent special issue in Molecular Ecology delves into. This issue focuses on ecological networks where species are the ‘nodes’ and ‘edges’ represent interactions between species. What I particularly like about this collection of papers is the breadth of taxa, from aquatic to terrestrial, as well as the breadth of interactions captured, from predator-prey to host-symbiont. Most of these communities are hard to observe in nature (e.g. the organisms are small or nocturnal), so molecular tools are really the only option.

Lots to learn from this interesting set of papers!

Here is the link: https://onlinelibrary.wiley.com/toc/1365294x/2019/28/2

Science and social media

Recently I was introduced to the world of making scientific content for social media in a fun workshop. As part of this workshop, I learnt about Lumen 5 (https://lumen5.com/dashboard/). I’ve always tried to communicate my research to the public through social media, and Lumen 5 makes doing this really achievable. This website enables you to quickly generate a high-quality video ideal for sharing on Facebook etc. I hadn’t realized how important it is for social media videos to have text over the footage, so that the public can read your story while seeing the images. Often your video gets viewed on public transport (or other places where sound is a no-no), so having a video that can communicate without sound is important. The media library used to help construct these videos is free of any copyright issues, which is nice.

See my first attempt here: https://www.youtube.com/watch?v=KbxOaHoMdjE&feature=youtu.be