Great resource for implementing different substitution models in BEAST

Don’t you just hate it when you run jModelTest (or similar software) to find the best-fitting substitution model for a given set of sequences, and the winning model is something obscure that isn’t directly implementable in phylogenetics platforms like BEAST?

I stumbled across this excellent post by Justin Bagley that provides really useful information on how to put all sorts of substitution models in BEAST:

This is definitely a valuable resource and makes incorporating jModelTest results much easier.


Genomics of Wildlife Disease Workshop 2017

A couple of weeks ago I had the pleasure of attending the disease genomics boot camp (more formally, the Genomics of Wildlife Disease Workshop) at Colorado State with a great bunch of people, including members of the Craft lab (see below). It was the first time the workshop had been held, but overall it was a success and I can highly recommend it to others interested in the topic.

It really was a broad (and nearly overwhelming) overview of the entire next-gen process, from getting sequences off Illumina runs to a variety of downstream analytical approaches. There was also a lot of material incorporating the host genome, which I thought was particularly useful. As our NSF project was responsible for the workshop, I was an ‘auditor’ and assisted with the BEAST afternoon. Not only were the course material and the lecturers good, but the guest speakers were excellent and helped to frame things really well. It was also a great opportunity to network with like-minded researchers, and it was nice to chat with the other NSF postdocs/PhD students about all things puma (and bobcat) disease.

Now to get ready for EEID (Ecology and Evolution of Infectious Diseases) 2017 in Santa Barbara…


Misunderstanding the Microbiome: misuse of community ecology tools to understand microbial communities

Understanding how the vast collection of organisms within us (the ‘microbiome’) is linked to human (and ecosystem) health is one of the most exciting scientific topics today. It really does have the potential to improve our lives considerably, though it is often over-hyped (see the link below). However, I’ve recently been reading quite a few microbiome papers (it was our journal club’s topic of the month) and have been struck by the poor study design and lack of understanding of the statistical methodology. Talking to colleagues in the microbiome field, these problems may be more widespread and could be hindering our progress in understanding this important component of the ecosystem within us.

Of course, microbiome research is simply microbial community ecology, but the way some microbiome practitioners use and report community ecology statistics is problematic and sometimes outright deceptive. This includes people publishing in the highest-profile scientific journals. I won’t pick on any particular paper, but here are a few general observations (sorry for the technical detail).

  1. Effect sizes are often not reported or visualized using ordination techniques. A significant P value is reported, but how do you know whether the effect is biologically relevant? My guess is that effect sizes are small, as is often the case with free-living communities.
  2. Little detail is given about how a particular test was performed. A typical example: “We did a PERMANOVA to test for XX”. Despite the fact that PERMANOVA has some general issues (see the Warton et al. paper below), no information is given about the test anyway, e.g., was it a two-way crossed design? Were Type III sums of squares used? Did they test for homogeneity of multivariate dispersion using PERMDISP or similar? That is literally one of the only assumptions of the test, but I haven’t read a single microbiome paper that has checked it. If they haven’t, we can’t trust the results. Have they read the original paper by Marti Anderson? Some cite it, at least…
  3. I haven’t found any PCA or PCoA plot that reports the % of variance explained by each axis. This is annoying – the axes shown may explain only a small amount of the variance in the community, so the pretty little clusters of points may be largely artificial.
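On point 3, the % variance explained is trivial to compute, so there is no excuse for leaving it off the axes. Here is a minimal sketch of classical PCoA (metric multidimensional scaling) in Python with NumPy; the distance matrix is invented purely for illustration:

```python
# Minimal sketch of classical PCoA, showing how to report the % variance
# explained by each ordination axis. The distance matrix is hypothetical.
import numpy as np

def pcoa_variance_explained(D):
    """Double-centre the squared distance matrix (Gower's method),
    eigendecompose, and return % variance per positive axis."""
    n = D.shape[0]
    J = np.eye(n) - np.ones((n, n)) / n          # centring matrix
    B = -0.5 * J @ (D ** 2) @ J                  # double-centred matrix
    eigvals = np.linalg.eigvalsh(B)[::-1]        # sorted descending
    pos = eigvals[eigvals > 1e-12]               # keep positive eigenvalues
    return 100 * pos / pos.sum()                 # % variance per axis

# Toy symmetric distance matrix for four samples (made-up numbers)
D = np.array([[0.0, 0.3, 0.8, 0.9],
              [0.3, 0.0, 0.7, 0.8],
              [0.8, 0.7, 0.0, 0.2],
              [0.9, 0.8, 0.2, 0.0]])

pct = pcoa_variance_explained(D)
for i, p in enumerate(pct):
    print(f"Axis {i + 1}: {p:.1f}% of variance")
```

If the first two axes only account for, say, 20% of the total variance, clusters in the 2-D plot tell you very little about the community as a whole – which is exactly why the numbers belong on the plot.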

I’ll stop ranting. These issues really impair interpretation of the results and make the science difficult to replicate. It makes you ask, “how do these papers get through the gates?” I’m guessing that a significant proportion of authors, reviewers and editors have little experience in community biostatistics and don’t really understand what the tests are doing. They are relying on analytical pipelines such as QIIME that claim to produce ‘publication quality graphics and statistics’ and are not thinking much more about it. More microbiome researchers need to go beyond these pipelines and keep up to date with community methods more broadly. The quality of the research will clearly improve.
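The dispersion check from point 2 above is also simple enough to sketch from scratch. This is only a back-of-the-envelope, two-group version of the idea behind PERMDISP (a real analysis should use something like `betadisper` in R’s vegan package); the data and group labels are invented:

```python
# Back-of-the-envelope dispersion check in the spirit of PERMDISP:
# compare each group's mean distance to its own centroid, then use a
# permutation test on the difference. All data here are simulated.
import numpy as np

rng = np.random.default_rng(42)

def group_dispersions(X, labels):
    """Mean distance of each sample to its group centroid, per group."""
    out = {}
    for g in np.unique(labels):
        pts = X[labels == g]
        centroid = pts.mean(axis=0)
        out[g] = np.linalg.norm(pts - centroid, axis=1).mean()
    return out

def permdisp_sketch(X, labels, n_perm=999):
    """Permutation p-value for unequal dispersion between two groups (0/1)."""
    d = group_dispersions(X, labels)
    observed = abs(d[0] - d[1])
    hits = 0
    for _ in range(n_perm):
        dp = group_dispersions(X, rng.permutation(labels))
        if abs(dp[0] - dp[1]) >= observed:
            hits += 1
    return (hits + 1) / (n_perm + 1)

# Two invented groups with deliberately different spreads
X = np.vstack([rng.normal(0, 0.1, (20, 3)),   # tight group
               rng.normal(0, 2.0, (20, 3))])  # diffuse group
labels = np.repeat([0, 1], 20)
p = permdisp_sketch(X, labels)
print(f"dispersion p-value: {p:.3f}")
```

A small p-value here flags unequal spread between groups, which means a ‘significant’ PERMANOVA could reflect differences in dispersion rather than a genuine shift in community composition – the assumption almost nobody reports checking.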


Microbiome over-hype:

Warton et al:

Marti Anderson’s paper:

‘Amusing’ reviewer comments

Having a thick skin and the ability to shrug off harsh and sometimes personal criticism is an often unrecognized trait of a scientist. You put your work out there to the world and get feedback from peers who are often anonymous (though this is slowly changing). The system usually works pretty well and 99% of the time makes the paper better. When the comments are highly critical, you go through a mini five stages of grief, but you always come around and the paper gets better. I’ve definitely had my fair share of critical feedback – one of my recent favorites was a reviewer suggesting that my literature review “hadn’t gone beyond the literature”…(?) However, none have come close to the comments that this author received:

There are so many good lines, but this one is the best: “This paper has merit and no errors, but I do not like it …”

Pleasingly, it still got published in the journal anyway!

Interesting May issue of the Journal of Animal Ecology

The May issue of the Journal of Animal Ecology is pretty much essential reading for anyone interested in disease ecology (particularly those using network approaches). Springer et al.’s paper on dynamic networks and Cryptosporidium spread is particularly interesting – I really like that they incorporated different transmission modes into their dynamic network model, which reflects the reality in lots of host–parasite systems. I also like that they used both empirically derived networks and simulated models. The comparison between static and dynamic models wasn’t particularly exciting – it seemed obvious that dynamic models were always going to lead to bigger outbreaks. Nonetheless, it’s really interesting work.

The study by Patterson et al. on tuberculosis in meerkats was also really cool – combining both social and environmental predictors to understand TB risk in the Kalahari was interesting and is something I’m trying to do with the Serengeti lions. They should have used machine learning though!

The community ecology section is full of interesting papers as well – hopefully I’ll get around to reading them soon.

Excellent series of blog articles about data science

I just found this excellent series of articles by John Mount. The explanations he gives are really intuitive, and there is useful R code to recreate the figures he makes. A must-read if you are getting into data science.

Demystifying the BEAST (Part I)

If you are like me when you first opened up the BEAST phylogenetics software, or more specifically the BEAUti GUI, you are immediately impressed but a little overwhelmed by the number of options you have for reconstruction and phylogenetic analysis more broadly. It’s an amazingly powerful and accessible free software package – particularly with the extra utilities such as FigTree and SPREAD (I’ll talk about these in future posts), you really can’t beat BEAST. The tutorials (see below) provide step-by-step instructions that mean basically anyone can run BEAST. Furthermore, if you have any technical problems, they are often quickly resolved via the google group (!forum/beast-users).

Despite this, understanding the numerous decisions that have to be made throughout the process, from sequence alignment to final product, is the real challenge here. These decisions can make real differences to the inferences you draw, so they are critical to get right. One weakness of the tutorials is that they tell you how, but don’t give enough detail on why you’d make a particular decision (e.g., why one tree prior over another?). The aim of the following posts is to demystify this process a little and direct you to useful resources.

The plan is to go tab by tab through BEAUti, and I will assume that you know how to import your data and set dates/traits (all of this can be learnt easily from the tutorials).


Part 2 –  Sites

Part 3 – Clocks

Part 4 – Trees

Part 5 -States, Priors and Operators

Part 6  – Running the whole thing and model selection

In the meantime, if you haven’t already, download BEAST 1.8.4 and go through the tutorials:



NSF vs ARC: A Postdoc’s Perspective on American and Australian Research Funding

My first NSF DEB pre-proposal submitted (or any ‘big’ grant for that matter) … hooray! It’s nice to regain the head-space to think about something else, for a while at least. Even as a co-PI on a pre-proposal, the process was a tad stressful. To tell you the truth, though, I actually enjoyed it. Maybe because the thinking was in the future tense rather than the past (i.e., I was thinking about future research rather than analyzing and writing about great data of the past)? Partly, perhaps, but I enjoyed the fact that Meggan and I worked well together, and with people across the world, to create a 5-page document that sold what we think, at least, is a cool and novel idea. I read it and want to actually do it – I hope the reviewers/panel agree!

If you think about it logically though, the process looks absurd: putting so much time and effort into something with an 8% chance of success is clearly insane (see the NSF blog below for trends). And this success rate is pre-Trump! I thought things were bad in Australia, but this actually makes the Australian Research Council (ARC) equivalent grants (Discovery or DECRA) seem like a ‘good’ bet, with success rates of around 17% (see below). I wonder where the cutoff is? At what success rate will researchers not even bother submitting anything? Or is even 1% success worth the effort, considering the reward? This situation is clearly stressful for faculty, but for postdocs like myself, who rely on this type of funding to ‘make a name’ and to get a gig (read: tenure-track position or another postdoc), it’s nearly too much. Nonetheless, I somehow push it to the back of my brain and continue to do what I enjoy doing (and hope is of some use to society). Should we move to NZ, Canada, Europe or Asia? Any perspective on these countries/continents would be great.

Even if we don’t get funded, which is highly likely, we can no doubt use these ideas in other grants. Fingers crossed, of course! It has been an excellent learning experience, and I’ve had fun helping craft the pre-proposal. There are excellent resources out there that have helped enormously and that I feel are valuable for grant writing in general (the NSF DEB blog and Mike Kaspari’s blog below, for example). Hopefully, one day, things will get better and less of the collective grant-writing effort will be wasted.

DEB Numbers: FY 2016 Wrap-Up

On writing a strong NSF pre-proposal