Demystifying the BEAST (Part I)

If you are like me, then when you first opened the BEAST phylogenetics software, or more specifically the BEAUti GUI, you were immediately impressed but a little overwhelmed by the number of options available for phylogenetic reconstruction and analysis more broadly. It’s an amazingly powerful and accessible free software package, and with extra utilities such as FigTree and SPREAD (I will talk about these in future posts) you really can’t beat BEAST. The tutorials (see below) provide step-by-step instructions, which basically means that anyone can run BEAST. Furthermore, if you have any technical problems, they are often quickly resolved via the Google group (!forum/beast-users).

Despite this, the real challenge is understanding the numerous decisions that have to be made throughout the process, from sequence alignment to final product. These decisions can make real differences to the inferences you draw, so they are critical to get right. One weakness of the tutorials is that they tell you how, but don’t give enough detail on why you’d make a particular decision (e.g., why one tree prior over another?). The aim of the following posts is to demystify this process a little and direct you to useful resources.

The plan is to work through BEAUti tab by tab. I will assume that you know how to import your data and set dates/traits (all of this can be learnt easily from the tutorials).


Part 2 – Sites

Part 3 – Clocks

Part 4 – Trees

Part 5 – States, Priors and Operators

Part 6 – Running the whole thing and model selection

In the meantime, if you haven’t already, download BEAST 1.8.4 and go through the tutorials:



How many types of statistical analysis approaches do you use regularly?

Whilst deciphering a really cool R package called GDM (see below), I started wondering how many different statistical approaches and techniques I have read about, deciphered and applied in the last 2 years. What’s a usual number of techniques for people to use reasonably regularly? My list currently stands at approximately 30, but I am a postdoc who spends basically all of my time analyzing data from a diverse range of systems with an equally wide variety of data types, so perhaps that’s normal?

The first place I looked was my R package list, and I quickly realized that there were quite a few. I excluded ‘bread and butter’ GLM-type analyses and their Bayesian equivalents (e.g., ANOVA and GLMMs) as well as basic ordination techniques (e.g., PCoA, NMDS). I also haven’t included techniques for calculating the various aspects of diversity, or sequence alignment algorithms, as the list would just keep going. As I deal with species distribution data, distances and (dis)similarities quite often, there is an obvious trend towards distance-based techniques (see below), with a mixture of spatial, epidemiological and phylogenetic approaches.

I’m too lazy to add citations and descriptions for each one, but they are all easy to find on Google, or email me if you are interested. If there is anything else that I should know and use to answer disease/phylogenetic community ecology type questions, please make suggestions.

In no particular order:

Permutation-based ANOVA (PERMANOVA), permutation-based tests for homogeneity of dispersion (PERMDISP), canonical analysis of principal coordinates (CAP), dbMEMs, generalized dissimilarity modelling (GDM), distance-based linear modelling (distLM), multiple matrix regression (MRM), network-based linear models (netLM), Gradient Forests, Random Forests, cluster analysis, SYNCSA analysis, fourth corner analysis, RLQ tests, Mantel tests, Moran’s I tests (phylogenetic and spatial), phylogenetic GLMMs, everything in the R package Picante, ecophylogenetic regression (Pez), dynamic assembly model of colonization, local extinction and speciation (DAMOCLES), dynamical assembly of islands by speciation, immigration and extinction (DAISIE), all sorts of ancestral state reconstruction approaches, numerous Bayesian evolutionary analysis sampling trees (BEAST) methods, numerous phytools methods, studying environmental rasters and phylogenetically informed movements (SERAPHIM), SaTScan, Circuitscape, point-time pattern analysis, Kriging, epitools risk analysis.

Link to GDM:


Coupling BEAST with eco-phylogenetics: Tales from Glasgow

Can community-level phylogenetics be used effectively with population genetic approaches to better understand infectious disease dynamics? This was one of the questions that came up on my recent trip to the University of Glasgow. It was great fun hanging out with the fine folk from the IBAHCM (Institute of Biodiversity, Animal Health and Comparative Medicine), with numerous discussions about life, the world and all sorts of disease ecology topics. The purpose of the trip was a research exchange with Roman Biek and his lab to become more familiar with BEAST and associated phylodynamic tools. Naturally it got me thinking about how to synthesize these tools with community phylogenetics, particularly for understanding transmission dynamics. Basically, BEAST provides excellent spatial and temporal estimates of disease spread, but is not as good at linking phylogenetic information to multiple interacting landscape and host variables. Those are my conclusions for now anyway; BEAST can apparently do GLMs, but I’ve heard the interpretation can be difficult. Stay tuned for my review on the topic, which is nearly ready to submit somewhere.

On a more applied note, if you are a BEAST user or interested in becoming one, here is a link to a useful set of tutorials: Also, I can highly recommend the R package ‘Seraphim’ for post-BEAST spatial analysis of pathogen dynamics, though installing it in R is a little tricky (this will be the topic of a future blog post).