Markov Chain Monte Carlo for phylogenetics: a helicopter ride
Luiz Max Carvalho (FGV EMAp)
Estimating phylogenetic trees is now a standard tool in fields as diverse as Medicine, Anthropology, Molecular Biology and Epidemiology.
Under a Bayesian approach, the central statistical task is to produce a distribution over the space of trees that is compatible with observed data.
Since even the simplest toy problems in this area are intractable, simulation-based numerical methods are the go-to solution for producing estimates (expectations).
Performance is hindered, however, by the fact that treespace is extremely high-dimensional and irregular, with target distributions often being multi-modal. The problem also lacks differential structure, which complicates the use of gradient-based methods such as MALA or HMC.
In this talk I shall give a bird's-eye view of the main problems when doing MCMC in treespace, from computation to diagnostics.
I will discuss the development of adaptive candidate-generating mechanisms for Metropolis-Hastings-type algorithms and validation strategies to ensure correctness, and present ideas on how to measure statistical efficiency.
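
To give a rough sense of how large treespace is, the sketch below counts rooted, binary, labelled tree topologies for n tips using the standard double-factorial formula (2n - 3)!!. This is only an illustration and not part of the talk itself: the function name num_rooted_topologies is hypothetical, the choice of rooted trees is an assumption, and branch lengths (the continuous part of the problem) are ignored entirely.

```python
def num_rooted_topologies(n_tips: int) -> int:
    """Count rooted, binary, labelled tree topologies on n_tips tips.

    Uses the standard formula (2n - 3)!! = 3 * 5 * ... * (2n - 3).
    Illustrative helper only; the name is hypothetical.
    """
    if n_tips < 1:
        raise ValueError("n_tips must be a positive integer")
    count = 1
    # Multiply the odd numbers 3, 5, ..., 2n - 3 (empty product for n <= 2).
    for k in range(3, 2 * n_tips - 2, 2):
        count *= k
    return count


if __name__ == "__main__":
    # Show how quickly the number of topologies grows with the number of tips.
    for n in (3, 4, 5, 10, 20, 50):
        print(f"{n:>3} tips: {num_rooted_topologies(n):.3e} topologies")
```

Already at 10 tips there are 34,459,425 rooted topologies, so exhaustive enumeration stops being an option well before data sets reach realistic sizes.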