Efficient Bayesian inference of phylogenetic trees from large scale, low-depth genome-wide single-cell data

bioRxiv (2020)

Abstract
A new generation of scalable single-cell whole genome sequencing (scWGS) methods allows unprecedented high-resolution measurement of the evolutionary dynamics of cancer cell populations. Phylogenetic reconstruction is central to identifying sub-populations and distinguishing mutational processes. The ability to sequence tens of thousands of single genomes at high resolution per experiment is challenging the assumptions and scalability of existing phylogenetic tree building methods and calls for tailored phylogenetic models and scalable inference algorithms. We propose a phylogenetic model and associated Bayesian inference procedure which exploits the specifics of scWGS data. A first highlight of our approach is a novel phylogenetic encoding of copy-number data providing an attractive statistical-computational trade-off by simplifying the site dependencies induced by rearrangements while still forming a sound foundation for phylogenetic inference. A second highlight is an innovative phylogenetic tree exploration move which makes the cost of MCMC iterations bounded by O(|C| + |L|), where |C| is the number of cells and |L| is the number of loci. In contrast, existing off-the-shelf likelihood-based methods incur an iteration cost of O(|C| |L|). Moreover, the novel move considers an exponential number of neighbouring trees, whereas off-the-shelf moves consider a polynomial-size set of neighbours. The third highlight is a novel mutation calling method that incorporates the copy-number data and the underlying phylogenetic tree to overcome the missing data issue. This framework allows us to realistically consider routine Bayesian phylogenetic inference at the scale of scWGS data.
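To make the scaling claim concrete, the following minimal sketch compares a per-iteration cost that grows with the sum of the number of cells and the number of loci against one that grows with their product. It is purely illustrative arithmetic and not the authors' implementation: the function name `per_iteration_cost`, the input sizes (10,000 cells and 1,000 loci), and the choice to ignore constant factors are all assumptions made here.

```python
def per_iteration_cost(num_cells, num_loci):
    """Return (additive, multiplicative) operation counts.

    Hypothetical helper: constant factors are ignored, so the numbers
    only indicate how each cost grows, not actual running time.
    """
    additive = num_cells + num_loci        # cost growing with cells + loci
    multiplicative = num_cells * num_loci  # cost growing with cells * loci
    return additive, multiplicative


# Assumed, illustrative sizes (not taken from the paper).
additive, multiplicative = per_iteration_cost(10_000, 1_000)
print(additive)        # 11000
print(multiplicative)  # 10000000
```

Under these assumed sizes the product-based count is roughly 900 times larger than the sum-based count, and the gap widens as either dimension grows.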