Model-Based Biclustering For Overdispersed Count Data With Application In Microbial Ecology

METHODS IN ECOLOGY AND EVOLUTION (2021)

Abstract
Different studies have shown that microbial communities living in animals (humans included), in or around plants have a significant impact on health and disease of their host and on various services, such as adaptation under stressing environment. The basic input data to study microbiomes is a matrix representing abundance data of micro-organisms across different sampling units. Such a matrix typically corresponds to taxonomic profiles derived from the high-throughput sequencing of environmental samples. Biclustering is one way to study the interactions between the structure of micro-organism communities and the environmental samples they come from.

We propose a latent block model (LBM) and an associated inference procedure for the biclustering of rows and columns of abundance matrices. The LBM assumes that micro-organisms (rows) and environmental samples (columns) can both be clustered into groups characterizing preferential interaction or avoidance. We use the Poisson-Gamma distribution to model the overdispersion observed in microbial abundance data and introduce row and column effects to account for the sequencing effort in each sample and the mean abundance of each micro-organism. Because the latent variables are not independent conditionally on the observed ones, classical maximum likelihood inference is intractable. We then derive a variational-based inference algorithm and propose a strategy to select the number of biclusters.

We illustrate the flexibility and performance of our approach both on a simulation study and on three ecological datasets. The model-based framework allows us to adapt to peculiarities of microbial ecological abundance data and allows us to explore relationships between entities of two different natures.

We implemented our method in the cobiclust R package available on the CRAN and built a website with example of usage.
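The generative structure described in the abstract can be illustrated with a short simulation. The sketch below is not the cobiclust implementation and does not use its API; it is a minimal Python illustration under stated assumptions: each count is Poisson with rate equal to a row effect times a column effect times a block interaction term times a Gamma-distributed noise factor with mean 1. The function name, the parameter names (`alpha`, `dispersion`) and the ranges chosen for the row and column effects are hypothetical and chosen only for illustration.

```python
import numpy as np


def simulate_poisson_gamma_lbm(n_rows, n_cols, alpha, row_props, col_props,
                               dispersion, seed=0):
    """Simulate a count matrix from a Poisson-Gamma latent block model.

    Assumed (illustrative) parameterisation, not taken from the paper's code:
      rate[i, j] = mu[i] * nu[j] * alpha[z[i], w[j]] * u[i, j]
    where u[i, j] ~ Gamma(shape=dispersion, scale=1/dispersion) has mean 1.
    """
    rng = np.random.default_rng(seed)
    alpha = np.asarray(alpha, dtype=float)
    n_row_groups, n_col_groups = alpha.shape

    # Latent group memberships for rows (micro-organisms) and columns (samples).
    z = rng.choice(n_row_groups, size=n_rows, p=row_props)
    w = rng.choice(n_col_groups, size=n_cols, p=col_props)

    # Row effects (mean abundance of each micro-organism) and column effects
    # (sequencing effort of each sample); the ranges are arbitrary.
    mu = rng.uniform(0.5, 2.0, size=n_rows)
    nu = rng.uniform(5.0, 20.0, size=n_cols)

    # Gamma noise with mean 1: a smaller `dispersion` gives stronger overdispersion.
    u = rng.gamma(shape=dispersion, scale=1.0 / dispersion,
                  size=(n_rows, n_cols))

    # Block interaction term looked up for every (row, column) pair.
    block_effect = alpha[z][:, w]

    rate = mu[:, None] * nu[None, :] * block_effect * u
    counts = rng.poisson(rate)
    return counts, z, w
```

Calling, for example, `simulate_poisson_gamma_lbm(60, 40, [[0.5, 2.0], [1.5, 0.3]], [0.5, 0.5], [0.4, 0.6], dispersion=2.0)` returns a 60 x 40 integer matrix together with the simulated row and column memberships. Marginalising over the Gamma factor yields negative-binomial counts, which is why this construction produces variance larger than the mean.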
Keywords
count data, latent block model, metabarcoding, microbial interactions, model-based biclustering, Poisson-Gamma distribution, variational EM algorithm