Batch correction evaluation framework using a-priori gene-gene associations: applied to the GTEx dataset

BMC Bioinformatics(2019)

引用 21|浏览15
暂无评分
摘要
Background Correcting a heterogeneous dataset that presents artefacts from several confounders is often an essential bioinformatics task. Attempting to remove these batch effects will result in some biologically meaningful signals being lost. Thus, a central challenge is assessing if the removal of unwanted technical variation harms the biological signal that is of interest to the researcher. Results We describe a novel framework, B-CeF, to evaluate the effectiveness of batch correction methods and their tendency toward over or under correction. The approach is based on comparing co-expression of adjusted gene-gene pairs to a-priori knowledge of highly confident gene-gene associations based on thousands of unrelated experiments derived from an external reference. Our framework includes three steps: (1) data adjustment with the desired methods (2) calculating gene-gene co-expression measurements for adjusted datasets (3) evaluating the performance of the co-expression measurements against a gold standard. Using the framework, we evaluated five batch correction methods applied to RNA-seq data of six representative tissue datasets derived from the GTEx project. Conclusions Our framework enables the evaluation of batch correction methods to better preserve the original biological signal. We show that using a multiple linear regression model to correct for known confounders outperforms factor analysis-based methods that estimate hidden confounders. The code is publicly available as an R package.
更多
查看译文
关键词
Batch correction, Batch effect, Gene expression, ComBat, Principal component analysis, GTEx
AI 理解论文
溯源树
样例
生成溯源树,研究论文发展脉络
Chat Paper
正在生成论文摘要