Transcriptome diversity is a systematic source of bias in RNA-sequencing data

biorxiv(2021)

引用 0|浏览1
暂无评分
摘要
Background RNA sequencing has been widely used as an essential tool to probe gene expression. While standard practices have been established to analyze RNA-seq data, it is still challenging to detect and remove artifactual signals. Several factors such as sex, age, and sequencing technology have been found to bias these estimates. Probabilistic estimation of expression residuals (PEER) has been used to account for some systematic effects, but it has remained challenging to interpret these PEER factors. Results Here we show that transcriptome diversity – a simple metric based on Shannon entropy – explains a large portion of variability in gene expression, and is a major factor detected by PEER. We then show that transcriptome diversity has significant associations with multiple technical and biological variables across diverse organisms and datasets. This prevalent confounding factor provides a simple explanation for a major source of systematic biases in gene expression estimates. Conclusions Our results show that transcriptome diversity is a metric that captures a systematic bias in RNA-seq and is the strongest known factor encoded in PEER covariates. ### Competing Interest Statement The authors have declared no competing interest.
更多
查看译文
关键词
diversity,bias,rna-sequencing
AI 理解论文
溯源树
样例
生成溯源树,研究论文发展脉络
Chat Paper
正在生成论文摘要