A statistical reference-free algorithm subsumes and generalizes common genomic sequence analysis and uncovers novel biological regulation.

bioRxiv: the preprint server for biology (2023)

Abstract
We show that myriad, disparate mechanisms that diversify genomes and transcriptomes can be captured by a unifying principle: sample-dependent sequence variation. This variation occurs in both RNA and DNA and functions to regulate transcript expression and adaptation. Using this insight, we develop a novel, highly efficient algorithm, NOMAD, that performs inference on raw reads without any genomic reference or sample metadata. NOMAD unifies data-scientifically driven discovery with previously unattainable speed and generality. Examples include SARS-CoV-2, humans, and non-model animals and plants, with both bulk and single-cell RNA-sequencing data. A snapshot of its novel discoveries includes missing variants in SARS-CoV-2; gene regulation in diatoms epiphytic to eelgrass, an oceanic plant critical to the carbon cycle and significantly impacted by climate change; and isoform regulation in octopus, in genes missing from the reference. NOMAD is a new unifying approach to sequence analysis that enables expansive discovery.
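To make "sample-dependent sequence variation" concrete, here is a minimal sketch of one way to look for it directly in raw reads, with no reference genome. It is an illustration only and not NOMAD's actual statistic or implementation, which the abstract does not describe. The sketch assumes that each short k-mer (called an "anchor" here) is followed by another k-mer (a "target"), and it swaps in a plain Pearson chi-square statistic to ask whether the targets following an anchor are distributed differently across samples. The function names, the parameters `k` and `gap`, and the toy reads are all hypothetical.

```python
from collections import Counter, defaultdict


def anchor_target_counts(samples, k=4, gap=0):
    """Count, per sample, which k-mer (target) follows each k-mer (anchor).

    samples: dict mapping sample name -> list of read strings.
    Returns: dict anchor -> dict sample -> Counter of targets.
    Illustrative assumption only; not the published method.
    """
    table = defaultdict(lambda: defaultdict(Counter))
    for name, reads in samples.items():
        for read in reads:
            for i in range(len(read) - 2 * k - gap + 1):
                anchor = read[i:i + k]
                target = read[i + k + gap:i + 2 * k + gap]
                table[anchor][name][target] += 1
    return table


def chi_square(per_sample):
    """Pearson chi-square statistic for a targets-by-samples count table.

    per_sample: dict sample -> Counter of targets (for one anchor).
    Returns (statistic, degrees_of_freedom).
    """
    samples = sorted(per_sample)
    targets = sorted({t for counts in per_sample.values() for t in counts})
    col_tot = {s: sum(per_sample[s].values()) for s in samples}
    row_tot = {t: sum(per_sample[s][t] for s in samples) for t in targets}
    n = sum(col_tot.values())
    stat = 0.0
    for t in targets:
        for s in samples:
            expected = row_tot[t] * col_tot[s] / n
            if expected > 0:
                stat += (per_sample[s][t] - expected) ** 2 / expected
    dof = (len(targets) - 1) * (len(samples) - 1)
    return stat, dof


# Toy reads (made up): the same anchor "ACGT" is followed by a
# different target in each sample.
toy = {
    "sample_A": ["ACGTAAAA"] * 5,
    "sample_B": ["ACGTCCCC"] * 5,
}
counts = anchor_target_counts(toy, k=4)
stat, dof = chi_square(counts["ACGT"])
print(stat, dof)
```

A large statistic relative to its degrees of freedom flags an anchor whose downstream sequence depends on the sample. A real analysis would also need multiple-testing correction and a test that remains valid at low counts, neither of which this sketch attempts.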