ClusTrast: a short read de novo transcript isoform assembler guided by clustered contigs

Karl Johan Westrin, Warren W. Kretzschmar, Olof Emanuelsson

bioRxiv (Cold Spring Harbor Laboratory) (2022)

Abstract

Background: Transcriptome assembly from RNA-sequencing data in species without a reliable reference genome has to be performed de novo, but studies have shown that de novo methods often have inadequate ability to reconstruct transcript isoforms. We address this issue by constructing an assembly pipeline whose main purpose is to produce a comprehensive set of transcript isoforms.

Results: We present the de novo transcript isoform assembler ClusTrast, which takes short read RNA-seq data as input, constructs a primary assembly, clusters a set of guiding contigs, aligns the short reads to the guiding contigs, assembles each clustered set of short reads individually, and merges the primary and clusterwise assemblies into the final assembly. We tested ClusTrast on real datasets from six eukaryotic species, and showed that ClusTrast reconstructed more expressed known isoforms than any of the other tested de novo assemblers, at a moderate reduction in precision. For recall, ClusTrast was on top in the lower end of expression levels (below the 15th percentile) for all tested datasets, and over the entire range for almost all datasets. Reference transcripts were often (35–69% for the six datasets) reconstructed to at least 95% of their length by ClusTrast, and more than half of reference transcripts (58–81%) were reconstructed with contigs that exhibited polymorphism, as measured on a subset of reliably predicted contigs. ClusTrast recall increased when using a union of assembled transcripts from more than one assembly tool as primary assembly.

Conclusion: We suggest that ClusTrast can be a useful tool for studying isoforms in species without a reliable reference genome, in particular when the goal is to produce a comprehensive transcriptome set with polymorphic variants.
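The data flow described in the abstract (group reads by the cluster of the guiding contig they align to, assemble each group, then merge with the primary assembly) can be illustrated with a toy sketch. This is not ClusTrast's code: the function names `group_reads_by_cluster` and `merge_assemblies`, the dictionary-based inputs, and the rule of dropping exact duplicate sequences when merging are all assumptions made for illustration, and the actual assembly, clustering, and alignment steps are left out.

```python
from collections import defaultdict


def group_reads_by_cluster(read_to_contig, contig_to_cluster):
    """Collect the reads aligned to each cluster of guiding contigs.

    Both arguments are hypothetical inputs: a read-to-contig alignment map
    and a contig-to-cluster map produced by some upstream tools.
    """
    reads_per_cluster = defaultdict(set)
    for read_id, contig_id in read_to_contig.items():
        cluster_id = contig_to_cluster.get(contig_id)
        if cluster_id is not None:  # skip reads aligned to unclustered contigs
            reads_per_cluster[cluster_id].add(read_id)
    return dict(reads_per_cluster)


def merge_assemblies(primary, clusterwise):
    """Merge primary and clusterwise contigs, dropping exact duplicates.

    Exact-sequence deduplication is an assumption of this sketch only.
    """
    merged, seen = [], set()
    clusterwise_contigs = [s for contigs in clusterwise.values() for s in contigs]
    for seq in list(primary) + clusterwise_contigs:
        if seq not in seen:
            seen.add(seq)
            merged.append(seq)
    return merged


# Toy inputs: three guiding contigs in two clusters, four aligned reads.
contig_to_cluster = {"g1": 0, "g2": 0, "g3": 1}
read_to_contig = {"r1": "g1", "r2": "g2", "r3": "g3", "r4": "g9"}
reads_per_cluster = group_reads_by_cluster(read_to_contig, contig_to_cluster)

# Each read set would be assembled separately; the results are stand-ins here.
clusterwise = {0: ["ACGT", "GGCC"], 1: ["TTAA"]}
final_assembly = merge_assemblies(["ACGT", "TTGA"], clusterwise)
```

In this toy setup, each entry of `reads_per_cluster` stands for one set of reads that would be handed to an assembler on its own, which is the step the abstract calls assembling each clustered set of short reads individually.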
Keywords
de novo transcript isoform assembler