Splice site prediction across different organisms - A transfer learning approach.

SETN (2020)

Abstract
The rapid emergence and success of Next Generation Sequencing (NGS) technology has marked a new era in genetics, giving the scientific community the ability to sequence complete genomes in just a few hours. However, it has also brought challenges for ab initio genome annotation: the volume of generated data far exceeds the volume of already annotated data, which makes new computational methods that can facilitate the annotation process crucial. An essential task in gene annotation is splice site definition, that is, the identification of the boundaries (splice donor and acceptor sites) between the exons and introns of a gene. Herein, we address the problem of acceptor splice site recognition using transfer learning. More specifically, we transfer the knowledge obtained from the annotation of the model organism Caenorhabditis elegans to the identification of acceptor sequences in four other organisms. For this purpose, we performed an extensive pre-analysis of the C. elegans acceptor sequences to identify the most significant positional sequence motifs and represent the sequences in terms of them. True and decoy C. elegans acceptor sequences are then used to train a modified k-Means algorithm. The proposed approach appears very promising, with auPRC values ranging from 0.76 for the organism most evolutionarily distant from C. elegans to 0.99 for the closest organism.
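As a rough illustration of two ideas mentioned in the abstract, the sketch below encodes a fixed-length sequence window as positional k-mer indicators and computes the area under the precision-recall curve (auPRC) as step-wise average precision. It is not the authors' implementation: the abstract does not give the motif length, the window size, or the modification made to k-Means, so the function names, the choice of k = 2, and the toy labels and scores are all assumptions made for illustration, and no clustering step is shown.

```python
from itertools import product


def positional_kmer_features(seq, k=2, alphabet="ACGT"):
    """Indicator vector marking which k-mer occurs at each position of seq.

    Hypothetical encoding: one block of len(alphabet)**k slots per position.
    """
    kmers = ["".join(p) for p in product(alphabet, repeat=k)]
    index = {kmer: i for i, kmer in enumerate(kmers)}
    n_pos = len(seq) - k + 1
    vec = [0.0] * (n_pos * len(kmers))
    for pos in range(n_pos):
        j = index.get(seq[pos:pos + k])
        if j is not None:  # k-mers with ambiguous bases (e.g. N) are skipped
            vec[pos * len(kmers) + j] = 1.0
    return vec


def average_precision(labels, scores):
    """Step-wise auPRC: mean precision at the rank of each true positive."""
    order = sorted(range(len(scores)), key=lambda i: scores[i], reverse=True)
    n_true = sum(labels)
    hits, total = 0, 0.0
    for rank, i in enumerate(order, start=1):
        if labels[i] == 1:
            hits += 1
            total += hits / rank
    return total / n_true if n_true else 0.0


# Toy usage: labels mark true (1) vs decoy (0) acceptor windows; the scores are made up.
features = positional_kmer_features("ACGT", k=2)
ap = average_precision([1, 0, 1, 0], [0.9, 0.8, 0.7, 0.1])
```

In a transfer setting like the one described, scores of this kind would come from a model fitted on C. elegans sequences and evaluated on windows from another organism; how those scores are derived from the modified k-Means algorithm is not stated in the abstract.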