Genome compression using normalized maximum likelihood models for constrained Markov sources

Porto (2008)

Abstract
The paper presents exact and implementable solutions to the problem of universal coding of approximate repeats, using the normalized maximum likelihood (NML) model for the class of first-order Markov sources and incorporating constraints that are standard in fast similarity searches over full genomes. A coding scheme combining universal codes for memoryless sources and for sources with memory is then presented. Results from compressing the full human genome show that the combined scheme provides slight improvements over the existing state of the art. As a side result, interesting pairs of sequences may be found that are highly similar under the new NML model for Markov sources but have a lower similarity score when evaluated with the NML model for memoryless sources.
Keywords
Markov processes, genetic engineering, genetics, maximum likelihood estimation, Markov sources, coding scheme, constrained Markov sources, genome compression, memoryless sources, normalized maximum likelihood models, universal coding, human genome, first order