Genome Compression: An Image-Based Approach

Artificial Intelligence and Soft Computing (ICAISC 2018), Part II (2018)

Abstract
With the advent of Next Generation Sequencing technologies, it has become possible to reduce the cost and time of genome sequencing. As a result, the number of genomes assembled each day has grown significantly, which calls for more efficient techniques for storing and transmitting genomic data. In this work, we investigate lossless horizontal compression of genomic sequences using two image formats, WEBP and FLIF. To do so, the genomic sequence is transformed into a matrix of colored pixels, where an RGB color is assigned to each symbol of the alphabet {A, T, C, G} at a given (x, y) position. The WEBP format showed the best data-rate saving (76.15%, SD = 0.84) when compared to FLIF. In addition, we compared the data-rate saving of WEBP with that of two specialized genomic data compression tools, DELIMINATE and MFCompress. The results show that WEBP is close to DELIMINATE (76.03%, SD = 2.54%) and MFCompress (76.97%, SD = 1.36%). Finally, we suggest using WEBP for genomic data compression.
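The sequence-to-image step described in the abstract can be illustrated with a minimal sketch. The abstract does not give the colors, the matrix width, or how a partially filled last row is handled, so the palette, the `width` parameter, the black padding pixel, and all function names below are assumptions made for illustration only. The data-rate saving is computed with the common definition (one minus the ratio of compressed to original size), which is also assumed here rather than taken from the paper.

```python
import math

# Hypothetical palette: the abstract does not state which RGB colors were used.
PALETTE = {
    "A": (255, 0, 0),
    "T": (0, 255, 0),
    "C": (0, 0, 255),
    "G": (255, 255, 0),
}
PAD = (0, 0, 0)  # assumed filler for unused cells in the last row
INVERSE = {rgb: base for base, rgb in PALETTE.items()}


def sequence_to_pixels(seq, width):
    """Lay the sequence out row by row as a matrix of RGB tuples."""
    if width < 1:
        raise ValueError("width must be at least 1")
    height = math.ceil(len(seq) / width)
    flat = [PALETTE[base] for base in seq]
    # Pad the last row so every row has exactly `width` pixels.
    flat += [PAD] * (height * width - len(flat))
    return [flat[r * width:(r + 1) * width] for r in range(height)]


def pixels_to_sequence(pixels):
    """Invert the mapping, skipping padding pixels, to recover the sequence."""
    return "".join(INVERSE[p] for row in pixels for p in row if p != PAD)


def data_rate_saving(original_bytes, compressed_bytes):
    """Percentage of the original size removed by compression (assumed definition)."""
    return 100.0 * (1.0 - compressed_bytes / original_bytes)


if __name__ == "__main__":
    seq = "ATCGGA"
    pixels = sequence_to_pixels(seq, width=4)
    # The round trip must return the input unchanged for the scheme to be lossless.
    assert pixels_to_sequence(pixels) == seq
    print(len(pixels), "rows of", len(pixels[0]), "pixels")
```

In a full pipeline, the pixel matrix would then be written with a lossless image encoder (WEBP or FLIF in the study) and the resulting file size compared with the original sequence file via `data_rate_saving`. That encoding step is left out here because it depends on an external image library.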
Keywords
Data compression, Genome compression, Assembled genomic sequence, Lossless compression, Image file format