An introduction to MPEG-G, the new ISO standard for genomic information representation

bioRxiv (2018)

Abstract
The MPEG-G standardization project is the largest coordinated international effort to specify a compressed data format that enables large-scale genomic data processing, transport, and sharing. It is the first ISO/IEC standard to address the problems and limitations of current genomic data formats, with the aim of truly efficient and economical handling of genomic information. It provides the means to implement leading-edge compression technology that achieves more than a 10x improvement over the BAM format. The standard also provides a set of currently needed functionalities, such as selective access, application programming interfaces to the compressed data, support for data protection mechanisms, and support for streaming applications. Furthermore, ISO/IEC is engaged in supporting the maintenance of the standard to guarantee the longevity of applications using MPEG-G. Finally, interoperability and integration with existing genomic information processing pipelines are enabled by supporting conversion from/to the FASTQ/SAM/BAM file formats. In this paper we review the MPEG-G standard in more detail, as well as the main advantages and functionalities it offers.