Compression with unified and accessible byte blocks to enhance management and analyses of UKBB-scale genotypes

Research Square (2021)

Abstract
Whole-genome sequencing projects covering millions of individuals produce enormous genotype datasets, which impose a heavy memory burden and long run times during computation. Here, we introduce Genotype Blocking Compressor (GBC), a method for rapidly compressing large-scale genotypes into a fast-accessible and highly parallelizable format. We demonstrate that GBC achieves a competitive compression ratio, saving storage space. Furthermore, GBC is the fastest method for accessing and managing compressed large-scale genotype files (sorting, merging, splitting, etc.). Our results indicate that GBC can help resolve the fundamental problem of time- and space-consuming computation on large-scale genotypes, and that conventional analyses would be substantially accelerated if they used GBC to access genotypes. GBC's data structure and algorithms will therefore accelerate future population-based biomedical research involving large-scale genomic data.
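
The abstract does not specify GBC's on-disk layout, so the following is only a minimal sketch of the general idea behind block-wise compression with fast access, not GBC's actual format or algorithm. It assumes genotypes are stored as one byte per sample (0, 1, 2 for alternate-allele counts, 3 for missing), groups a fixed number of variants into a block, compresses each block independently with zlib, and keeps an index of block offsets. The names `compress_blocks`, `read_variant`, and `BLOCK_VARIANTS` are hypothetical and introduced purely for illustration.

```python
import zlib

# Hypothetical block size (variants per block); not a value taken from GBC.
BLOCK_VARIANTS = 4


def compress_blocks(matrix, block_variants=BLOCK_VARIANTS):
    """Compress a variant-by-sample genotype matrix block by block.

    matrix: list of variants, each a list of per-sample codes in 0..3.
    Returns the concatenated compressed payload and an index holding
    (offset, compressed_length) for each block.
    """
    payload = bytearray()
    index = []
    for start in range(0, len(matrix), block_variants):
        block = matrix[start:start + block_variants]
        # Flatten the block row by row into raw bytes (assumed encoding).
        raw = bytes(code for variant in block for code in variant)
        packed = zlib.compress(raw)
        index.append((len(payload), len(packed)))
        payload += packed
    return bytes(payload), index


def read_variant(payload, index, variant_idx, n_samples,
                 block_variants=BLOCK_VARIANTS):
    """Return one variant's genotypes by decompressing only its block."""
    offset, length = index[variant_idx // block_variants]
    raw = zlib.decompress(payload[offset:offset + length])
    row = variant_idx % block_variants
    return list(raw[row * n_samples:(row + 1) * n_samples])


if __name__ == "__main__":
    # Toy data: 10 variants, 5 samples.
    genotypes = [[(i + j) % 3 for j in range(5)] for i in range(10)]
    payload, index = compress_blocks(genotypes)
    print(read_variant(payload, index, 6, n_samples=5) == genotypes[6])
```

Because each block is compressed on its own, a single variant can be read without decompressing the whole file, and separate blocks could in principle be processed by separate workers, which is the kind of property the abstract describes as fast-accessible and parallelizable.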
Keywords
genotypes, accessible byte blocks, compression, UKBB-scale