Linear Assembly of a Human Y Centromere using Nanopore Long Reads

bioRxiv (2018)

Abstract
The human genome reference sequence remains incomplete due to the challenge of assembling long tracts of near-identical tandem repeats, or satellite DNAs, that are highly enriched in centromeric regions. Efforts to resolve these regions capitalize on a small number of sparsely arranged sequence variants that offer unique markers to break the repeat monotony and ensure proper overlap-layout-consensus assembly. Identifying and spanning sequence variants that may be spaced hundreds of kilobases apart within a given array requires long and highly accurate sequence reads. Achieving this requires an advance in standard single-molecule sequencing, which to date has been error-prone and has offered low throughput of sufficiently long reads (100 kb+). Here we present a strategy that generates long reads capable of spanning the complete sequence insert of bacterial artificial chromosomes (BACs) that are hundreds of kilobases in length (~100-300 kb). We demonstrate that these reads are sufficient to resolve the linear ordering of repeats within a single satellite array on the Y chromosome, allowing the first complete sequence characterization of a human centromere.