The gap-free potato genome assembly reveals large tandem gene clusters of agronomical importance in highly repeated genomic regions

MOLECULAR PLANT (2023)

Abstract
Potato is a vital food security crop and ranks as the world's third most important food crop after rice and wheat. In 2011, the first genome assembly of the doubled monoploid potato DM1-3 516 R44 (DM) was released (Potato Genome Sequencing Consortium, 2011). It has been one of the most widely used reference genomes of the last decade and a valuable resource for the plant genomics and potato genetics communities (Leisner et al., 2018; Yang et al., 2020; Zheng et al., 2020). The latest version of the DM genome assembly (v6.1) (Pham et al., 2020) has served as a reference and quality control in studies of diploid and tetraploid potatoes (Zhou et al., 2020; Bao et al., 2022; Hoopes et al., 2022; Sun et al., 2022; Tang et al., 2022). However, 161 gaps remain in DM6.1 (v6.1), and the centromere and telomere structures are incomplete. Considering the importance of the DM genome in potato genomics, genetics, and breeding studies, generating a complete genome assembly of DM is of great importance.
In this study, a telomere-to-telomere gap-free genome of DM (DM8.1) (Figure 1A) was assembled by combining Oxford Nanopore Technologies (ONT) ultra-long read sequencing (119.81× coverage) with Hi-C sequencing (130.57×) (Supplemental Table 1), assisted by multiple gap-closing strategies and high-fidelity (HiFi) reads from circular consensus sequencing. After initial genome assembly, polishing, and decontamination, a total of 179 contigs with a summed size of 773.36 Mb and a contig N50 of 59.72 Mb were obtained. Hi-C reads then anchored 37 of the 179 contigs onto 12 chromosomes (Supplemental Figure 1; Supplemental Table 2), accounting for 95.53% (738.82 Mb) of the total assembly; we named this assembly preDM8. Of the 142 unanchored contigs (34.53 Mb), over 98% are short sequences (<1 Mb), and all could be aligned to the chromosomes with high similarity, indicating that they are repetitive or redundant sequences. preDM8 is more contiguous than DM6.1 and the potato pan-genome assemblies (Tang et al., 2022) (Supplemental Figure 2). However, 25 gaps remained in preDM8. Three methods were then used to close these gaps (Supplemental Figure 3A; Supplemental Table 3). First, we aligned the ONT reads to preDM8, then collected and assembled the reads that mapped to the regions flanking the gaps, which closed 14 gaps. Second, based on syntenic homologous fragments between preDM8 and DM6.1, three gaps were closed with the contiguous DM6.1 sequences that spanned them. Third, target sequence amplification experiments (Supplemental Figure 3B) and HiFi sequencing closed the remaining eight gaps (Supplemental Figures 3C and 4).
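The assembly statistics quoted above can be checked with a few lines of arithmetic. The following is a minimal sketch, not the authors' pipeline: the `n50` helper and the toy contig lengths are illustrative assumptions, while the megabase figures and gap counts are taken directly from the text.

```python
def n50(lengths):
    """Return the N50: the largest length L such that contigs of
    length >= L together cover at least half of the assembly."""
    if not lengths:
        raise ValueError("need at least one contig length")
    total = sum(lengths)
    running = 0
    for length in sorted(lengths, reverse=True):
        running += length
        if running * 2 >= total:
            return length

# Toy contig lengths (hypothetical, for illustration only).
toy_contigs = [80, 70, 50, 40, 30, 20, 10]
toy_n50 = n50(toy_contigs)

# Figures reported in the text.
total_mb = 773.36      # summed size of the 179 contigs
anchored_mb = 738.82   # 37 contigs anchored onto 12 chromosomes
anchored_pct = round(100 * anchored_mb / total_mb, 2)

# Gaps closed by each of the three strategies described above.
gaps_closed = {"ont_local_assembly": 14, "dm61_synteny": 3, "pcr_hifi": 8}
total_gaps_closed = sum(gaps_closed.values())

print(toy_n50, anchored_pct, total_gaps_closed)
```

The anchored percentage and the gap tally reproduce the reported 95.53% and 25 gaps. The contig N50 of 59.72 Mb cannot be recomputed here because the individual contig lengths are not given in the text.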
Finally, we generated the gap-free genome assembly of DM and named it DM8.1 (Figure 1A; Supplemental Table 4). To verify the quality of the gap-free genome, we examined the reliability of the DM8.1 sequences that correspond to the 161 gaps in DM6.1. We randomly selected 50 of the 161 gaps and designed 100 pairs of primers (Supplemental Table 5) based on the sequences on both sides of these closed gaps for PCR amplification (Supplemental Figure 5) and Sanger sequencing. Both the 5′ and 3′ boundary sequences of these gaps were successfully obtained, indicating the high accuracy of DM8.1. In addition, the DM8.1 genome achieved a BUSCO value of 98.70%; an extremely high mapping rate (>99.90%) for both Illumina short reads and ONT long reads; a high consensus quality value (35.85) from Merqury analysis; and improved long terminal repeat (LTR) retrotransposon completeness (DM8.1: LAI = 12.92, LTR length = 388.58 Mb; DM6.1: LAI = 12.75, LTR length = 375.91 Mb), further supporting the high quality of DM8.1 (Supplemental Tables 6 and 7). A total of 40 155 protein-coding genes were predicted in DM8.1 (Supplemental Figure 6), of which 33 972 (84.60%) were functionally annotated and 24 362 were expressed, as estimated from 10 mRNA sequencing datasets. Further analysis identified 1117 genes in DM8.1 that had been mis-annotated in DM6.1, with a single gene incorrectly annotated as two. These errors were revealed by individual mRNA sequencing read pairs that covered and linked two mis-annotated neighboring genes, suggesting that the reads derived from a transcript of a single gene (Supplemental Figure 7). Meanwhile, a total of 956 349 transposable elements (TEs) were predicted, accounting for 60.31% (465.81 Mb) of the DM8.1 genome (Supplemental Figure 8; Supplemental Table 8). Additionally, 4676 small RNAs were predicted in DM8.1 (Supplemental Figure 9).
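To make the consensus quality value above more concrete: Merqury reports QV on the Phred scale, QV = -10·log10(E), where E is the estimated per-base error probability. The sketch below converts the reported QV into an error rate and rechecks the annotated-gene percentage. The function names are illustrative assumptions and not part of Merqury.

```python
import math

def qv_to_error_rate(qv):
    """Convert a Phred-scaled quality value to a per-base error probability."""
    return 10 ** (-qv / 10)

def error_rate_to_qv(error_rate):
    """Inverse conversion: per-base error probability to Phred-scaled QV."""
    return -10 * math.log10(error_rate)

# Consensus QV reported for DM8.1 in the text.
dm81_qv = 35.85
error_rate = qv_to_error_rate(dm81_qv)
bases_per_error = 1 / error_rate

# Functionally annotated fraction of predicted protein-coding genes.
annotated_pct = round(100 * 33972 / 40155, 2)

print(f"error rate ~ {error_rate:.2e}, one error per ~{bases_per_error:.0f} bases")
print(f"annotated: {annotated_pct}%")
```

Under this conversion, a QV of 35.85 corresponds to roughly 2.6 errors per 10,000 bases, and the annotated fraction matches the reported 84.60%.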
All telomere regions were detected in DM8.1 using the seven-base telomeric repeat and the sub-telomeric repeats CL14 and CL34, and all centromere regions were identified using CENH3 (Figure 1A). Sequence composition analysis showed that the centromere regions contained more Gypsy-type LTRs (49.25%), while the telomere regions harbored more unknown TEs (Supplemental Figure 8). Additionally, the sequences filling the 25 gaps showed TE contents similar to those of the centromere regions (Supplemental Figure 8).
The complete genome assembly of DM8.1 facilitated the identification of large tandem gene clusters of functional importance. A total of 181 genes were identified in the newly assembled sequences corresponding to the 161 gap regions in DM6.1. Among these 181 genes, three large clusters (>15 copies) of tandemly duplicated genes were found, comprising 21 patatin genes (Figure 1B), 31 terpene synthase genes, and 18 cupin genes (Supplemental Figure 10). The 21 patatin genes showed much higher expression levels in tubers than in other organs of potato (Figure 1C). Intriguingly, patatin was found to be under absolute dosage selection, because it has expanded continuously during the evolution, domestication, and breeding improvement of potato (Figures 1D–1E). Within the family Solanaceae, patatin was greatly expanded only in potato and slightly expanded in wolfberry (seven copies), whereas the other species retained three or fewer copies or, in Physalis and tobacco, lost the gene completely (Figure 1E). Additionally, Etuberosum, a sister group of potato, carries four and five copies of patatin in its two assembled genomes (Figure 1D). This indicates that the expansion of patatin gene copies is associated with the speciation of potato and may play an important role in the formation of enlarged tubers in potato. Furthermore, in the reported pan-genomes of tomato and potato (Tang et al., 2022; Zhou et al., 2022), we found that the patatin locus maintained only one or two gene copies in the tomato population but expanded continuously and significantly in the potato population, from diploid wild potato through diploid S. candolleanum to the diploid landraces of potato, with the average copy number rising from 5.9 to 7 to 14.6, respectively (Figure 1D), clearly indicating the expansion of patatin during the domestication of potato. Moreover, these expanded patatin genes were under strong positive selection (Ka/Ks > 1), especially in the domesticated potato genomes (Supplemental Figure 11), indicating functional differentiation of patatin after gene copy expansion, which may be associated with the development, production, and quality improvement of potato tubers. Together, these findings suggest that it may be possible to breed potato cultivars with higher yield and quality by manipulating the absolute dosage of patatin, i.e., its gene copy number or expression level.
There have been continuous efforts to improve the reference genome of DM, which is important for both scientific research and breeding programs of potato. In this study, we have generated the gap-free telomere-to-telomere genome assembly DM8.1, which could serve as an important resource for future genomics and gene function studies in potato.
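The Ka/Ks criterion used above for the patatin copies can be illustrated with a small sketch. It assumes that Ka (nonsynonymous substitution rate) and Ks (synonymous substitution rate) have already been estimated by some external tool. The function names and the example values are hypothetical and are not taken from the study, which does not state here how the ratios were computed.

```python
def ka_ks_ratio(ka, ks):
    """Return the Ka/Ks ratio for one gene pair; Ks must be positive."""
    if ks <= 0:
        raise ValueError("Ks must be positive to form a ratio")
    return ka / ks

def classify_selection(ka, ks):
    """Apply the conventional reading of Ka/Ks:
    > 1 positive selection, < 1 purifying selection, = 1 neutral."""
    omega = ka_ks_ratio(ka, ks)
    if omega > 1:
        return "positive"
    if omega < 1:
        return "purifying"
    return "neutral"

# Hypothetical (Ka, Ks) estimates for two gene pairs, for illustration only.
hypothetical_pairs = {
    "pair_A": (0.12, 0.08),  # Ka/Ks above 1
    "pair_B": (0.02, 0.10),  # Ka/Ks below 1
}

calls = {name: classify_selection(ka, ks)
         for name, (ka, ks) in hypothetical_pairs.items()}
print(calls)
```

A simple threshold like this only shows how the ratio is read. In practice, a statistical test is usually needed before a ratio above 1 is taken as evidence of positive selection.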
This work was supported by the National Natural Science Foundation of China (32072119 and 31801421); the Breeding Program of Shandong Province, China (2020LZGC003); the National Agriculture Science and Technology Major Program, China (NK20220904); the China Agricultural Research System (CARS-9); the Central Public-interest Scientific Institution Basal Research Fund (Y2022PT23); and the Innovation Program of Chinese Academy of Agricultural Sciences (CAAS-ASTIP-IVFCAAS).