Topological stratification of continuous genetic variation in large biobanks

Alex Diaz-Papkovich, Shadi Zabad, Chief Ben-Eghan, Luke Anderson-Trocmé, Georgette Femerling, Vikram Nathan, Jenisha Patel, Simon Gravel

bioRxiv (Cold Spring Harbor Laboratory) (2023)

Biobanks now contain genetic data from millions of individuals. Dimensionality reduction, visualization and stratification are standard when exploring data at these scales; while efficient and tractable methods exist for the first two, stratification remains challenging because of uncertainty about sources of population structure. In practice, stratification is commonly performed by drawing shapes around dimensionally reduced data or assuming populations have a “type” genome. We propose a method of stratifying data with topological analysis that is fast, easy to implement, and integrates with existing pipelines. The approach is robust to the presence of sub-populations of varying sizes and wide ranges of population structure patterns. We demonstrate its effectiveness on genotypes from three biobanks and illustrate how topological genetic strata can help us understand structure within biobanks, evaluate distributions of genotypic and phenotypic data, examine polygenic score transferability, identify potential influential alleles, and perform quality control.
continuous genetic variation, large biobanks, topological stratification