The effect of sample size on polygenic hazard models for prostate cancer

Roshan A. Karunamuni, Minh-Phuong Huynh-Le, Chun C. Fan, Rosalind A. Eeles, Douglas F. Easton, Zsofia Kote-Jarai, Ali Amin Al Olama, Sara Benlloch Garcia, Kenneth Muir, Henrik Gronberg, Fredrik Wiklund, Markus Aly, Johanna Schleutker, Csilla Sipeky, Teuvo L. J. Tammela, Børge G. Nordestgaard, Tim J. Key, Ruth C. Travis, David E. Neal, Jenny L. Donovan, Freddie C. Hamdy, Paul Pharoah, Nora Pashayan, Kay-Tee Khaw, Stephen N. Thibodeau, Shannon K. McDonnell, Daniel J. Schaid, Christiane Maier, Walther Vogel, Manuel Luedeke, Kathleen Herkommer, Adam S. Kibel, Cezary Cybulski, Dominika Wokolorczyk, Wojciech Kluzniak, Lisa Cannon-Albright, Hermann Brenner, Ben Schöttker, Bernd Holleczek, Jong Y. Park, Thomas A. Sellers, Hui-Yi Lin, Chavdar Slavov, Radka Kaneva, Vanio Mitev, Jyotsna Batra, Judith A. Clements, Amanda Spurdle, Manuel R. Teixeira, Paula Paulo, Sofia Maia, Hardev Pandha, Agnieszka Michael, Ian G. Mills, Ole A. Andreassen, Anders M. Dale, Tyler M. Seibert

European Journal of Human Genetics (2020)

Abstract

We determined the effect of sample size on performance of polygenic hazard score (PHS) models in prostate cancer. Age and genotypes were obtained for 40,861 men from the PRACTICAL consortium. The dataset included 201,590 SNPs per subject, and was split into training and testing sets. Established-SNP models considered 65 SNPs that had been previously associated with prostate cancer. Discovery-SNP models used stepwise selection to identify new SNPs. The performance of each PHS model was calculated for random sizes of the training set. The performance of a representative Established-SNP model was estimated for random sizes of the testing set. Mean HR 98/50 (hazard ratio of top 2% to average in test set) of the Established-SNP model increased from 1.73 [95% CI: 1.69–1.77] to 2.41 [2.40–2.43] when the number of training samples was increased from 1 thousand to 30 thousand. Corresponding HR 98/50 of the Discovery-SNP model increased from 1.05 [0.93–1.18] to 2.19 [2.16–2.23]. HR 98/50 of a representative Established-SNP model using testing set sample sizes of 0.6 thousand and 6 thousand observations were 1.78 [1.70–1.85] and 1.73 [1.71–1.76], respectively. We estimate that a study population of 20 thousand men is required to develop Discovery-SNP PHS models while 10 thousand men should be sufficient for Established-SNP models.
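The HR 98/50 metric described in the abstract can be illustrated with a minimal sketch. It assumes a Cox proportional hazards setting in which the hazard is proportional to exp(beta × PHS), so the hazard ratio between two groups is the exponential of beta times the difference in their mean scores. The function name `hr_98_50`, the coefficient `beta`, and the choice of the 30th to 70th percentile band as the "average" group are assumptions made for illustration; the abstract only states that the metric compares the top 2% to the average in the test set.

```python
import numpy as np


def hr_98_50(scores, beta=1.0, top_pct=98.0, mid_band=(30.0, 70.0)):
    """Illustrative HR 98/50: hazard ratio of the top 2% of scores vs. an average group.

    Assumptions (not taken from the abstract):
      - hazard is proportional to exp(beta * score), as in a Cox model;
      - the "average" group is the 30th-70th percentile band of the scores.
    """
    scores = np.asarray(scores, dtype=float)
    top_cut = np.percentile(scores, top_pct)   # threshold for the top 2%
    lo, hi = np.percentile(scores, mid_band)   # bounds of the average band
    top_mean = scores[scores >= top_cut].mean()
    mid_mean = scores[(scores >= lo) & (scores <= hi)].mean()
    # Under proportional hazards, the ratio is exp(beta * difference in mean score).
    return float(np.exp(beta * (top_mean - mid_mean)))


if __name__ == "__main__":
    # Synthetic scores for demonstration only; these are not PRACTICAL data.
    rng = np.random.default_rng(0)
    demo_scores = rng.normal(loc=0.0, scale=0.4, size=6000)
    print(f"HR 98/50 on synthetic scores: {hr_98_50(demo_scores):.2f}")
```

In a sample-size experiment of the kind the abstract describes, a function like this would be evaluated repeatedly on models fitted to random subsets of the training set, and the mean and confidence interval of the resulting values reported for each subset size.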

Keywords

Genetics research; Risk factors; Biomedicine, general; Human Genetics; Bioinformatics; Gene Expression; Cytogenetics
