Stratified Subsampling Based p-values for Hypothesis Tests in Genomics Research

STATISTICS AND APPLICATIONS (2021)

Abstract
Multiple testing, that is, testing more than one hypothesis in a single experiment, is routinely performed in the statistical analysis of genome-wide data, for example when testing the association of single-nucleotide polymorphisms (SNPs) with a particular phenotype. A common practice is to apply multiple-testing correction methods to exclude candidate SNPs that would otherwise be spuriously marked as statistically significant. However, in many cases such methods are overly conservative and often yield no significant SNPs at all. In this paper, we summarize commonly used multiple-testing correction procedures and Monte Carlo simulation-based methods. We propose a simple modification to the subsampling-based simulation method that estimates empirical p-values by borrowing the principles of stratified sampling. Using real datasets from The Cancer Genome Atlas (TCGA) data repository, we demonstrate that the traditional multiple-testing correction methods yielded few or no significant risk-associated SNPs, whereas the proposed stratified subsampling yielded an appropriate number of significant candidate SNPs. We also show that the proposed modification provides meaningful p-values and makes the test more powerful than simple subsampling without stratification.
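The building blocks named in the abstract can be sketched in a few lines of Python. This is an illustrative sketch only, not the authors' procedure: the abstract does not specify which correction methods are summarized or how the stratified subsamples are turned into p-values. The code assumes Bonferroni and Benjamini-Hochberg as representative corrections, the phenotype class as the stratification variable, and the standard Monte Carlo estimate (1 + b) / (1 + B) for the empirical p-value. All function names are hypothetical.

```python
import random
from collections import defaultdict


def bonferroni(pvals, alpha=0.05):
    """Reject H_i when p_i <= alpha / m (controls the family-wise error rate)."""
    m = len(pvals)
    return [p <= alpha / m for p in pvals]


def benjamini_hochberg(pvals, alpha=0.05):
    """Reject the k smallest p-values, k = max{i : p_(i) <= i * alpha / m}."""
    m = len(pvals)
    order = sorted(range(m), key=lambda i: pvals[i])
    k = 0
    for rank, idx in enumerate(order, start=1):
        if pvals[idx] <= rank * alpha / m:
            k = rank
    reject = [False] * m
    for idx in order[:k]:
        reject[idx] = True
    return reject


def stratified_subsample(labels, fraction, rng):
    """Draw the same fraction of sample indices from every stratum.

    Assumption: strata are the phenotype classes given in `labels`.
    """
    strata = defaultdict(list)
    for i, label in enumerate(labels):
        strata[label].append(i)
    chosen = []
    for members in strata.values():
        n = max(1, round(fraction * len(members)))
        chosen.extend(rng.sample(members, n))
    return sorted(chosen)


def empirical_p_value(observed, null_stats):
    """Monte Carlo p-value: (1 + #{null >= observed}) / (1 + B)."""
    b = sum(1 for s in null_stats if s >= observed)
    return (1 + b) / (1 + len(null_stats))


# Hypothetical per-SNP p-values and case/control labels, for illustration only.
pvals = [0.001, 0.008, 0.039, 0.041, 0.042, 0.06, 0.074, 0.205]
labels = ["case"] * 30 + ["control"] * 70
subsample = stratified_subsample(labels, 0.2, random.Random(0))
```

On these made-up p-values, Bonferroni keeps a single SNP and Benjamini-Hochberg keeps two, which illustrates how quickly such corrections prune candidates as the number of tests grows. The stratified draw preserves the case-to-control ratio of the full sample in every subsample, which simple subsampling does not guarantee.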
Keywords
Multiple comparison test, Subsampling, Stratified sampling, p-value