False Discovery in A/B Testing

SSRN Electronic Journal (2020)

Abstract
We investigate what fraction of all significant results in website A/B testing are actually null effects, i.e., the false discovery rate (FDR). Our data consists of 4,964 effects from 2,766 experiments conducted on a commercial A/B testing platform. Using three different methods, we find that the FDR ranges between 28% and 37% for tests conducted at 10% significance, and between 18% and 25% for tests at 5% significance (two-sided). These high FDRs stem mostly from the high fraction of true-null effects, about 70%, rather than from low power. Using our estimates we also assess the potential of various A/B test designs to reduce the FDR. The two main implications are that decision makers should expect 1 in 5 interventions achieving significance at the 5% level to be ineffective when deployed in the field, and that analysts should consider using two-stage designs with multiple variations rather than basic A/B tests.
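
The claim that the FDR is driven by the share of true nulls rather than by low power can be illustrated with the textbook relation FDR = pi0 * alpha / (pi0 * alpha + (1 - pi0) * power). The Python sketch below is only an illustration of that relation, not one of the three estimation methods used in the paper. The function name `false_discovery_rate` and the power values are assumptions made for this example; only the true-null share of about 70% and the significance levels come from the abstract.

```python
def false_discovery_rate(pi0: float, alpha: float, power: float) -> float:
    """Expected share of significant results that are true nulls.

    pi0   -- fraction of tested effects that are truly null
    alpha -- significance level of each test
    power -- probability that a true non-null effect is detected
             (hypothetical input; the abstract does not report it)
    """
    if not (0.0 <= pi0 <= 1.0 and 0.0 < alpha <= 1.0 and 0.0 <= power <= 1.0):
        raise ValueError("pi0 and power must lie in [0, 1], alpha in (0, 1]")
    false_positives = pi0 * alpha            # nulls that reach significance
    true_positives = (1.0 - pi0) * power     # real effects that are detected
    total = false_positives + true_positives
    return false_positives / total if total > 0 else 0.0


if __name__ == "__main__":
    pi0 = 0.70  # true-null share reported in the abstract (approximate)
    for alpha in (0.05, 0.10):
        for power in (0.5, 0.8, 1.0):  # assumed values, for illustration only
            fdr = false_discovery_rate(pi0, alpha, power)
            print(f"alpha={alpha:.2f} power={power:.1f} -> FDR={fdr:.3f}")
```

Under these assumptions, even a perfectly powered test (power = 1.0) at the 5% level still gives an FDR above 10% when 70% of effects are null, which is consistent with the abstract's point that the true-null share, not power, is the main driver.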
Keywords
testing, discovery