Finding bias, and making do with data that have it

KDD (2011)

Abstract
Much of statistics and machine learning relies on random sampling and designed experiments, but sometimes the only data available were obtained by other means. The data may be just whatever is available in transaction logs. Or, to measure ad effectiveness, an advertiser may use data on everyone exposed to an ad campaign together with a random sample of people who were not exposed to it. Such data may give badly biased estimates. However, even perfect random sampling can give flawed estimates. The time spent in hospital, measured on a random sample of the patients present in a hospital on a given day, will overestimate the duration of a hospital stay, even though the patients are randomly chosen. This tutorial will describe some sources of selection bias, some ways to detect when it is so overwhelming that no valid estimate is possible, and some strategies that can sometimes be used to dilute the influence of selection bias on estimates.
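The hospital example can be illustrated with a small simulation. The sketch below is not taken from the tutorial; it assumes exponentially distributed stays and admissions spread uniformly over a year, and the function name and parameters are hypothetical. Patients with long stays are more likely to be in the hospital on any given day, so a random sample of that day's patients over-represents them.

```python
import random


def simulate_census_bias(n_patients=200_000, mean_stay=5.0, horizon=365.0,
                         census_day=200.0, sample_size=500, seed=0):
    """Compare the true mean stay with the mean from a one-day census sample.

    Assumptions (illustrative only): stays are exponential with mean
    `mean_stay` days, and admission times are uniform on [0, horizon].
    """
    rng = random.Random(seed)
    all_stays = []
    census_stays = []
    for _ in range(n_patients):
        admit = rng.uniform(0.0, horizon)
        stay = rng.expovariate(1.0 / mean_stay)
        all_stays.append(stay)
        # The patient is in the hospital on the census day only if the
        # stay interval [admit, admit + stay) covers that day.
        if admit <= census_day < admit + stay:
            census_stays.append(stay)

    # Draw a simple random sample from the patients present on the census day.
    k = min(sample_size, len(census_stays))
    sample = rng.sample(census_stays, k)

    population_mean = sum(all_stays) / len(all_stays)
    census_sample_mean = sum(sample) / len(sample)
    return population_mean, census_sample_mean


population_mean, census_sample_mean = simulate_census_bias()
print(f"mean stay over all admissions:    {population_mean:.2f} days")
print(f"mean stay in one-day census sample: {census_sample_mean:.2f} days")
```

Under these assumptions the census sample mean comes out well above the mean over all admissions, even though the sample is drawn at random from the patients present. The sampling frame itself (patients present on one day) is what favors long stays.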