Attribute Selection by Measuring Information on Reference Distributions

semanticscholar(2014)

Abstract
A great number of services, experiments, and decisions at Yahoo! require analyzing rich data sources. These data almost invariably contain a large number of attributes. In such scenarios, efficiently selecting the relevant attributes is imperative for data analysis (e.g., modeling, prediction). When approaching new data analysis tasks, domain experts, researchers, and engineers spend a considerable amount of resources identifying these relevant attributes, either manually or semi-automatically. This paper attempts to address this problem by providing a simple and largely automated attribute selection approach. The method is based on reformulating the mutual information (MI) measure. We show why MI cannot, in general, be used effectively without considerable domain expertise, and we describe a more appropriate measure that allows a much higher degree of automation, removing considerable manual work from the analysis loop. Experiments on the tasks of predicting clicks and conversions for the Yahoo! display advertising platform, in the context of the NGDStone project, show the effectiveness of the proposed approach.
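For orientation, the sketch below shows the standard MI baseline that the abstract takes as its starting point: ranking discrete attributes by their estimated mutual information with a target. It is not the reformulated measure the paper proposes, which the abstract does not specify. The function names (`mutual_information`, `rank_attributes`) and the toy fields (`ad_position`, `browser`, `click`) are hypothetical and chosen only for illustration.

```python
import math
from collections import Counter


def mutual_information(xs, ys):
    """Plug-in estimate of I(X; Y) in bits from paired discrete samples."""
    n = len(xs)
    joint = Counter(zip(xs, ys))  # counts of (x, y) pairs
    px = Counter(xs)              # marginal counts of x
    py = Counter(ys)              # marginal counts of y
    mi = 0.0
    for (x, y), c in joint.items():
        # p(x, y) / (p(x) * p(y)) simplifies to c * n / (count_x * count_y)
        mi += (c / n) * math.log2(c * n / (px[x] * py[y]))
    return mi


def rank_attributes(rows, target):
    """Return (attribute, MI) pairs sorted by decreasing MI with the target."""
    ys = [r[target] for r in rows]
    attrs = [a for a in rows[0] if a != target]
    scores = {a: mutual_information([r[a] for r in rows], ys) for a in attrs}
    return sorted(scores.items(), key=lambda kv: kv[1], reverse=True)


# Hypothetical toy data: ad_position determines click, browser is irrelevant.
rows = [
    {"ad_position": "top", "browser": "a", "click": 1},
    {"ad_position": "top", "browser": "b", "click": 1},
    {"ad_position": "side", "browser": "a", "click": 0},
    {"ad_position": "side", "browser": "b", "click": 0},
]
ranking = rank_attributes(rows, "click")
print(ranking)
```

On this toy data the attribute that fully determines the target ranks first and the independent attribute scores zero. On real data, plug-in MI estimates tend to be inflated for attributes with many distinct values, which is one commonly cited reason a plain MI ranking needs manual review.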