Things Change: Comparing Results Using Historical Data and User Testing for Evaluating a Recommendation Task

Soon-Gyo Jung, Joni Salminen, Shammur A. Chowdhury, Dianne Ramirez Robillos, Bernard J. Jansen

CHI '20: CHI Conference on Human Factors in Computing Systems, Honolulu, HI, USA, April 2020


Abstract

We address the task of recommending the next likely flight destination to customers of a major international airline, and we compare performance measured on historical flight data with performance in an actual user evaluation. Using two years of historical flight data comprising tens of millions of flights, an ensemble approach and a collaborative filtering approach obtained accuracies of 47% and 20%, respectively, on a test set of 100,000 customers, highlighting the challenge of the domain. We then evaluated our recommendations on 10,000 actual customers, split 45-45-10 among the ensemble, collaborative filtering, and control groups. The overall predictive power with real users was 23%, with 19% for the ensemble method and 30% for collaborative filtering. Results indicate that, in complex and shifting domains such as this one, one cannot rely solely on historical data for evaluating the impact of user recommendations. We discuss implications for recommendation systems and future research in this and related domains.
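The 45-45-10 split and the per-group accuracy figures can be illustrated with a minimal sketch. This is not the authors' code: the function names `assign_groups` and `hit_rate`, the shuffle-then-cut assignment, and the exact-match definition of accuracy are all assumptions made for illustration, since the abstract does not say how customers were assigned or how a correct prediction was scored.

```python
import random
from collections import Counter

# Hypothetical group weights mirroring the 45-45-10 split in the abstract.
DEFAULT_WEIGHTS = (
    ("ensemble", 45),
    ("collaborative_filtering", 45),
    ("control", 10),
)

def assign_groups(customer_ids, weights=DEFAULT_WEIGHTS, seed=0):
    """Shuffle customers, then cut the list into groups proportional to weights."""
    ids = list(customer_ids)
    random.Random(seed).shuffle(ids)
    total = sum(w for _, w in weights)
    assignment = {}
    start = 0
    cumulative = 0
    for name, weight in weights:
        cumulative += weight
        end = len(ids) * cumulative // total  # integer cut point for this group
        for cid in ids[start:end]:
            assignment[cid] = name
        start = end
    return assignment

def hit_rate(recommended, actual):
    """Share of customers whose actual next destination equals the recommendation.

    Assumption: one recommendation per customer, scored by exact match.
    Customers without an observed destination are ignored.
    """
    scored = [cid for cid in recommended if cid in actual]
    if not scored:
        return 0.0
    hits = sum(1 for cid in scored if recommended[cid] == actual[cid])
    return hits / len(scored)

if __name__ == "__main__":
    groups = assign_groups(range(10_000))
    print(Counter(groups.values()))
```

With 10,000 customers, this assignment yields 4,500 customers in each treatment group and 1,000 in the control group. Computing `hit_rate` separately per group is what would allow a comparison like the one the abstract reports between the ensemble and collaborative filtering groups.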


Keywords

Prediction, Recommendations, Algorithmic trade-off, User Study
