A Dataset And Baselines For E-Commerce Product Categorization

PROCEEDINGS OF THE 2019 ACM SIGIR INTERNATIONAL CONFERENCE ON THEORY OF INFORMATION RETRIEVAL (ICTIR'19) (2019)

Abstract
We make available a document collection of one million product titles from 3,008 anonymized categories of the rakuten.com product catalog. The categories are anonymized because of intellectual property rights on the underlying data organization taxonomy. Our analysis of the characteristics of the 800,000 training and 20,000 validation titles shows that they match the test set of 180,000 titles. Twenty-six independent teams participated in an automatic product categorization challenge on this dataset. We present results and analysis and suggest strong baselines for this collection and task.
Keywords
Document Collection, e-Commerce, Taxonomy Categorization