Large Scale Multimodal Classification Using an Ensemble of Transformer Models and Co-Attention

CoRR (2020)

Abstract
Accurate and efficient product classification is important for e-commerce applications because it enables downstream tasks such as recommendation, retrieval, and pricing. Items often contain both textual and visual information, and using both modalities usually outperforms classification with either modality alone. In this paper, we describe our methodology and results for the SIGIR eCom Rakuten Data Challenge. We employ a dual attention technique to model image-text relationships using pretrained language and image embeddings. While dual attention has been widely used for Visual Question Answering (VQA) tasks, ours is the first attempt to apply the concept to multimodal classification.
Keywords
large scale multimodal classification, ensemble, transformer models, co-attention