Multimodal Joint Attribute Prediction and Value Extraction for E-commerce Product

EMNLP 2020, pp. 2129-2139.


Abstract:

Product attribute values are essential in many e-commerce scenarios, such as customer service robots, product recommendations, and product retrieval. In the real world, however, the attribute values of a product are usually incomplete and vary over time, which greatly hinders practical applications. In this paper, we propose a multimodal...

Introduction
  • Product attribute values are pervasively incomplete for a massive number of products on e-commerce platforms.
  • According to statistics from a mainstream e-commerce platform in China, there are over 40 attributes for products in the clothing category, but on average fewer than 8 attributes are present for each product.
  • The authors propose a method to jointly predict product attributes and extract the corresponding values with multimodal product information, as shown in Figure 1.
  • Attributes and values are known to strongly depend on each other, and vision can play an essential role in this task
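A common way to realize this joint formulation is to treat attribute prediction as multi-label classification and value extraction as BIO sequence tagging, pairing each extracted span with its attribute. A minimal decoding sketch (the tag scheme and function name here are hypothetical illustrations, not the authors' implementation):

```python
# Toy decoder for joint attribute prediction and value extraction.
# Value extraction is cast as BIO tagging over the product description;
# each B-/I- tag carries the attribute it belongs to (e.g. "B-Color").
# Hypothetical helper, not the paper's actual code.

def decode_attribute_values(tokens, bio_tags):
    """Return (attribute, value) pairs from per-token BIO tags."""
    pairs = []
    span, attr = [], None
    for token, tag in zip(tokens, bio_tags):
        if tag.startswith("B-"):                  # a new value span starts
            if span:
                pairs.append((attr, " ".join(span)))
            attr, span = tag[2:], [token]
        elif tag.startswith("I-") and span and tag[2:] == attr:
            span.append(token)                    # continue the current span
        else:                                     # "O" or inconsistent tag ends the span
            if span:
                pairs.append((attr, " ".join(span)))
            span, attr = [], None
    if span:                                      # flush a span ending at the last token
        pairs.append((attr, " ".join(span)))
    return pairs

tokens = ["This", "golden", "lapel", "shirt"]
tags   = ["O", "B-Color", "B-CollarType", "O"]
print(decode_attribute_values(tokens, tags))
# → [('Color', 'golden'), ('CollarType', 'lapel')]
```

In the joint model the attribute classifier and the tagger share an encoder, so consistent (attribute, value) predictions reinforce each other; the decoder above only shows how tagged spans become attribute-value pairs.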
Highlights
  • Product attribute values that provide details of the product are crucial parts of e-commerce, which help customers make purchasing decisions and facilitate retailers in many applications, such as question answering systems (Yih et al, 2015; Yu et al, 2017), product recommendations (Gong, 2009; Cao et al, 2018), and product retrieval (Liao et al, 2018; Magnani et al, 2019)

    [Figure: “This golden lapel shirt can be dressed up with black shoes ...” with extracted attribute-value pairs Collar Type → lapel, Color → golden]
  • Product attribute values are pervasively incomplete for a massive number of products on e-commerce platforms
  • We evaluate our model on two subtasks: attribute prediction and value extraction
  • The main results in Table 3 show that our proposed M-JAVE w/o Visual Info (JAVE) model, whether based on BERT or on a Bidirectional LSTM (BiLSTM), significantly outperforms the baselines, which demonstrates the strong generalization ability of our methods
  • We jointly tackle the tasks of e-commerce product attribute prediction and value extraction, focusing on the relationship between product attributes and values, and we show that the models benefit substantially from visual product information
  • The experimental results show that the correlations between product attributes and values are valuable for this task, and visual information should be selectively used
Results
  • The authors evaluate the model on two subtasks: attribute prediction and value extraction.
  • The main results in Table 3 show that the proposed M-JAVE model, whether based on BERT or on a Bidirectional LSTM (BiLSTM), significantly outperforms the baselines, which demonstrates the strong generalization ability of the methods.
  • Per-category results of the proposed M-JAVE model are reported for Shoes, Bags, Luggage, Dresses, Boots, Pants, and Total, with F1 (%) for attribute prediction and F1 (%) for value extraction.
  • On the MAE dataset, M-JAVE (LSTM) and M-JAVE (BERT) are compared against the MAE baseline model (MAE-model, 59.48).
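Both subtasks are evaluated with the F1 score; for multi-label attribute prediction this is commonly a micro-averaged F1 over all predicted attribute labels. A minimal sketch (micro-averaging is an assumption here, not a detail stated in this summary):

```python
# Micro-averaged F1 for multi-label attribute prediction: pool true
# positives, false positives and false negatives over all products,
# then compute precision and recall once. Illustrative sketch only.

def micro_f1(gold_sets, pred_sets):
    tp = fp = fn = 0
    for gold, pred in zip(gold_sets, pred_sets):
        tp += len(gold & pred)      # correctly predicted attributes
        fp += len(pred - gold)      # spurious predictions
        fn += len(gold - pred)      # missed gold attributes
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    if precision + recall == 0:
        return 0.0
    return 2 * precision * recall / (precision + recall)

# Two hypothetical products with gold and predicted attribute sets.
gold = [{"Color", "CollarType"}, {"Material"}]
pred = [{"Color"}, {"Material", "Season"}]
print(round(micro_f1(gold, pred), 4))
# → 0.6667
```

Value extraction is scored the same way at the span level: an extracted value counts as a true positive only if both its span and its attribute match the gold annotation.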
Conclusion
  • The authors jointly tackle the tasks of e-commerce product attribute prediction and value extraction, focusing on the relationship between product attributes and values, and show that the models benefit substantially from visual product information.
  • The experimental results show that the correlations between product attributes and values are valuable for this task, and visual information should be selectively used
Tables
  • Table1: Statistics of our dataset
  • Table2: Details about hyper-parameters
  • Table3: Main results (F1 score %) of comparative methods and variants of our model
  • Table4: Experimental results (accuracy %) of our proposed model and MAE baseline model (MAE-model)
  • Table5: Experimental results (F1 score %) for ablation study on the relationship between attributes and values. “UpBound” denotes “Upper Bound”
  • Table6: Experimental results (F1 score %) for ablation study on the product images
  • Table7: F1 scores in the Congruent and Incongruent settings, along with the Meteor-awareness results. Incongruent and ∆Awareness scores are the mean and standard deviation over 8 permutations of product images in the test dataset
  • Table8: Experimental results (F1 score %) for domain adaptation. ∆↓ denotes the F1 score gap for the PI and QA domains
  • Table9: Results (mean and standard deviation) with different sizes of training data
Funding
  • We use the F1 score to calculate the awareness score for a single instance: aM = F1(xi, yi, vi) − F1(xi, yi, v̄i), where v̄i denotes an incongruent product image (16)
  • We combine K = 8 separate p-values from the tests using Fisher’s method, obtaining χ² = 6790.80, p < 0.0001 for product attribute prediction and χ² = 780.80, p < 0.0001 for value extraction, which shows that incongruent images significantly degrade the model’s performance
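Combining per-permutation p-values via Fisher's method can be sketched as follows: the statistic is X² = −2 Σ ln pᵢ with 2K degrees of freedom, and for even degrees of freedom the chi-square survival function has a closed form. The p-values below are illustrative placeholders, not the paper's actual test results:

```python
import math

# Fisher's method: combine K independent p-values into one chi-square
# statistic X^2 = -2 * sum(ln p_i) with 2K degrees of freedom.
def fisher_statistic(p_values):
    return -2.0 * sum(math.log(p) for p in p_values)

# Survival function of a chi-square variable with EVEN df = 2K,
# which has the closed form P(X > x) = exp(-x/2) * sum_{k<K} (x/2)^k / k!
def chi2_sf_even_df(x, df):
    assert df % 2 == 0, "closed form holds only for even df"
    half = x / 2.0
    return math.exp(-half) * sum(half ** k / math.factorial(k)
                                 for k in range(df // 2))

p_values = [0.01] * 8                       # illustrative per-permutation p-values
x2 = fisher_statistic(p_values)
combined_p = chi2_sf_even_df(x2, df=2 * len(p_values))
print(round(x2, 2), combined_p < 0.0001)
# → 73.68 True
```

In practice `scipy.stats.combine_pvalues(p_values, method="fisher")` gives the same statistic and combined p-value without hand-rolling the survival function.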
Reference
  • Peter Anderson, Xiaodong He, Chris Buehler, Damien Teney, Mark Johnson, Stephen Gould, and Lei Zhang. 2018. Bottom-up and top-down attention for image captioning and visual question answering. In 2018 IEEE Conference on Computer Vision and Pattern Recognition, CVPR 2018, Salt Lake City, UT, USA, June 18-22, 2018, pages 6077–6086.
  • Daniel M. Bikel, Richard M. Schwartz, and Ralph M. Weischedel. 1999. An algorithm that learns what’s in a name. Mach. Learn., 34(1-3):211–231.
  • Min Cao, Sijing Zhou, Honghao Gao, and Youhuizi Li. 2018. A novel hybrid collaborative filtering approach to recommendation using reviews: The product attribute perspective (S). In The 30th International Conference on Software Engineering and Knowledge Engineering, Hotel Pullman, Redwood City, California, USA, July 1-3, 2018, pages 7–10.
  • Rich Caruana. 1997. Multitask learning. Machine learning, 28(1):41–75.
  • Qian Chen, Zhu Zhuo, and Wen Wang. 2019. BERT for joint intent classification and slot filling. CoRR, abs/1902.10909.
  • Ronan Collobert, Jason Weston, Leon Bottou, Michael Karlen, Koray Kavukcuoglu, and Pavel P. Kuksa. 2011. Natural language processing (almost) from scratch. J. Mach. Learn. Res., 12:2493–2537.
  • Chih-Wen Goo, Guang Gao, Yun-Kai Hsu, Chih-Li Huo, Tsung-Chieh Chen, Keng-Wei Hsu, and YunNung Chen. 2018. Slot-gated modeling for joint slot filling and intent prediction. In Proceedings of the 2018 Conference of the North American Chapter of the Association for Computational Linguistics: Human Language Technologies, Volume 2 (Short Papers), pages 753–757, New Orleans, Louisiana.
  • Dilek Hakkani-Tur, Gokhan Tur, Asli Celikyilmaz, Yun-Nung Chen, Jianfeng Gao, Li Deng, and YeYi Wang. 2016. Multi-domain joint semantic frame parsing using bi-directional RNN-LSTM. In Interspeech 2016, 17th Annual Conference of the International Speech Communication Association, San Francisco, CA, USA, September 8-12, 2016, pages 715–719.
  • Kaiming He, Xiangyu Zhang, Shaoqing Ren, and Jian Sun. 2016. Deep residual learning for image recognition. In 2016 IEEE Conference on Computer Vision and Pattern Recognition, CVPR 2016, Las Vegas, NV, USA, June 27-30, 2016, pages 770–778.
  • Robert L. Logan IV, Samuel Humeau, and Sameer Singh. 2017. Multimodal attribute extraction. In 6th Workshop on Automated Knowledge Base Construction, AKBC@NIPS 2017, Long Beach, California, USA, December 8, 2017.
  • Solomon Kullback and Richard A Leibler. 1951. On information and sufficiency. The annals of mathematical statistics, 22(1):79–86.
  • Jia Deng, Wei Dong, Richard Socher, Li-Jia Li, Kai Li, and Fei-Fei Li. 2009. ImageNet: A large-scale hierarchical image database. In 2009 IEEE Computer Society Conference on Computer Vision and Pattern Recognition (CVPR 2009), 20-25 June 2009, Miami, Florida, USA, pages 248–255.
  • Haoran Li, Peng Yuan, Song Xu, Youzheng Wu, Xiaodong He, and Bowen Zhou. 2020. Aspect-aware multimodal summarization for chinese e-commerce products. In The Thirty-Fourth AAAI Conference on Artificial Intelligence, AAAI 2020, pages 8188– 8195.
  • Jacob Devlin, Ming-Wei Chang, Kenton Lee, and Kristina Toutanova. 2019. BERT: Pre-training of deep bidirectional transformers for language understanding. In Proceedings of the 2019 Conference of the North American Chapter of the Association for Computational Linguistics, pages 4171–4186, Minneapolis, Minnesota.
  • Haoran Li, Junnan Zhu, Tianshang Liu, Jiajun Zhang, and Chengqing Zong. 2018. Multi-modal sentence summarization with modality attention and image filtering. In Proceedings of the Twenty-Seventh International Joint Conference on Artificial Intelligence, IJCAI 2018, July 13-19, 2018, Stockholm, Sweden, pages 4152–4158.
  • Desmond Elliott. 2018. Adversarial evaluation of multimodal machine translation. In Proceedings of the 2018 Conference on Empirical Methods in Natural Language Processing, Brussels, Belgium, October 31 - November 4, 2018, pages 2974–2978.
  • Rayid Ghani, Katharina Probst, Yan Liu, Marko Krema, and Andrew E. Fano. 2006. Text mining for product attribute extraction. SIGKDD Explorations, 8(1):41–48.
  • SongJie Gong. 2009. Employing user attribute and item attribute to enhance the collaborative filtering recommendation. JSW, 4(8):883–890.
  • Haoran Li, Junnan Zhu, Cong Ma, Jiajun Zhang, and Chengqing Zong. 2017. Multi-modal summarization for asynchronous collection of text, image, audio and video. In Proceedings of the 2017 Conference on Empirical Methods in Natural Language Processing, pages 1092–1102, Copenhagen, Denmark.
  • Haoran Li, Junnan Zhu, Cong Ma, Jiajun Zhang, and Chengqing Zong. 2019.
  • Lizi Liao, Xiangnan He, Bo Zhao, Chong-Wah Ngo, and Tat-Seng Chua. 2018. Interpretable multimodal retrieval for fashion products. In 2018 ACM Multimedia Conference on Multimedia Conference, MM 2018, Seoul, Republic of Korea, October 22-26, 2018, pages 1571–1579.
  • Bing Liu and Ian Lane. 2016. Attention-based recurrent neural network models for joint intent detection and slot filling. In Interspeech 2016, 17th Annual Conference of the International Speech Communication Association, San Francisco, CA, USA, September 8-12, 2016, pages 685–689.
  • Xiaojing Liu, Feiyu Gao, Qiong Zhang, and Huasha Zhao. 2019. Graph convolution for multimodal information extraction from visually rich documents. In Proceedings of the 2019 Conference of the North American Chapter of the Association for Computational Linguistics: Human Language Technologies, Volume 2 (Industry Papers), pages 32–39, Minneapolis, Minnesota.
  • Jiasen Lu, Jianwei Yang, Dhruv Batra, and Devi Parikh. 2016. Hierarchical question-image co-attention for visual question answering. In Advances in Neural Information Processing Systems 29: Annual Conference on Neural Information Processing Systems 2016, December 5-10, 2016, Barcelona, Spain, pages 289–297.
  • Alessandro Magnani, Feng Liu, Min Xie, and Somnath Banerjee. 2019. Neural product retrieval at walmart.com. In Companion of The 2019 World Wide Web Conference, WWW 2019, San Francisco, CA, USA, May 13-17, 2019, pages 367–372.
  • Ajinkya More. 2016. Attribute extraction from product titles in ecommerce. CoRR, abs/1608.04670.
  • Duangmanee Putthividhya and Junling Hu. 2011. Bootstrapped named entity recognition for product attribute extraction. In Proceedings of the 2011 Conference on Empirical Methods in Natural Language Processing, pages 1557–1567, Edinburgh, Scotland, UK.
  • Keiji Shinzato and Satoshi Sekine. 2013. Unsupervised extraction of attributes and their values from product description. In Proceedings of the Sixth International Joint Conference on Natural Language Processing, pages 1339–1347, Nagoya, Japan.
  • Weijie Su, Xizhou Zhu, Yue Cao, Bin Li, Lewei Lu, Furu Wei, and Jifeng Dai. 2020. VL-BERT: pretraining of generic visual-linguistic representations. In 8th International Conference on Learning Representations, ICLR 2020, Addis Ababa, Ethiopia, April 26-30, 2020.
  • Hao Tan and Mohit Bansal. 2019. LXMERT: Learning cross-modality encoder representations from transformers. In Proceedings of the 2019 Conference on Empirical Methods in Natural Language Processing and the 9th International Joint Conference on Natural Language Processing (EMNLP-IJCNLP), pages 5100–5111, Hong Kong, China.
  • Ashish Vaswani, Noam Shazeer, Niki Parmar, Jakob Uszkoreit, Llion Jones, Aidan N. Gomez, Lukasz Kaiser, and Illia Polosukhin. 2017. Attention is all you need. In Advances in Neural Information Processing Systems 30: Annual Conference on Neural Information Processing Systems 2017, 4-9 December 2017, Long Beach, CA, USA, pages 5998–6008.
  • Yonghui Wu, Mike Schuster, Zhifeng Chen, Quoc V. Le, Mohammad Norouzi, Wolfgang Macherey, Maxim Krikun, Yuan Cao, Qin Gao, Klaus Macherey, Jeff Klingner, Apurva Shah, Melvin Johnson, Xiaobing Liu, Lukasz Kaiser, Stephan Gouws, Yoshikiyo Kato, Taku Kudo, Hideto Kazawa, Keith Stevens, George Kurian, Nishant Patil, Wei Wang, Cliff Young, Jason Smith, Jason Riesa, Alex Rudnick, Oriol Vinyals, Greg Corrado, Macduff Hughes, and Jeffrey Dean. 2016. Google’s neural machine translation system: Bridging the gap between human and machine translation. CoRR, abs/1609.08144.
  • Huimin Xu, Wenting Wang, Xin Mao, Xinyu Jiang, and Man Lan. 2019. Scaling up open tagging from tens to thousands: Comprehension empowered attribute value extraction from product title. In Proceedings of the 57th Annual Meeting of the Association for Computational Linguistics, pages 5214–5223, Florence, Italy.
  • Wen-tau Yih, Ming-Wei Chang, Xiaodong He, and Jianfeng Gao. 2015. Semantic parsing via staged query graph generation: Question answering with knowledge base. In Proceedings of the 53rd Annual Meeting of the Association for Computational Linguistics and the 7th International Joint Conference on Natural Language Processing (Volume 1: Long Papers), pages 1321–1331, Beijing, China.
  • Mo Yu, Wenpeng Yin, Kazi Saidul Hasan, Cicero dos Santos, Bing Xiang, and Bowen Zhou. 2017. Improved neural relation detection for knowledge base question answering. In Proceedings of the 55th Annual Meeting of the Association for Computational Linguistics (Volume 1: Long Papers), pages 571– 581, Vancouver, Canada.
  • Zhou Yu, Jun Yu, Yuhao Cui, Dacheng Tao, and Qi Tian. 2019. Deep modular co-attention networks for visual question answering. In IEEE Conference on Computer Vision and Pattern Recognition, CVPR 2019, Long Beach, CA, USA, June 16-20, 2019, pages 6281–6290.
  • Guineng Zheng, Subhabrata Mukherjee, Xin Luna Dong, and Feifei Li. 2018. Opentag: Open attribute value extraction from product profiles. In Proceedings of the 24th ACM SIGKDD International Conference on Knowledge Discovery & Data Mining, KDD 2018, London, UK, August 19-23, 2018, pages 1049– 1058.