A distantly supervised approach for recognizing product mentions in user-generated content

Journal of Intelligent Information Systems (2022)

Abstract
As online purchasing becomes more popular, users place more trust in information published on social media than in advertising content. Opinion mining is often applied to social media, and opinion target extraction is one of its main sub-tasks. In this paper, we focus on recognizing target entities related to electronic products. We propose ProdSpot, a method based on the distant supervision paradigm for training a named entity extractor to identify product mentions in user text. ProdSpot relies only on an unlabeled set of product offer titles and a list of product brand names. First, surface forms are identified from the product titles. Then, given a collection of user posts, the method selects sentences that contain at least one surface form and labels them automatically. A cluster-based filtering strategy is applied to detect and discard sentences that are likely mislabeled. Finally, data augmentation is used to make the training data more general and diverse. The augmented sentences constitute the training set for the recognition model. Experiments demonstrate that the automatically generated training data yields results similar to those achieved by a supervised model. Our best precision is only 9% lower than that of a supervised model, while our recall is approximately 7% higher in two distinct product categories. Compared to a state-of-the-art supervised method specifically designed to recognize mobile phone names, our method achieves competitive results, with F1 values only 4% lower, while requiring no user supervision. Our filtering and data augmentation steps directly influence these results.
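The automatic labeling step described above can be illustrated with a minimal sketch. It is not the ProdSpot implementation: the function names, the `PROD` tag, the BIO tagging scheme, the simple regex tokenizer, and the longest-match-first rule are all assumptions made for illustration, and the cluster-based filtering and data augmentation steps are omitted.

```python
import re


def tokenize(text):
    # Hypothetical tokenizer: lowercased word tokens plus standalone punctuation.
    return re.findall(r"\w+|[^\w\s]", text.lower())


def bio_label(sentence, surface_forms):
    """Tag every exact occurrence of a surface form with B-PROD / I-PROD."""
    tokens = tokenize(sentence)
    tags = ["O"] * len(tokens)
    # Assumption: longer surface forms are matched first, so that
    # "galaxy s10 plus" takes precedence over "galaxy s10".
    forms = sorted((tokenize(f) for f in surface_forms), key=len, reverse=True)
    for form in forms:
        n = len(form)
        if n == 0:
            continue
        for i in range(len(tokens) - n + 1):
            span_is_free = all(t == "O" for t in tags[i:i + n])
            if tokens[i:i + n] == form and span_is_free:
                tags[i] = "B-PROD"
                tags[i + 1:i + n] = ["I-PROD"] * (n - 1)
    return list(zip(tokens, tags))


def build_training_set(sentences, surface_forms):
    """Keep only sentences in which at least one surface form was found."""
    labeled = [bio_label(s, surface_forms) for s in sentences]
    return [pairs for pairs in labeled if any(tag != "O" for _, tag in pairs)]
```

Called with the surface forms `["galaxy s10", "galaxy s10 plus"]`, `build_training_set` would keep a post such as "I love my Galaxy S10 Plus!" with its three product tokens tagged, and drop a post that mentions no surface form. Exact matching of this kind is what makes distant labels noisy, which is why the abstract describes a separate filtering step.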
Keywords
Information extraction, Named entity recognition, Opinion mining