Text-Based Product Matching – Semi-Supervised Clustering Approach
CoRR(2024)
摘要
Matching identical products present in multiple product feeds constitutes a
crucial element of many tasks of e-commerce, such as comparing product
offerings, dynamic price optimization, and selecting the assortment
personalized for the client. It corresponds to the well-known machine learning
task of entity matching, with its own specificity, like omnipresent
unstructured data or inaccurate and inconsistent product descriptions. This
paper aims to present a new philosophy to product matching utilizing a
semi-supervised clustering approach. We study the properties of this method by
experimenting with the IDEC algorithm on the real-world dataset using
predominantly textual features and fuzzy string matching, with more standard
approaches as a point of reference. Encouraging results show that unsupervised
matching, enriched with a small annotated sample of product links, could be a
possible alternative to the dominant supervised strategy, requiring extensive
manual data labeling.
更多查看译文
AI 理解论文
溯源树
样例
生成溯源树,研究论文发展脉络
Chat Paper
正在生成论文摘要