Open dataset discovery using context-enhanced similarity search

KNOWLEDGE AND INFORMATION SYSTEMS(2022)

引用 1|浏览17
暂无评分
摘要
Today, open data catalogs enable users to search for datasets with full-text queries in metadata records combined with simple faceted filtering. Using this combination, a user is able to discover a significant number of the datasets relevant to a user’s search intent. However, there still remain relevant datasets that are hard to find because of the enormous sparsity of their metadata (e.g., several keywords). As an alternative, in this paper, we propose an approach to dataset discovery based on similarity search over metadata descriptions enhanced by various semantic contexts. In general, the semantic contexts enrich the dataset metadata in a way that enables the identification of additional relevant datasets to a query that could not be retrieved using just the keyword or full-text search. In experimental evaluation we show that context-enhanced similarity retrieval methods increase the findability of relevant datasets, improving thus the retrieval recall that is critical in dataset discovery scenarios. As a part of the evaluation, we created a catalog-like user interface for dataset discovery and recorded streams of user actions that served us to create the ground truth. For the sake of reproducibility, we published the entire evaluation testbed.
更多
查看译文
关键词
Dataset,Discovery,Search,Similarity,Evaluation,Context
AI 理解论文
溯源树
样例
生成溯源树,研究论文发展脉络
Chat Paper
正在生成论文摘要