We demonstrate that entities bring additional exact match and soft match ranking signals from the knowledge graph; all entity-based rankings perform similarly or better compared to solely word-based rankings.

Word-Entity Duet Representations for Document Ranking.

Proceedings of the 40th International ACM SIGIR Conference on Research and Development in Information Retrieval, (2017): 763-772

Cited by: 70 | Views: 78 | EI

Abstract

This paper presents a word-entity duet framework for utilizing knowledge bases in ad-hoc retrieval. In this work, the query and documents are modeled by word-based representations and entity-based representations. Ranking features are generated by the interactions between the two representations, incorporating information from the word sp…

Introduction
  • Utilizing knowledge bases in text-centric search is a recent breakthrough in information retrieval [5]. The rapid growth of information extraction techniques and community efforts has generated large-scale general-domain knowledge bases, such as DBpedia and Freebase. These knowledge bases store rich semantics in semi-structured formats and have great potential in improving text understanding and search accuracy.

    There are many possible ways to utilize knowledge bases' semantics in different components of a search system. Query representation can be improved by introducing related entities and their texts to expand the query [4, 20].
  • The ranking model can be improved by utilizing the entities and their attributes to build additional connections between query and documents [14, 19].
  • The rich and novel ranking evidence from the word-entity duet does come with a cost.
  • Because it is created automatically, the entity-based representation introduces uncertainties.
  • Trained directly from relevance judgments, AttR-Duet learns how to demote noisy entities and how to rank documents with the word-entity duet simultaneously.
Highlights
  • An entity can be mistakenly annotated to a query and may mislead the search system. This paper develops an attention-based ranking model, AttR-Duet, that employs a simple attention mechanism to handle the noise in the entity representation. The matching component of AttR-Duet focuses on ranking with the word-entity duet, while its attention component focuses on steering the model away from noisy entities.
  • To handle the uncertainty introduced by the automatic-thus-noisy entity representations, a new ranking model, AttR-Duet, is developed.
  • Further experiments reveal that the strength of the method comes from both the advanced matching evidence from the word-entity duet and the attention mechanism that successfully 'purifies' it.
  • Our method provides a unified representation framework to utilize knowledge graphs in information retrieval.
  • The recent approaches in neural ranking with word embeddings can be incorporated [9]; better knowledge graph embeddings can be used [13]; better entity search methods can be applied when extracting word-to-entity features [3]; and the attention mechanism can be extended to the document's entity-based representations.
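The attention idea described above can be sketched in a few lines: each query entity gets a quality score from its attention features, a softmax turns those scores into weights, and low-quality (noisy or off-topic) entities are demoted before their match evidence is aggregated. This is a deliberately minimal illustration of the mechanism, not the paper's actual AttR-Duet architecture; all function and variable names here are hypothetical.

```python
import numpy as np

def attention_weighted_score(entity_scores, attention_features, w_att):
    """Aggregate per-entity match scores with learned attention weights.

    entity_scores:      (n_entities,)   match score of each query entity vs. a document
    attention_features: (n_entities, n_feat) per-entity quality features (Table 5 style)
    w_att:              (n_feat,)       learned attention parameters
    """
    entity_scores = np.asarray(entity_scores, dtype=float)
    attention_features = np.asarray(attention_features, dtype=float)
    w_att = np.asarray(w_att, dtype=float)

    logits = attention_features @ w_att       # one quality logit per entity
    att = np.exp(logits - logits.max())       # numerically stable softmax
    att = att / att.sum()                     # attention over query entities
    return float(att @ entity_scores)         # noisy entities get low weight
```

With one high-quality entity and one noisy entity, the noisy entity's match evidence is largely suppressed, which is the behavior the highlights attribute to the attention component.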
Methods
  • On ClueWeb09-B, a widely studied benchmark for web search, AttR-Duet improved RankSVM, a strong learning-to-rank baseline, by more than 20% at NDCG@20 and more than 30% at ERR@20, showing the advantage of the word-entity duet over bag-of-words.
  • ESR, EQFE, and EsdRank, previous state-of-the-art entity-based ranking methods, were outperformed by at least 15%.
  • This is not surprising, because the word-entity duet framework was designed to include all of their effects, as discussed in Section 3.3.
  • AttR-Duet still significantly outperformed all available baselines by at least 14%. The information from entities is effective and different from that from words: AttR-Duet influences more than three-quarters of the queries and improves the majority of them.
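NDCG@20 and ERR@20 are the reported metrics. As a reference point, a common NDCG@k formulation can be sketched as follows (using the 2^rel − 1 gain typical of TREC-style graded evaluation; exact gain and discount variants differ across evaluation tools, so treat this as one standard variant rather than the paper's exact scorer):

```python
import numpy as np

def ndcg_at_k(relevances, k=20):
    """NDCG@k for one ranked list.

    relevances: graded relevance judgments in the order the system ranked them.
    """
    rels = np.asarray(relevances, dtype=float)[:k]
    discounts = 1.0 / np.log2(np.arange(2, rels.size + 2))   # 1/log2(rank+1)
    dcg = float(((2.0 ** rels - 1.0) * discounts).sum())

    ideal = np.sort(np.asarray(relevances, dtype=float))[::-1][:k]
    idcg = float(((2.0 ** ideal - 1.0) * discounts).sum())
    return dcg / idcg if idcg > 0 else 0.0
```

A perfectly ordered list scores 1.0; any inversion of graded documents lowers the score, which is what "improved by more than 20% at NDCG@20" is measured against.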
Results
  • Experimental results on the TREC Web Track ad-hoc task demonstrate the effectiveness of the proposed methods.
  • On ClueWeb, where the query entities are cleaner, all the entity-related matching components from the duet provide similar or better improvements compared with word-based features.
  • More sophisticated neural ranking models [22] can be applied with the word-entity duet, especially when more training data are available.
Conclusion
  • This work presents a word-entity duet framework for utilizing knowledge bases in document ranking.
  • To handle the uncertainty introduced by the automatic-thus-noisy entity representations, a new ranking model, AttR-Duet, is developed.
  • It employs a simple attention mechanism to demote the ambiguous or off-topic query entities, and learns simultaneously how to weight entities of varying quality and how to rank documents with the word-entity duet.
Tables
  • Table1: Ranking features from query words to document words (title and body) (ΦQw-Dw)
  • Table2: Ranking features from query entities (name and description) to document words (title and body) (ΦQe-Dw)
  • Table3: Ranking features from query words to document entities (name and description) (ΦQw-De)
  • Table4: Ranking features from query entities to document’s title and body entities (ΦQe-De)
  • Table5: Attention features for query entities
  • Table6: Overall accuracies of AttR-Duet and baselines. (U) and (S) indicate unsupervised or supervised method. (E) indicates that information from entities is used. Relative performances compared with RankSVM are shown in percentages. Win/Tie/Loss are the number of queries a method improves, does not change, or hurts, compared with RankSVM on NDCG@20. Best results in each metric are marked bold. § marks statistically significant improvements (p < 0.05) over all baselines
  • Table7: Ranking accuracy with each group of matching features from the word-entity duet. Base Retrieval is SDM on ClueWeb09 and Lm on ClueWeb12. LeToR-Qw-Dw uses the query and document's BOW (Table 1). LeToR-Qe-Dw uses the query's BOE and document's BOW (Table 2), LeToR-Qw-De is the query BOW + document BOE (Table 3), and LeToR-Qe-De uses the query and document's BOE (Table 4). LeToR-All uses all groups. Relative performances in percentages, Win/Tie/Loss on NDCG@20, and statistically significant improvements (†) are all compared with Base Retrieval
  • Table8: Examples of entities used in Qw-De and Qe-De. The first half are examples of matched entities in relevant and irrelevant documents, which are used to extract Qw-De features. The second half are examples of entities falling into the exact match bin and the closest soft match bins, used to extract Qe-De features
  • Table9: Table 9
  • Table10: Examples of learned attention. The entities in bold blue draw more attention; those in gray draw less attention
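Tables 1–4 correspond to the four matching directions of the duet: query words and query entities each interact with document words and document entities. A deliberately reduced sketch is below, with every group collapsed to a single set-overlap feature; the paper's actual feature groups are far richer (retrieval-model scores, embeddings, and knowledge-graph signals per direction), so this only shows the 2×2 structure, and the function name is hypothetical.

```python
def duet_feature_groups(q_words, q_entities, d_words, d_entities):
    """Sketch of the four matching directions in the word-entity duet.

    Each direction is reduced here to one feature: the fraction of the
    query-side bag that also appears on the document side.
    """
    def overlap(query_side, doc_side):
        q, d = set(query_side), set(doc_side)
        return len(q & d) / len(q) if q else 0.0

    return {
        "Qw-Dw": overlap(q_words, d_words),        # words to words (Table 1)
        "Qe-Dw": overlap(q_entities, d_words),     # entities to words (Table 2)
        "Qw-De": overlap(q_words, d_entities),     # words to entities (Table 3)
        "Qe-De": overlap(q_entities, d_entities),  # entities to entities (Table 4)
    }
```

A learning-to-rank model (LeToR-All in Table 7) then combines all four groups into one ranking score.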
Funding
  • This research was supported by National Science Foundation (NSF) grant IIS-1422676, a Google Faculty Research Award, and a fellowship from the Allen Institute for Artificial Intelligence
Study subjects and analysis
documents: 100
For simplicity, the base retrieval on ClueWeb12-B13 used is Indri's default language model with KStemming, INQUERY stopword removal, and no spam filtering. All our methods and learning-to-rank baselines re-ranked the first 100 documents from the base retrieval. The ClueWeb web pages were parsed using Boilerpipe. The 'KeepEverythingExtractor' was used to keep as much text from the web page as possible, to minimize the parser's influence. The documents were parsed into two fields: title and body
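The setup described above — score only the top 100 documents from the base retrieval with a learned feature-based model, leaving the tail in base order — is the standard re-ranking pattern and can be sketched as follows. A minimal sketch with hypothetical helper names; the paper's actual scorer is the trained AttR-Duet model, here stood in for by a plain linear model:

```python
def rerank(base_ranking, feature_fn, weights, depth=100):
    """Re-rank the top `depth` documents from a base retrieval.

    base_ranking: list of (doc_id, base_score), sorted by the base retrieval
    feature_fn:   doc_id -> list of feature values (e.g., the duet features)
    weights:      learned linear weights, one per feature
    """
    head = base_ranking[:depth]
    scored = [(doc_id, sum(w * f for w, f in zip(weights, feature_fn(doc_id))))
              for doc_id, _ in head]
    # Sort the head by the learned score; the tail keeps its base-retrieval
    # order (and its original base scores, which are on a different scale).
    scored.sort(key=lambda pair: pair[1], reverse=True)
    return scored + base_ranking[depth:]
```

Only the re-ranked head is evaluated at NDCG@20 / ERR@20, so the untouched tail does not affect the reported metrics.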

Reference
  • [1] Kurt Bollacker, Colin Evans, Praveen Paritosh, Tim Sturge, and Jamie Taylor. 2008. Freebase: A collaboratively created graph database for structuring human knowledge. In Proceedings of the 2008 ACM SIGMOD International Conference on Management of Data (SIGMOD 2008). ACM, 1247–1250.
  • [2] Antoine Bordes, Nicolas Usunier, Alberto Garcia-Duran, Jason Weston, and Oksana Yakhnenko. 2013. Translating embeddings for modeling multi-relational data. In Advances in Neural Information Processing Systems (NIPS 2013). 2787–2795.
  • [3] Jing Chen, Chenyan Xiong, and Jamie Callan. 2016. An empirical study of learning to rank for entity search. In Proceedings of the 39th Annual International ACM SIGIR Conference on Research and Development in Information Retrieval (SIGIR 2016). ACM, 737–740.
  • [4] Jeffrey Dalton, Laura Dietz, and James Allan. 2014. Entity query feature expansion using knowledge base links. In Proceedings of the 37th Annual International ACM SIGIR Conference on Research and Development in Information Retrieval (SIGIR 2014). ACM, 365–374.
  • [5] Laura Dietz, Alexander Kotov, and Edgar Meij. 2017. Utilizing knowledge graphs in text-centric information retrieval. In Proceedings of the Tenth ACM International Conference on Web Search and Data Mining (WSDM 2017). ACM, 815–816.
  • [6] Faezeh Ensan and Ebrahim Bagheri. 2017. Document retrieval model through semantic linking. In Proceedings of the Tenth ACM International Conference on Web Search and Data Mining (WSDM 2017). ACM, 181–190.
  • [7] Paolo Ferragina and Ugo Scaiella. 2010. Fast and accurate annotation of short texts with Wikipedia pages. arXiv preprint arXiv:1006.3498 (2010).
  • [8] Evgeniy Gabrilovich, Michael Ringgaard, and Amarnag Subramanya. 2013. FACC1: Freebase annotation of ClueWeb corpora, Version 1 (Release date 2013-06-26, Format version 1, Correction level 0). (June 2013).
  • [9] Jiafeng Guo, Yixing Fan, Qingyao Ai, and W. Bruce Croft. 2016. A deep relevance matching model for ad-hoc retrieval. In Proceedings of the 25th ACM International Conference on Information and Knowledge Management (CIKM 2016). ACM, 55–64.
  • [10] Thorsten Joachims. 2002. Optimizing search engines using clickthrough data. In Proceedings of the 8th ACM SIGKDD International Conference on Knowledge Discovery and Data Mining (KDD 2002). ACM, 133–142.
  • [11] Jens Lehmann, Robert Isele, Max Jakob, Anja Jentzsch, Dimitris Kontokostas, Pablo N. Mendes, Sebastian Hellmann, Mohamed Morsey, Patrick van Kleef, Soren Auer, and Christian Bizer. 2014. DBpedia - A large-scale, multilingual knowledge base extracted from Wikipedia. Semantic Web Journal (2014).
  • [12] Hang Li and Jun Xu. 2014. Semantic matching in search. Foundations and Trends in Information Retrieval 8 (2014), 89.
  • [13] Yankai Lin, Zhiyuan Liu, Maosong Sun, Yang Liu, and Xuan Zhu. 2015. Learning entity and relation embeddings for knowledge graph completion. In Proceedings of the Twenty-Ninth AAAI Conference on Artificial Intelligence (AAAI 2015). 2181–2187.
  • [14] Xitong Liu and Hui Fang. 2015. Latent entity space: A novel retrieval approach for entity-bearing queries. Information Retrieval Journal 18, 6 (2015), 473–503.
  • [15] Donald Metzler and W. Bruce Croft. 2007. Linear feature-based models for information retrieval. Information Retrieval 10, 3 (2007), 257–274.
  • [16] Tomas Mikolov, Ilya Sutskever, Kai Chen, Greg S. Corrado, and Jeff Dean. 2013. Distributed representations of words and phrases and their compositionality. In Advances in Neural Information Processing Systems (NIPS 2013). 3111–3119.
  • [17] Hadas Raviv, Oren Kurland, and David Carmel. 2016. Document retrieval using entity-based language models. In Proceedings of the 39th International ACM SIGIR Conference on Research and Development in Information Retrieval (SIGIR 2016). ACM, 65–74.
  • [18] Ilya Sutskever, James Martens, George E. Dahl, and Geoffrey E. Hinton. 2013. On the importance of initialization and momentum in deep learning. In Proceedings of the 30th International Conference on Machine Learning (ICML 2013). 1139–1147.
  • [19] Chenyan Xiong and Jamie Callan. 2015. EsdRank: Connecting query and documents through external semi-structured data. In Proceedings of the 24th ACM International Conference on Information and Knowledge Management (CIKM 2015). ACM, 951–960.
  • [20] Chenyan Xiong and Jamie Callan. 2015. Query expansion with Freebase. In Proceedings of the Fifth ACM International Conference on the Theory of Information Retrieval (ICTIR 2015). ACM, 111–120.
  • [21] Chenyan Xiong, Jamie Callan, and Tie-Yan Liu. 2016. Bag-of-Entities representation for ranking. In Proceedings of the Sixth ACM International Conference on the Theory of Information Retrieval (ICTIR 2016). ACM, 181–184.
  • [22] Chenyan Xiong, Zhuyun Dai, Jamie Callan, Zhiyuan Liu, and Russell Power. 2017. End-to-end neural ad-hoc ranking with kernel pooling. In Proceedings of the 40th Annual International ACM SIGIR Conference on Research and Development in Information Retrieval (SIGIR 2017). ACM, To Appear.
  • [23] Chenyan Xiong, Russell Power, and Jamie Callan. 2017. Explicit semantic ranking for academic search via knowledge graph embedding. In Proceedings of the 26th International Conference on World Wide Web (WWW 2017). 1271–1279.
  • [24] Yang Xu, Gareth J.F. Jones, and Bin Wang. 2009. Query-dependent pseudo-relevance feedback based on Wikipedia. In Proceedings of the 32nd Annual International ACM SIGIR Conference on Research and Development in Information Retrieval (SIGIR 2009). ACM, 59–66.