Retrieval using document structure and annotations (2010)

Abstract
Successful retrieval of information from text collections requires effective use of the information present in the collection. The structure of documents in the collection, and the relationships between elements of a document and other documents, carry important information about the meaning of those elements. For example, the words in a web page's title may contain important clues about the page's content, and the text of links pointing to the page may be another important indicator of that content. Researchers have long recognized that structure can be an important indicator of relevance. Yet most prior work is limited to experiments on small test collections evaluated on a single retrieval task, which limits how far its conclusions generalize. The recent construction of large and diverse test collections provides an opportunity to reconsider the general task of retrieval in collections with structure. This dissertation draws on three retrieval tasks to identify important properties of retrieval systems that support the use of structure and annotations. We investigate known-item finding of web pages, retrieval of elements from XML articles, and retrieval of answer-bearing sentences as a component of a question-answering system. The retrieval model, an adaptation of the Inference Network model, clarifies the query language and simplifies the process of smoothing using multiple representations. The experiments in this dissertation achieve state-of-the-art results on these tasks and also provide novel insights into the shape of the parameter space when using mixtures of language models. Our question-answering experiments further show how semantic predicates, automatically annotated on a collection, can be used to improve a system's ability to retrieve answer-bearing sentences.
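The abstract mentions smoothing with multiple document representations and mixtures of language models. As a rough illustration only, and not the dissertation's actual model (which adapts the Inference Network), the sketch below scores a query against a document whose fields (for example a title and a body) each get a Dirichlet-smoothed language model, and the field models are combined as a weighted mixture. The field names, the mixture weights, the smoothing parameter `mu=100`, and the `1e-6` floor for terms missing from a collection model are all assumptions chosen for illustration.

```python
import math
from collections import Counter


def field_prob(term, counts, field_len, coll_prob, mu):
    """Dirichlet-smoothed P(term | field): mixes the field's counts with the collection model."""
    return (counts[term] + mu * coll_prob) / (field_len + mu)


def mixture_score(query, doc_fields, coll_models, weights, mu=100.0):
    """Log query likelihood under a weighted mixture of per-field language models.

    query:       list of query terms
    doc_fields:  field name -> list of tokens in this document's field
    coll_models: field name -> {term: P(term | collection field)}
    weights:     field name -> mixture weight (must sum to 1)
    """
    if abs(sum(weights.values()) - 1.0) > 1e-9:
        raise ValueError("mixture weights must sum to 1")
    counts = {f: Counter(toks) for f, toks in doc_fields.items()}
    score = 0.0
    for term in query:
        p = 0.0
        for field, lam in weights.items():
            toks = doc_fields.get(field, [])
            # Assumed floor so unseen terms never get zero probability.
            coll_p = coll_models[field].get(term, 1e-6)
            p += lam * field_prob(term, counts.get(field, Counter()), len(toks), coll_p, mu)
        score += math.log(p)
    return score


if __name__ == "__main__":
    # Hypothetical toy collection statistics and documents.
    coll = {"title": {"structured": 0.01, "retrieval": 0.02},
            "body": {"structured": 0.005, "retrieval": 0.01}}
    weights = {"title": 0.3, "body": 0.7}
    doc_a = {"title": ["structured", "retrieval"], "body": ["retrieval", "of", "xml", "elements"]}
    doc_b = {"title": ["cooking", "tips"], "body": ["recipes", "for", "pasta", "retrieval"]}
    query = ["structured", "retrieval"]
    for name, doc in [("doc_a", doc_a), ("doc_b", doc_b)]:
        print(name, mixture_score(query, doc, coll, weights))
```

The mixture weights and `mu` are the free parameters one would tune on training queries; this sketch fixes them by hand, so it says nothing about which settings work well in practice.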
Keywords
retrieval task, successful retrieval, important indicator, web page, retrieval system, answer-bearing sentence, important clue, single retrieval task, important information, retrieval model, document structure