A Corpus Search System Utilizing Lexical Dependency Structure

LREC (2006)

Abstract
This paper presents a corpus search system utilizing lexical dependency structure. The user's query consists of a sequence of keywords. For a given query, the system automatically generates dependency structure patterns consisting of the keywords in the query, and returns the sentences whose dependency structures match the generated patterns. The patterns are generated by two operations, combining and interpolation, both of which utilize the dependency structures in the searched corpus. These operations enable the system to generate only dependency structure patterns that actually occur in the corpus. The system thus achieves simple and intuitive corpus search while being linguistically sophisticated enough to utilize structural information.

Several corpus search systems have been presented. Most provide keyword-based search functionality, which is simple and intuitive but not linguistically sophisticated enough to utilize structural information. On the other hand, (Corley et al., 2001) and (Resnik and Elkiss, 2005) have presented corpus search systems utilizing syntactic structure, Gsearch and the Linguist's Search Engine (LSE), respectively. These systems search corpora using phrase structure patterns. In Gsearch, the user gives the system a phrase structure pattern and a grammar. The system constructs parse trees of the sentences in the corpus using the given grammar and returns the sentences whose parse trees match the given pattern. In LSE, the user first gives an example of the kind of sentence he/she needs. The system parses the example with a statistical parser and returns the parsing result. The user edits the resulting parse tree to specify a structural query, and the system finally returns the sentences whose parse trees match that query. Gsearch and LSE can thus search corpora by utilizing syntactic information; however, they do not offer search as simple as that of keyword-based systems.
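The matching step, returning the sentences whose dependency structures match a pattern, can be illustrated with a minimal Python sketch. The abstract does not say how dependency structures or patterns are represented, so the representation here is an assumption: each sentence stores its words plus the index of each word's head (-1 for the root), and a pattern is a set of (dependent, head) word pairs that must all occur in the sentence. The names `Sentence`, `matches`, and `search` are hypothetical.

```python
from dataclasses import dataclass

Edge = tuple[str, str]       # (dependent word, head word)
Pattern = frozenset[Edge]    # hypothetical pattern type: edges that must all occur


@dataclass(frozen=True)
class Sentence:
    """A dependency-parsed sentence (assumed representation, not the paper's)."""
    words: tuple[str, ...]
    heads: tuple[int, ...]   # heads[i] is the index of word i's head; -1 marks the root

    def edges(self) -> set[Edge]:
        """Return every (dependent, head) word pair in the sentence."""
        return {(self.words[i], self.words[h])
                for i, h in enumerate(self.heads) if h >= 0}


def matches(sentence: Sentence, pattern: Pattern) -> bool:
    """A pattern matches if all of its edges occur in the sentence's dependency structure."""
    return pattern <= sentence.edges()


def search(corpus: list[Sentence], patterns: list[Pattern]) -> list[Sentence]:
    """Return, in corpus order, the sentences matched by at least one pattern."""
    return [s for s in corpus if any(matches(s, p) for p in patterns)]
```

For example, with `s1 = Sentence(("John", "ate", "an", "apple"), (1, -1, 3, 1))`, the pattern `frozenset({("apple", "ate")})` matches `s1`. Because this sketch compares word pairs rather than token positions, a pattern whose edges share a word could be satisfied by two different tokens of that word; a fuller implementation would match token-anchored subtrees.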
This paper presents a corpus search system that automatically generates structural queries from keyword-based queries. The system searches corpora based on lexical dependency information. The user's query is a sequence of keywords. For a given query, the system generates dependency structure patterns using two operations: combining and interpolation. Because the patterns are generated automatically, the user needs neither to build a grammar, as in Gsearch, nor to edit a structural query, as in LSE. The system achieves simple and intuitive corpus search while being linguistically sophisticated enough to utilize structural information.

2. Corpus Search based on Dependency Structure
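The abstract names the two generation operations but does not define them, so the following Python sketch rests on an assumed interpretation for a two-keyword query: combining yields a pattern in which one keyword directly depends on the other, and interpolation inserts one intermediate word through which the two keywords are connected. Candidates are built only from dependency edges observed in the corpus and are then kept only if a single sentence contains the whole pattern, mirroring the stated property that only patterns occurring in the corpus are generated. The edge-set corpus representation and the names `links` and `generate_patterns` are hypothetical.

```python
from itertools import product

Edge = tuple[str, str]       # (dependent word, head word)
Pattern = frozenset[Edge]    # hypothetical pattern type: edges that must all occur


def links(edges: set[Edge], a: str, b: str) -> list[Edge]:
    """Corpus edges that directly connect words a and b, in either direction."""
    return [e for e in ((a, b), (b, a)) if e in edges]


def generate_patterns(k1: str, k2: str, corpus: list[set[Edge]]) -> set[Pattern]:
    """Generate patterns for a two-keyword query (assumed reading of the operations)."""
    edges = set().union(*corpus)              # every (dependent, head) pair seen in the corpus
    vocab = {w for e in edges for w in e}
    # "Combining" (assumption): the two keywords are linked by a single dependency.
    candidates = {frozenset({e}) for e in links(edges, k1, k2)}
    # "Interpolation" (assumption): one intermediate word w connects the two keywords.
    for w in vocab - {k1, k2}:
        for e1, e2 in product(links(edges, k1, w), links(edges, w, k2)):
            candidates.add(frozenset({e1, e2}))
    # Keep only patterns that occur as a whole within at least one sentence.
    return {p for p in candidates if any(p <= s for s in corpus)}
```

With a toy corpus containing the sentence edges `{("John", "ate"), ("apple", "ate"), ("an", "apple")}`, the query `John apple` yields no combined pattern but one interpolated pattern, `{("John", "ate"), ("apple", "ate")}`. The final filter matters: if `("an", "apple")` and `("apple", "pie")` come from different sentences, the query `an pie` produces a candidate through `apple` that is discarded. Scanning the whole vocabulary for intermediate words keeps the sketch short; indexing each word's neighbours would avoid that scan.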