Efficient Web-Based Linkage of Short to Long Forms
WebDB (2008)
Abstract
Abbreviations, acronyms, initialisms, and shortenings frequently occur in many texts found on the Web, such as publication metadata, stock ticker codes, and biological articles. To connect these disparate forms for knowledge discovery, short forms must be properly linked to their canonical long forms. In this paper, we demonstrate how a search engine can be used efficiently to mine the required contextual information, so that short forms can be effectively linked to long forms. We show that a count-based method consistently outperforms the other methods we consider, and that using search-result snippets is better than using the full web pages. We also consider adaptively combining a query probing algorithm with our count-based method. This reduces running time and network bandwidth while maintaining strong linkage performance.
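The abstract does not spell out the scoring rule of the count-based method, so the following is only a minimal sketch under stated assumptions. It assumes the search-engine snippets for a short form have already been retrieved and that a list of candidate long forms is given. The function name `count_based_link` and its scoring rule (the number of snippets that mention both the short form and a candidate) are hypothetical illustrations, not the authors' method.

```python
import re
from collections import Counter
from typing import List, Optional, Tuple


def _mentions(term: str) -> "re.Pattern[str]":
    # Whole-word, case-insensitive match so that "ACL" does not match "oracle".
    return re.compile(r"\b" + re.escape(term) + r"\b", re.IGNORECASE)


def count_based_link(
    short_form: str, candidates: List[str], snippets: List[str]
) -> Tuple[Optional[str], Counter]:
    """Hypothetical count-based linker (illustrative sketch only).

    Keeps only the snippets that mention `short_form`, then scores each
    candidate long form by how many of those snippets also mention it.
    """
    short_re = _mentions(short_form)
    relevant = [s for s in snippets if short_re.search(s)]
    counts: Counter = Counter()
    for cand in candidates:
        cand_re = _mentions(cand)
        # Count snippets, not raw occurrences, that mention the candidate.
        counts[cand] = sum(1 for s in relevant if cand_re.search(s))
    if not candidates or max(counts.values()) == 0:
        return None, counts  # no snippet supports any candidate
    # max() keeps the first candidate on ties, so input order breaks ties.
    best = max(candidates, key=lambda c: counts[c])
    return best, counts


# Toy snippets (made up for illustration) for the ambiguous short form "ACL".
snippets = [
    "ACL 2008: Association for Computational Linguistics annual meeting",
    "Papers accepted at ACL (Association for Computational Linguistics)",
    "An ACL tear affects the anterior cruciate ligament",
]
best, counts = count_based_link(
    "ACL",
    ["Association for Computational Linguistics", "Anterior Cruciate Ligament"],
    snippets,
)
print(best)  # Association for Computational Linguistics
```

Working on snippets rather than full pages keeps each candidate check to a short string, which fits the abstract's observation that snippets suffice; a real system would also need to generate the candidate long forms, which this sketch takes as given.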
Keywords
web as information resource, record linkage, query probing, abbreviation matching, search engine, web pages