High-performance Computational Framework for Phrase Relatedness.
DocEng (2017)
Abstract
TrWP is a text relatedness measure that computes semantic similarity between words and phrases using aggregated statistics from the Google Web 1T 5-gram corpus. Phrase similarity computation in TrWP is costly in both time and space, which makes the existing implementation impractical for real-world use. In this work, we present an in-memory computational framework for TrWP that optimizes corpus search using perfect hashing and minimizes memory cost using variable-length encoding. In an evaluation on the Google Web 1T 5-gram corpus, our framework outperforms a file-based implementation in computational speed by several orders of magnitude.
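To illustrate the idea of variable-length encoding mentioned in the abstract, here is a minimal sketch in Python. It assumes a generic 7-bits-per-byte (LEB128-style) scheme for non-negative integers such as n-gram counts; the abstract does not specify which encoding the framework actually uses, and the function names `encode_varint`, `decode_varint`, `encode_counts`, and `decode_counts` are hypothetical.

```python
from typing import List, Tuple


def encode_varint(n: int) -> bytes:
    """Encode a non-negative integer using 7 payload bits per byte."""
    if n < 0:
        raise ValueError("only non-negative integers are supported")
    out = bytearray()
    while True:
        byte = n & 0x7F          # take the lowest 7 bits
        n >>= 7
        if n:
            out.append(byte | 0x80)  # high bit set: more bytes follow
        else:
            out.append(byte)         # high bit clear: last byte
            return bytes(out)


def decode_varint(buf: bytes, pos: int = 0) -> Tuple[int, int]:
    """Decode one integer starting at pos; return (value, next position)."""
    value = 0
    shift = 0
    while True:
        byte = buf[pos]
        pos += 1
        value |= (byte & 0x7F) << shift
        if not byte & 0x80:
            return value, pos
        shift += 7


def encode_counts(counts: List[int]) -> bytes:
    """Concatenate the encodings of a list of counts (assumed layout)."""
    return b"".join(encode_varint(c) for c in counts)


def decode_counts(buf: bytes) -> List[int]:
    """Decode every integer stored back to back in buf."""
    values = []
    pos = 0
    while pos < len(buf):
        value, pos = decode_varint(buf, pos)
        values.append(value)
    return values
```

Under this scheme, small counts occupy a single byte while large counts grow only as needed, which is why such encodings tend to reduce memory use when most stored values are small.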
Keywords
Semantic Text Similarity, Efficient Indexing, Searching, Compression