Efficient N-Gram, Skipgram and Flexgram Modelling with Colibri Core

Journal of Open Research Software (2016)

Abstract
Counting n-grams lies at the core of any frequentist corpus analysis and is often considered a trivial matter. Going beyond consecutive n-grams to patterns such as skipgrams and flexgrams increases the demand for efficient solutions. The need to operate on big corpus data does so even more. Lossless compression and non-trivial algorithms are needed to lower the memory demands, yet retain good speed. Colibri Core is software for the efficient computation and querying of n-grams, skipgrams and flexgrams from corpus data. The resulting pattern models can be analysed and compared in various ways. The software offers a programming library for C++ and Python, as well as command-line tools.
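To make the pattern types concrete, the following is a minimal pure-Python sketch of counting n-grams and skipgrams. It does not use Colibri Core or its API, and it applies none of the compression the abstract describes. The function names, the `GAP` placeholder, and the assumption that a skipgram is an n-gram whose interior tokens may be replaced by fixed-size gaps are all illustrative choices. Flexgrams, assumed here to be patterns with variable-size gaps, are not covered.

```python
from collections import Counter
from itertools import combinations

GAP = "{*}"  # hypothetical gap marker, chosen for this sketch only


def count_ngrams(tokens, n):
    """Count every run of n consecutive tokens."""
    return Counter(tuple(tokens[i:i + n]) for i in range(len(tokens) - n + 1))


def count_skipgrams(tokens, n):
    """Count patterns of total length n with one or more interior gaps.

    The first and last token of each window are always kept, so
    skipgrams only exist for n >= 3.
    """
    counts = Counter()
    interior = range(1, n - 1)  # positions that may become gaps
    for i in range(len(tokens) - n + 1):
        window = tokens[i:i + n]
        # Try every non-empty subset of interior positions as gaps.
        for k in range(1, n - 1):
            for gaps in combinations(interior, k):
                pattern = tuple(
                    GAP if j in gaps else tok for j, tok in enumerate(window)
                )
                counts[pattern] += 1
    return counts


if __name__ == "__main__":
    tokens = "to be or not to be".split()
    print(count_ngrams(tokens, 2).most_common(3))
    print(count_skipgrams(tokens, 3).most_common(3))
```

A naive counter like this stores every pattern as a tuple of strings, which is exactly the kind of memory cost that becomes a problem on large corpora.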
Keywords
Natural Language Processing, Computational Linguistics, n-grams, skipgrams, corpus frequency, corpus analysis