Dataset 2: Recommended Language Models

user-5ebe28ba4c775eda72abcdf3 (2019)

Abstract

The word models were trained with a sentence start word of <s>, a sentence end word of </s>, and an unknown word <unk>. The word vocabulary consists of the 64K most frequent words in the forum dataset that also appear in a list of 330K known English words. All words are in lowercase. The character models are 12-gram models trained using interpolated Witten-Bell smoothing. The character model vocabulary consists of the lowercase letters a-z, apostrophe, <sp> for a space, <s> for sentence start, and </s> for sentence end.
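The vocabulary conventions above can be illustrated with a minimal Python sketch. It is not the authors' preprocessing code. The function names (`build_vocab`, `word_tokens`, `char_tokens`) are hypothetical, whitespace tokenization is assumed, and characters outside the stated character vocabulary are simply dropped, which the abstract does not specify.

```python
from collections import Counter

# Characters the abstract lists for the character models (space handled separately).
ALLOWED_CHARS = set("abcdefghijklmnopqrstuvwxyz'")


def build_vocab(sentences, known_words, max_size):
    """Keep the most frequent words that also appear in a known-word list.

    Assumption: words are obtained by lowercasing and splitting on whitespace.
    """
    counts = Counter(w for s in sentences for w in s.lower().split())
    ranked = [w for w, _ in counts.most_common() if w in known_words]
    return set(ranked[:max_size])


def word_tokens(sentence, vocab):
    """Map a sentence to word-model tokens with <s>, </s>, and <unk>."""
    words = [w if w in vocab else "<unk>" for w in sentence.lower().split()]
    return ["<s>"] + words + ["</s>"]


def char_tokens(sentence):
    """Map a sentence to character-model tokens with <s>, </s>, and <sp>.

    Assumption: characters outside a-z, apostrophe, and whitespace are dropped.
    """
    kept = "".join(
        ch for ch in sentence.lower() if ch in ALLOWED_CHARS or ch.isspace()
    )
    normalized = " ".join(kept.split())  # collapse runs of whitespace
    body = ["<sp>" if ch == " " else ch for ch in normalized]
    return ["<s>"] + body + ["</s>"]


if __name__ == "__main__":
    known = {"the", "cat", "sat"}  # stand-in for the 330K known-word list
    vocab = build_vocab(["The cat sat", "the dog sat"], known, max_size=2)
    print(word_tokens("The cat sat", vocab))
    print(char_tokens("It's ok"))
```

In the models described, `max_size` would correspond to 64K and `known_words` to the 330K-word English list; the tiny values here are only for illustration.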
