Dataset 3: Forum-only language models
user-5ebe28ba4c775eda72abcdf3(2019)
Abstract
These word language models were trained only on the forum data (141M words). Each model is available with a choice of 5K, 20K, or 64K vocabulary size, and as a 1-gram, 2-gram, 3-gram, or 4-gram model. Different entropy-pruning thresholds were used to create a small and a large version of each word language model.
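The two axes described above, vocabulary size and n-gram order, can be sketched with a toy counting routine. This is a minimal illustration, not the pipeline used to build the released models; the function name and corpus are invented for the example, and real toolkits add smoothing and entropy pruning on top of raw counts.

```python
# Illustrative sketch only: count word n-grams from a toy corpus,
# capping the vocabulary as the released models do (5K/20K/64K there;
# tiny here). Out-of-vocabulary words map to an <unk> token.
from collections import Counter

def build_ngram_counts(tokens, n, vocab_size):
    # Keep only the vocab_size most frequent words as the vocabulary.
    vocab = {w for w, _ in Counter(tokens).most_common(vocab_size)}
    # Replace everything outside the vocabulary with <unk>.
    mapped = [t if t in vocab else "<unk>" for t in tokens]
    # Collect counts over all n-grams in the mapped token stream.
    return Counter(tuple(mapped[i:i + n]) for i in range(len(mapped) - n + 1))

corpus = "the forum data is the forum corpus".split()
bigrams = build_ngram_counts(corpus, n=2, vocab_size=3)
```

Raising `n` or `vocab_size` grows the model roughly combinatorially, which is why the released models offer small and large pruned variants of each configuration.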