Limitations of Autoregressive Models and Their Alternatives

NAACL-HLT (2021)

Abstract
Standard autoregressive language models perform only polynomial-time computation to compute the probability of the next symbol. While this is attractive, it means they cannot model distributions whose next-symbol probability is hard to compute. Indeed, they cannot even model them well enough to solve associated easy decision problems for which an engineer might want to consult a language model. (These decision problems are easy because the whole string is observed rather than just a prefix; this is the difference between checking a given assignment against a formula and asking whether any satisfying assignment exists.) These limitations apply no matter how much computation and data are used to train the model, unless the model is given access to oracle parameters that grow superpolynomially in sequence length. Thus, simply training larger autoregressive language models is not a panacea for NLP. Alternatives include energy-based models (which give up efficient sampling) and latent-variable autoregressive models (which give up efficient scoring of a given string). Both are powerful enough to escape the above limitations.
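
As a rough illustration of the trade-offs the abstract describes, the following Python sketch (not from the paper; the toy models, energy function, and function names are hypothetical) contrasts how each model family handles scoring and sampling a complete string:

import math, random

VOCAB = ["a", "b", "</s>"]

def next_symbol_probs(prefix):
    # Toy autoregressive model: a uniform next-symbol distribution.
    # A real AR model computes this table in polynomial time from the
    # prefix, which is exactly the restriction the paper analyzes.
    return {sym: 1.0 / len(VOCAB) for sym in VOCAB}

def ar_log_score(seq):
    # AR scoring is cheap: one chain-rule pass over locally normalized
    # next-symbol probabilities. Left-to-right sampling is equally cheap.
    return sum(math.log(next_symbol_probs(seq[:t])[sym])
               for t, sym in enumerate(seq))

def energy(seq):
    # Toy energy-based model: any real-valued function of the WHOLE
    # string, so it can encode global properties that no tractable
    # next-symbol distribution could commit to prefix by prefix.
    return float(len(seq))

def ebm_unnormalized_log_score(seq):
    # An EBM scores a given string easily up to the constant log Z, but
    # Z = sum over all strings of exp(-energy) is intractable in general,
    # so exact sampling is hard: the "give up efficient sampling" trade-off.
    return -energy(seq)

def latent_ar_sample(max_len=5):
    # Toy latent-variable AR model: sample a latent, then emit symbols.
    # Sampling stays easy, but scoring a given string means summing the
    # joint over all latents: the "give up efficient scoring" trade-off.
    z = random.choice(["mode1", "mode2"])
    seq = []
    for _ in range(max_len):
        probs = next_symbol_probs(tuple(seq))
        sym = random.choices(list(probs), weights=list(probs.values()))[0]
        if sym == "</s>":
            break
        seq.append(sym)
    return z, seq

The toy energy function is deliberately trivial; the point is only that an energy-based model may condition on the whole string at once, whereas the autoregressive factorization must commit to a tractable next-symbol distribution at every step.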
Keywords
autoregressive models, alternatives, limitations