The Monte Carlo Transformer: a stochastic self-attention model for sequence prediction

arXiv (2020)

Abstract
This paper introduces the Sequential Monte Carlo Transformer, an original approach that naturally captures the distribution of the observations in a recurrent architecture. The keys, queries, values, and attention vectors of the network are treated as the unobserved stochastic states of its hidden structure. In this generative model, the observation received at each time step is a random function of the past states within a given attention window. In this general state-space setting, we use Sequential Monte Carlo methods to approximate the posterior distributions of the states given the observations, and then to estimate the gradient of the log-likelihood. The model thus provides a full predictive distribution rather than a single point estimate.
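
To make the Sequential Monte Carlo machinery in the abstract concrete, here is a minimal NumPy sketch of a bootstrap particle filter that estimates the log-likelihood of a sequence. It is not the paper's architecture: the latent attention states are reduced to a random-walk vector state, the observation model to a linear-Gaussian readout, and the function name smc_log_likelihood and all parameters are illustrative assumptions.

import numpy as np

rng = np.random.default_rng(0)

def smc_log_likelihood(observations, n_particles=100, d_state=4,
                       trans_std=0.1, obs_std=0.5):
    """Bootstrap particle filter over latent states.

    Each particle carries a latent vector (standing in for the stochastic
    keys/queries/values of the paper); observations are modeled as a noisy
    linear readout of the current state. Returns an unbiased estimate of
    the log-likelihood log p(y_1:T). All modeling choices here are
    illustrative, not taken from the paper.
    """
    T = len(observations)
    C = rng.normal(size=d_state)          # hypothetical linear readout weights
    states = rng.normal(size=(n_particles, d_state))
    log_lik = 0.0
    for t in range(T):
        # Propagate: a random-walk transition stands in for the paper's
        # stochastic self-attention dynamics.
        states = states + trans_std * rng.normal(size=states.shape)
        # Weight: Gaussian observation density p(y_t | state).
        mean = states @ C
        log_w = -0.5 * ((observations[t] - mean) / obs_std) ** 2 \
                - np.log(obs_std * np.sqrt(2 * np.pi))
        # Log-sum-exp accumulates the log-likelihood increment stably.
        m = log_w.max()
        w = np.exp(log_w - m)
        log_lik += m + np.log(w.mean())
        # Multinomial resampling keeps the particle set well spread.
        probs = w / w.sum()
        idx = rng.choice(n_particles, size=n_particles, p=probs)
        states = states[idx]
    return log_lik

# Toy usage: score a short synthetic sequence.
y = rng.normal(size=20)
print(smc_log_likelihood(y))

In the paper, an estimate of this kind is differentiated with respect to the network parameters to train the model; the sketch omits that step and only shows the propagate/weight/resample loop that also yields the predictive distribution via the weighted particles.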
Keywords
Monte Carlo method, Particle filter, Generative model, Random function, Deep learning, Algorithm, Computer science, Artificial intelligence, Sequence prediction, Sequential Monte Carlo methods, Time step