Extension of the Attention Mechanism in Neural Machine Translation

Christopher Jan-Steffen Brix, Ing. H. Ney, B. Leibe, M.Sc. P. Bahar


Recently, machine translation (MT) has been significantly improved by the use of neural networks (NNs). Neural machine translation (NMT) makes it possible to read a source sentence in one language and output its translation in another. The most promising results have been reported for an encoder-decoder architecture with an additional attention mechanism. There, the encoder reads the source sentence and generates a set of source representations. The decoder outputs a sequence of variable length given the target history and a context vector. This context vector is determined by the attention layer and depends on the source representations and the target history. To this end, the attention layer selects the input positions that are currently important, thereby creating an alignment between source and target. Much research has gone into different definitions of the attention layer. In this bachelor thesis, we evaluate the impact of making the encoder depend on the decoder, which means the encoding has to be recomputed at every time step. Furthermore, we try to provide the attention layer with the knowledge of which source words it attended to at the current and at previous time steps. We use recurrent neural networks (RNNs), which can process sequences of arbitrary length, to process the source representations and generate the context vector for the decoder. Because basic RNNs suffer from vanishing and exploding gradients, we use long short-term memory (LSTM) cells and gated recurrent units (GRUs). To further improve the model, we add another attention layer on top of the RNN that computes the context vector as a weighted sum. Besides that …
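To make the description of the attention layer concrete, here is a minimal sketch of a single decoder step in pure Python. It is an illustration under stated assumptions, not the model from the thesis: it uses a plain dot-product score instead of a learned scoring network, and toy hand-picked vectors instead of trained encoder and decoder states. The `past_weights` and `coverage_weight` parameters are a hypothetical, coverage-style way of telling the scorer which source positions were attended to at previous steps; the thesis instead uses RNNs (LSTM or GRU cells) for this. All function and parameter names are illustrative.

```python
import math
from typing import List, Optional, Tuple

Vector = List[float]


def softmax(scores: Vector) -> Vector:
    """Numerically stable softmax: subtract the max before exponentiating."""
    m = max(scores)
    exps = [math.exp(s - m) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def dot(a: Vector, b: Vector) -> float:
    return sum(x * y for x, y in zip(a, b))


def attend(
    source: List[Vector],
    query: Vector,
    past_weights: Optional[Vector] = None,
    coverage_weight: float = 0.0,
) -> Tuple[Vector, Vector]:
    """One attention step: returns (context vector, attention weights).

    source: encoder states h_1..h_J, one vector per source position.
    query:  decoder state summarising the target history.
    past_weights / coverage_weight: HYPOTHETICAL extension that tells the
        scorer how much attention each source position has already received.
    """
    # Score every source position against the current decoder state
    # (a dot product is used here as a simplifying assumption).
    scores = [dot(h, query) for h in source]
    if past_weights is not None:
        # Hypothetical: lower the score of positions that were already
        # attended to at previous time steps.
        scores = [s - coverage_weight * c for s, c in zip(scores, past_weights)]
    # Normalise the scores into a distribution over source positions (the alignment).
    weights = softmax(scores)
    # Context vector = attention-weighted sum of the source representations.
    dim = len(source[0])
    context = [sum(w * h[d] for w, h in zip(weights, source)) for d in range(dim)]
    return context, weights


# Toy usage: three 2-dimensional source states, two decoder steps.
source = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]]
query = [2.0, 0.0]
context1, weights1 = attend(source, query)
# Second step: feed back the attention received so far (here only one previous step).
context2, weights2 = attend(source, query, past_weights=weights1, coverage_weight=1.0)
print(weights1)
print(weights2)
```

The softmax turns the scores into an alignment over source positions, and the context vector is the weighted sum of the source representations. In the toy run, the hypothetical penalty shifts some attention mass toward the previously neglected middle position at the second step.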