Reinforcement Learning for Bandit Neural Machine Translation with Simulated Human Feedback
EMNLP, pp. 1464-1474, 2017.
We use an encoder-decoder neural machine translation architecture with global attention, where both the encoder and decoder are recurrent neural networks. These models are normally trained by supervised learning, but as reference translations are not available in our setting, we ...
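As a rough illustration of what "global attention" means in such an encoder-decoder model, the sketch below scores every encoder state against the current decoder state, normalizes the scores with a softmax, and returns the weighted average as a context vector. The dot-product scoring function and the names `softmax` and `global_attention` are assumptions made for this example; the excerpt does not say which scoring function the paper uses.

```python
import math


def softmax(scores):
    """Turn raw scores into weights that are positive and sum to 1."""
    m = max(scores)  # subtract the max for numerical stability
    exps = [math.exp(s - m) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def global_attention(decoder_state, encoder_states):
    """Attend over ALL encoder states (hence "global").

    Assumption: dot-product scoring between the decoder state and each
    encoder state. Both are plain lists of floats of equal length.
    """
    # One score per source position.
    scores = [
        sum(d * h for d, h in zip(decoder_state, enc))
        for enc in encoder_states
    ]
    weights = softmax(scores)
    # Context vector: attention-weighted average of the encoder states.
    dim = len(decoder_state)
    context = [
        sum(w * enc[i] for w, enc in zip(weights, encoder_states))
        for i in range(dim)
    ]
    return weights, context
```

In a full model the context vector would be combined with the decoder state to predict the next target word; that step is omitted here.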
Machine translation is a natural candidate problem for reinforcement learning from human feedback: users provide quick, dirty ratings on candidate translations to guide a system to improve. Yet, current neural machine translation training focuses on expensive human-generated reference translations. We describe a reinforcement learning alg...