Reinforcement learning in NLP

In recent years, advances in both deep learning and reinforcement learning have made it possible to combine the two, and the result is deep reinforcement learning. Deep reinforcement learning inherits the strong generalization and automatic feature extraction of deep learning while, like classical reinforcement learning, letting an intelligent system learn a policy for a given task through trial and error in its environment. In natural language processing, a major recent direction is to use RL to help learn semantic representations, evaluate them on downstream tasks, and then fine-tune the model with RL.
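As a concrete illustration of that fine-tuning direction, here is a minimal Python sketch of a REINFORCE update: the model's output is treated as an action and scored by a downstream-task reward. The four-way output space and the reward table are toy assumptions, not any specific paper's setup.

```python
# A toy REINFORCE update: sample an "output", score it with a stand-in
# downstream reward, and push probability mass toward high-reward outputs.
import torch
from torch.distributions import Categorical

logits = torch.zeros(4, requires_grad=True)   # toy policy over 4 candidate outputs
opt = torch.optim.Adam([logits], lr=0.1)
reward_table = [0.1, 0.9, 0.3, 0.2]           # hypothetical downstream-task scores

for step in range(300):
    dist = Categorical(logits=logits)
    a = dist.sample()
    r = reward_table[a.item()]                # downstream evaluation as reward
    loss = -r * dist.log_prob(a)              # REINFORCE objective
    opt.zero_grad(); loss.backward(); opt.step()

print(Categorical(logits=logits).probs)       # mass concentrates on output 1
```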
The tree-structured policy is invoked at each time step to reason about a series of more robust primitive actions, which sequentially regulate the temporal boundary via an iterative refinement process.
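A rough sketch of the refinement loop this describes, with an assumed primitive action set (shift, extend, shrink, stop) and a hypothetical choose_action callable standing in for the tree-structured policy:

```python
# Sketch of iterative temporal-boundary refinement with primitive actions.
# The action set and `choose_action` policy are illustrative assumptions.
ACTIONS = {
    "shift_left":  lambda s, e, d: (s - d, e - d),
    "shift_right": lambda s, e, d: (s + d, e + d),
    "extend":      lambda s, e, d: (s - d, e + d),
    "shrink":      lambda s, e, d: (s + d, e - d),
}

def refine_boundary(start, end, video_len, choose_action, max_steps=10, delta=1.0):
    """Iteratively adjust (start, end) until the policy emits 'stop'."""
    for _ in range(max_steps):
        action = choose_action(start, end)      # hypothetical policy call
        if action == "stop":
            break
        start, end = ACTIONS[action](start, end, delta)
        start = max(0.0, start)                 # clamp to the valid range
        end = min(video_len, max(end, start + delta))
    return start, end

# Example: a dummy policy that widens the segment once, then stops.
steps = iter(["extend", "stop"])
print(refine_boundary(5.0, 8.0, 30.0, lambda s, e: next(steps)))
```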
The knowledge graph serves as a means for the agent to understand its surroundings, accumulate information about the game, and disambiguate similar textual observations, while the template-based action space lends a measure of structure that enables us to exploit that same knowledge graph for action pruning.
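To make the template-plus-graph interplay concrete, here is a small sketch in which templates are filled only with entities the knowledge graph says are reachable. The graph triples and templates are invented for illustration, not the paper's actual game state.

```python
# Sketch of template-based action generation pruned by a knowledge graph.
knowledge_graph = {
    ("you", "in", "kitchen"),
    ("lamp", "in", "kitchen"),
    ("key", "in", "cellar"),
}

templates = ["take {obj}", "turn on {obj}", "go to {place}"]

def objects_near_player(kg):
    """Objects sharing the player's current location in the KG."""
    here = next(o for s, r, o in kg if s == "you" and r == "in")
    return {s for s, r, o in kg if r == "in" and o == here and s != "you"}

def valid_actions(kg):
    """Fill templates only with entities the KG says are reachable."""
    nearby = objects_near_player(kg)
    places = {o for _, r, o in kg if r == "in"}
    acts = [t.format(obj=o) for t in templates if "{obj}" in t for o in nearby]
    acts += [t.format(place=p) for t in templates if "{place}" in t for p in places]
    return acts

print(valid_actions(knowledge_graph))
# e.g. ['take lamp', 'turn on lamp', 'go to kitchen', 'go to cellar']
```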
While there is a growing body of papers that incorporate language into Reinforcement Learning, most of the research effort has been focused on simple RL tasks and synthetic languages with highly structured and instructive text.
We introduced language-conditioned reward learning, an algorithm for scalable training of language-conditioned reward functions represented by neural networks.
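One plausible shape for such a network is a scorer over a (state, instruction embedding) pair. The dimensions and encoder below are illustrative assumptions; the point is only that the reward is itself a trainable neural network conditioned on language.

```python
# Minimal sketch of a language-conditioned reward network: it scores a
# state against an instruction embedding and outputs a scalar reward.
import torch
import torch.nn as nn

class LangConditionedReward(nn.Module):
    def __init__(self, state_dim=16, lang_dim=32, hidden=64):
        super().__init__()
        self.net = nn.Sequential(
            nn.Linear(state_dim + lang_dim, hidden),
            nn.ReLU(),
            nn.Linear(hidden, 1),     # scalar reward
        )

    def forward(self, state, lang_embedding):
        return self.net(torch.cat([state, lang_embedding], dim=-1))

reward_fn = LangConditionedReward()
r = reward_fn(torch.randn(1, 16), torch.randn(1, 32))  # one (state, instruction) pair
print(r.shape)  # torch.Size([1, 1])
```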
We demonstrate that language abstractions can serve as an efficient, flexible, and human-interpretable representation for solving a variety of long-horizon control problems in a hierarchical reinforcement learning framework.
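The control loop implied here can be sketched as a high-level policy emitting a language instruction that a low-level policy conditions on for several steps. The environment, instruction set, and both policies below are toy placeholders, not the paper's implementation.

```python
# Two-level control loop with language as the abstraction layer.
import random

INSTRUCTIONS = ["pick up the red block", "open the drawer", "move left"]

class ToyEnv:                                  # stand-in environment
    def reset(self):
        self.t = 0
        return 0
    def step(self, action):
        self.t += 1
        return 0, 0.0, self.t >= 30, {}        # obs, reward, done, info

def high_level_policy(obs):
    return random.choice(INSTRUCTIONS)         # placeholder for a learned policy

def low_level_policy(obs, instruction):
    return 0                                   # placeholder primitive action

def run_episode(env, horizon=100, k=10):
    obs = env.reset()
    for _ in range(0, horizon, k):
        instruction = high_level_policy(obs)   # pick a language subgoal
        for _ in range(k):                     # low-level rollout under it
            obs, reward, done, _ = env.step(low_level_policy(obs, instruction))
            if done:
                return

run_episode(ToyEnv())
```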
The number of successful episodes more than doubles with Ext+Lang compared to ExtOnly. These results suggest that using natural language for reward shaping often helps learn a better final policy and rarely results in a worse one.
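The Ext+Lang scheme can be sketched as the extrinsic reward plus a language-based shaping term. The relevance scorer and weighting below are illustrative stand-ins for a learned model of how well recent behavior matches the instruction.

```python
# Sketch of Ext+Lang reward shaping: extrinsic reward plus a
# language-relevance bonus. `language_relevance` is a toy scorer.
def language_relevance(trajectory, instruction):
    # Toy scorer: fraction of instruction words "covered" by the trajectory.
    words = instruction.split()
    covered = sum(1 for w in words if w in trajectory)
    return covered / max(1, len(words))

def shaped_reward(extrinsic, trajectory, instruction, lam=0.1):
    return extrinsic + lam * language_relevance(trajectory, instruction)

# ExtOnly would use `extrinsic` alone; Ext+Lang adds the shaping term.
print(shaped_reward(1.0, ["jump", "over", "the", "skull"], "jump over the skull"))
```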
To train Seq2SQL, we applied in-the-loop query execution to learn a policy for generating the conditions of the SQL query, which are unordered and unsuitable for optimization via cross-entropy loss.
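Execution-in-the-loop reward can be sketched with sqlite3: run the generated query and compare its result set against the gold query's. The table, queries, and exact reward values are illustrative, in the spirit of penalizing invalid queries most, wrong results less, and rewarding a matching result.

```python
# Sketch of an execution-based reward for generated SQL, using sqlite3.
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE players (name TEXT, score INT)")
conn.executemany("INSERT INTO players VALUES (?, ?)",
                 [("ann", 10), ("bob", 7)])

def execution_reward(pred_sql, gold_sql):
    try:
        pred = conn.execute(pred_sql).fetchall()
    except sqlite3.Error:
        return -2.0                          # malformed or invalid query
    gold = conn.execute(gold_sql).fetchall()
    return 1.0 if pred == gold else -1.0     # compare result sets

print(execution_reward("SELECT name FROM players WHERE score > 8",
                       "SELECT name FROM players WHERE score = 10"))  # 1.0
```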
We plan to explore the potential of model-based reinforcement learning to transfer across different tasks, e.g., Vision-and-Language Navigation and Embodied Question Answering.
Neural networks are powerful and flexible models that work well for many difficult learning tasks in image, speech and natural language understanding. Despite their success, neural networks are still hard to design. In this paper, we use a recurrent network to generate the model descriptions of neural networks and train this RNN with reinforcement learning to maximize the expected accuracy of the generated architectures on a validation set.
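The controller idea can be sketched as an RNN that samples one architectural choice per layer and is updated with REINFORCE using validation accuracy as the reward. The choice space and the simulated evaluation below are toy stand-ins for the paper's setup.

```python
# Sketch of an RNN controller for architecture search, trained with
# REINFORCE on a (here simulated) validation accuracy.
import torch
import torch.nn as nn
from torch.distributions import Categorical

CHOICES = [16, 32, 64, 128]                  # e.g. filters per layer
n_layers, hidden = 3, 32
cell = nn.GRUCell(len(CHOICES), hidden)
head = nn.Linear(hidden, len(CHOICES))
opt = torch.optim.Adam(list(cell.parameters()) + list(head.parameters()), lr=1e-2)

def evaluate_architecture(arch):             # simulated validation accuracy
    return sum(arch) / (len(arch) * max(CHOICES))

for step in range(100):
    h, x = torch.zeros(1, hidden), torch.zeros(1, len(CHOICES))
    log_probs, arch = [], []
    for _ in range(n_layers):                # one decision per layer
        h = cell(x, h)
        dist = Categorical(logits=head(h))
        idx = dist.sample()
        log_probs.append(dist.log_prob(idx))
        arch.append(CHOICES[idx.item()])
        x = torch.zeros(1, len(CHOICES))
        x[0, idx] = 1.0                      # feed the choice back in
    acc = evaluate_architecture(arch)        # reward = validation accuracy
    loss = -acc * torch.stack(log_probs).sum()
    opt.zero_grad(); loss.backward(); opt.step()
```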
We demonstrated the benefit of learning task-specific composition order on four tasks: sentiment analysis, semantic relatedness, natural language inference, and sentence generation.
We analysed the communication protocol discovered by differentiable inter-agent learning for n = 3 by sampling 1K episodes, for which Figure 4(c) shows a decision tree corresponding to an optimal strategy.
In this paper, we develop a deep reinforcement relevance network (DRRN), a novel deep neural network architecture for handling actions described by natural language in decision-making tasks such as text games.
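In the spirit of the DRRN, the sketch below embeds the state text and each candidate action text with separate networks and scores their relevance with an inner product as the Q-value. The bag-of-words featurization and tiny vocabulary are toy assumptions for illustration.

```python
# Sketch of the relevance-network idea: Q(s, a) as an inner product of
# separately embedded state text and action text.
import torch
import torch.nn as nn

VOCAB = ["you", "are", "in", "a", "dark", "room", "open", "door", "wait"]

def bow(text):                                  # toy bag-of-words features
    v = torch.zeros(1, len(VOCAB))
    for w in text.lower().split():
        if w in VOCAB:
            v[0, VOCAB.index(w)] += 1.0
    return v

state_net = nn.Sequential(nn.Linear(len(VOCAB), 32), nn.ReLU(), nn.Linear(32, 16))
action_net = nn.Sequential(nn.Linear(len(VOCAB), 32), nn.ReLU(), nn.Linear(32, 16))

def q_values(state_text, action_texts):
    s = state_net(bow(state_text))              # (1, 16) state embedding
    a = torch.cat([action_net(bow(t)) for t in action_texts])  # (n, 16)
    return (a @ s.t()).squeeze(1)               # relevance = inner product

print(q_values("You are in a dark room", ["open the door", "wait"]))
```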