Reinforcement learning in NLP

In recent years, advances in both deep learning and reinforcement learning have made it possible to combine the two; the result is deep reinforcement learning. Deep RL inherits the strong generalization and automatic feature-extraction abilities of deep learning while, like classical reinforcement learning, letting an intelligent system learn a policy for a given task through trial and error in its environment. In NLP, recent work mainly uses RL in two ways: to help learn semantic representations that are then evaluated on downstream tasks, and to fine-tune models directly.
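As a concrete illustration of the fine-tuning use case, here is a minimal REINFORCE-style loop in the spirit of that description. `model.sample` and `reward_fn` are hypothetical placeholders, not any specific paper's API:

```python
import torch

# Sketch of RL fine-tuning for text generation (REINFORCE-style).
# `model.sample` and `reward_fn` are hypothetical placeholders: sample()
# is assumed to return decoded sequences plus per-step log-probs of shape
# (batch, seq_len); reward_fn scores one sequence with a downstream metric.

def reinforce_step(model, optimizer, prompts, reward_fn, baseline=0.0):
    optimizer.zero_grad()
    sequences, log_probs = model.sample(prompts)
    rewards = torch.tensor([reward_fn(s) for s in sequences], dtype=torch.float32)
    # Policy gradient: raise the log-probability of high-reward samples.
    loss = -((rewards - baseline) * log_probs.sum(dim=1)).mean()
    loss.backward()
    optimizer.step()
    return rewards.mean().item()
```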
AAAI, (2020): 12386-12393
The tree-structured policy is invoked at each time step to infer a series of more robust primitive actions, which sequentially regulate the temporal boundary via an iterative refinement process
Cited by 11 · Views 55
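A minimal sketch of the iterative refinement loop this entry describes, assuming the policy picks one of a few primitive boundary actions per step; the action set and step size here are illustrative, and a random policy stands in for the learned tree-structured one:

```python
import numpy as np

# Iterative temporal-boundary refinement with primitive actions.
ACTIONS = {
    0: lambda s, e, d: (s - d, e - d),   # shift window left
    1: lambda s, e, d: (s + d, e + d),   # shift window right
    2: lambda s, e, d: (s - d, e + d),   # enlarge window
    3: lambda s, e, d: (s + d, e - d),   # shrink window
}

def refine(policy, video_len, query, steps=10, delta=0.05):
    start, end = 0.25 * video_len, 0.75 * video_len   # initial guess
    for _ in range(steps):
        a = policy(start, end, query)                 # choose a primitive action
        start, end = ACTIONS[a](start, end, delta * video_len)
        start = max(0.0, start)
        end = min(float(video_len), max(end, start + 1e-3))
    return start, end

rng = np.random.default_rng(0)
print(refine(lambda s, e, q: int(rng.integers(4)), video_len=100, query="opens the door"))
```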
Prithviraj Ammanabrolu,Matthew Hausknecht
ICLR, (2020)
The knowledge graph serves as a means for the agent to understand its surroundings, accumulate information about the game, and disambiguate similar textual observations while the template-based action space lends a measure of structure that enables us to exploit that same knowledge...
Cited by 3 · Views 42
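The two ingredients the excerpt names, a knowledge graph accumulated from observations and a template action space, can be sketched as follows; the triple format and templates are illustrative stubs (real systems extract triples with OpenIE or similar):

```python
# A knowledge graph as agent memory, plus template action grounding.
class KnowledgeGraphAgent:
    def __init__(self):
        self.triples = set()            # (subject, relation, object)

    def observe(self, triples):
        """Accumulate information about the game world."""
        self.triples.update(triples)

    def objects_in(self, room):
        return {o for s, r, o in self.triples if s == room and r == "has"}

def fill_template(template, objects):
    """Ground a template like 'take OBJ' in objects known from the graph."""
    return [template.replace("OBJ", o) for o in objects]

agent = KnowledgeGraphAgent()
agent.observe({("kitchen", "has", "knife"), ("kitchen", "has", "apple")})
print(fill_template("take OBJ", agent.objects_in("kitchen")))
```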
IJCAI, (2019): 6309-6317
While there is a growing body of papers that incorporate language into Reinforcement Learning, most of the research effort has been focused on simple RL tasks and synthetic languages, with highly structured and instructive text
Cited by 53 · Views 208
ICLR, (2019)
We introduced language-conditioned reward learning, an algorithm for scalable training of language-conditioned reward functions represented by neural networks
Cited by 44 · Views 112
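A minimal sketch of a language-conditioned reward function as a neural network: it scores a state against an instruction embedding. The encoders and dimensions are illustrative assumptions, not the paper's architecture:

```python
import torch
import torch.nn as nn

# A reward network r(state, instruction) -> scalar.
class LanguageConditionedReward(nn.Module):
    def __init__(self, state_dim=16, lang_dim=32, hidden=64):
        super().__init__()
        self.net = nn.Sequential(
            nn.Linear(state_dim + lang_dim, hidden),
            nn.ReLU(),
            nn.Linear(hidden, 1),
        )

    def forward(self, state, instruction_emb):
        return self.net(torch.cat([state, instruction_emb], dim=-1)).squeeze(-1)

reward_fn = LanguageConditionedReward()
r = reward_fn(torch.randn(4, 16), torch.randn(4, 32))  # batch of 4
print(r.shape)  # torch.Size([4])
```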
NeurIPS, (2019): 9414-9426
We demonstrate that language abstractions can serve as an efficient, flexible, and human-interpretable representation for solving a variety of long-horizon control problems in a hierarchical reinforcement learning framework
Cited by 33 · Views 167
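A sketch of the interface this suggests: a high-level policy emits a language instruction and a low-level policy executes it. `env`, `high_policy`, and `low_policy` are placeholders, and the learning algorithm for each level is omitted:

```python
# Two-level hierarchy with language as the interface between levels.
def run_episode(env, high_policy, low_policy, horizon=5, low_steps=20):
    obs = env.reset()
    total = 0.0
    for _ in range(horizon):
        instruction = high_policy(obs)           # e.g. "pick up the red key"
        for _ in range(low_steps):               # low level executes it
            action = low_policy(obs, instruction)
            obs, reward, done, _ = env.step(action)
            total += reward
            if done:
                return total
    return total
```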
IJCAI, (2019): 2385-2391
The number of successful episodes is more than twice as high with Ext+Lang as with ExtOnly. These results suggest that using natural language for reward shaping often helps learn a better final policy and rarely results in a worse one
Cited by 32 · Views 85
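The Ext+Lang combination can be sketched as a shaped reward that adds a language-relevance bonus to the extrinsic environment reward; cosine similarity between trajectory and instruction embeddings is an illustrative stand-in for the paper's relevance model:

```python
# Shaped reward = extrinsic reward + language-relevance bonus.
def cosine(a, b):
    dot = sum(x * y for x, y in zip(a, b))
    na = sum(x * x for x in a) ** 0.5
    nb = sum(y * y for y in b) ** 0.5
    return dot / (na * nb + 1e-8)

def shaped_reward(extrinsic, traj_emb, instr_emb, weight=0.1):
    return extrinsic + weight * cosine(traj_emb, instr_emb)

print(shaped_reward(1.0, [0.2, 0.9], [0.1, 1.0]))  # ~1.099
```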
CVPR, (2019): 334-343
This paper focuses on the rarely investigated problem of localizing an activity via a sentence query, which is more challenging and practical
Cited by 16 · Views 31
AAAI, (2019): 8393-8400
Experiments demonstrate that our method achieves a new state-of-the-art performance on two well-known datasets
Cited by 15 · Views 80
Kishor Jothimurugan,Rajeev Alur, Osbert Bastani
NeurIPS, (2019): 13021-13030
We have proposed a language for formally specifying control tasks and an algorithm to learn policies to perform tasks specified in the language
Cited by 1 · Views 21
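A toy version of such a specification language, assuming sequencing over atomic "reach" predicates compiled into a stateful monitor that yields reward; the paper's language and semantics are richer:

```python
# Tiny task-specification language: Reach predicates composed with Seq.
class Reach:
    def __init__(self, predicate):
        self.predicate = predicate
    def done(self, state):
        return self.predicate(state)

class Seq:
    def __init__(self, *subtasks):
        self.subtasks = list(subtasks)
        self.index = 0                  # stateful: tracks progress
    def done(self, state):
        while self.index < len(self.subtasks) and self.subtasks[self.index].done(state):
            self.index += 1
        return self.index == len(self.subtasks)

def reward(spec, state):
    return 1.0 if spec.done(state) else 0.0

# "Reach x > 5, then reach y > 3."
spec = Seq(Reach(lambda s: s["x"] > 5), Reach(lambda s: s["y"] > 3))
print(reward(spec, {"x": 6, "y": 0}), reward(spec, {"x": 6, "y": 4}))  # 0.0 1.0
```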
arXiv: Computation and Language, (2018)
To train Seq2SQL, we applied in-the-loop query execution to learn a policy for generating the conditions of the SQL query, which are unordered and unsuitable for optimization via cross-entropy loss
Cited by 356 · Views 67
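In-the-loop execution as an RL signal can be sketched as follows: generated WHERE conditions are scored by executing the query, not by token-level cross entropy. The reward values and the toy executor are illustrative, and `execute` stands in for a real database call:

```python
# Execution-based reward for generated SQL conditions.
def execution_reward(pred_sql, gold_sql, execute):
    try:
        pred_result = execute(pred_sql)
    except Exception:
        return -2.0                              # query did not even run
    return 1.0 if pred_result == execute(gold_sql) else -1.0

# Toy usage: differently written conditions that return the same rows.
rows = [{"name": "ada", "age": 36}, {"name": "alan", "age": 41}]
def execute(sql):
    field, value = sql.split(">")                # handles "age>40" only
    return [r for r in rows if r[field] > int(value)]

print(execution_reward("age>40", "age>39", execute))  # 1.0
```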
Xiaojun Xu,Chang Liu,Dawn Song
arXiv: Computation and Language, (2018)
We also introduce the column attention mechanism, which further boosts a sequence-to-set model's performance
Cited by 180 · Views 123
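A minimal sketch of column attention under one common reading: each column name attends over the question tokens, producing a question-aware column representation. Dimensions are illustrative:

```python
import torch

def column_attention(col_emb, question_emb):
    """col_emb: (num_cols, d); question_emb: (num_tokens, d)."""
    scores = col_emb @ question_emb.T                  # (num_cols, num_tokens)
    weights = torch.softmax(scores, dim=-1)
    return weights @ question_emb                      # (num_cols, d)

cols = torch.randn(3, 8)       # e.g. embeddings of 3 column names
question = torch.randn(5, 8)   # e.g. embeddings of 5 question tokens
print(column_attention(cols, question).shape)  # torch.Size([3, 8])
```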
ECCV, (2018): 38-55
We plan to explore the potential of model-based reinforcement learning to transfer across different tasks, e.g. Vision-and-Language Navigation and Embodied Question Answering
Cited by 54 · Views 27
ACL, (2018): 989-999
The experimental evaluation shows that our model achieves state-of-the-art results on the Stanford Natural Language Inference and MultiNLI datasets
Cited by 26 · Views 74
Barret Zoph,Quoc V. Le
ICLR, (2017)
Neural networks are powerful and flexible models that work well for many difficult learning tasks in image, speech and natural language understanding. Despite their success, neural networks are still hard to design. In this paper, we use a recurrent network to generate the model ...
Cited by 2591 · Views 302
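A stripped-down sketch of the controller idea: a recurrent cell emits one architecture decision per step, and REINFORCE would weight the sampled log-probabilities by the child network's validation accuracy. The choice lists and sizes here are illustrative:

```python
import torch
import torch.nn as nn

CHOICES = [[16, 32, 64],       # filters for layer 1
           [3, 5, 7],          # kernel size for layer 1
           [16, 32, 64]]       # filters for layer 2

class Controller(nn.Module):
    def __init__(self, hidden=32):
        super().__init__()
        self.rnn = nn.GRUCell(hidden, hidden)
        self.embed = nn.Parameter(torch.zeros(1, hidden))
        self.heads = nn.ModuleList(nn.Linear(hidden, len(c)) for c in CHOICES)

    def sample(self):
        h = torch.zeros(1, self.rnn.hidden_size)
        x, log_prob, arch = self.embed, 0.0, []
        for choices, head in zip(CHOICES, self.heads):
            h = self.rnn(x, h)
            dist = torch.distributions.Categorical(logits=head(h))
            idx = dist.sample()
            log_prob = log_prob + dist.log_prob(idx)  # for REINFORCE (not shown)
            arch.append(choices[idx.item()])
            x = h                       # feed hidden state forward (simplified)
        return arch, log_prob

arch, logp = Controller().sample()
print(arch)   # e.g. [32, 5, 64]
```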
Yuxi Li
arXiv: Learning, (2017)
We present a list of topics not reviewed yet in Section 6, give a brief summary in Section 8, and close with discussions in Section 9
Cited by 420 · Views 1464
ACL, (2017): 1051-1062
While the two approaches have enjoyed success on many tasks, we found them to work poorly out of the box for our task
Cited by 126 · Views 79
ICLR, (2017)
We demonstrated the benefit of learning task-specific composition order on four tasks: sentiment analysis, semantic relatedness, natural language inference, and sentence generation
Cited by 121 · Views 113
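One way to sketch learned composition order: a scorer samples which adjacent pair of phrase vectors to merge at each step, so the tree shape is task-specific rather than fixed left-to-right. REINFORCE over the sampled merge sequence (not shown) would train the scorer end to end:

```python
import torch
import torch.nn as nn

d = 8
score = nn.Linear(2 * d, 1)          # scores a candidate merge
compose = nn.Linear(2 * d, d)        # merges two phrase vectors into one

def build(vectors):
    nodes = list(vectors)            # leaf embeddings, one per token
    while len(nodes) > 1:
        pairs = [torch.cat([nodes[i], nodes[i + 1]]) for i in range(len(nodes) - 1)]
        logits = torch.stack([score(p).squeeze() for p in pairs])
        i = torch.distributions.Categorical(logits=logits).sample().item()
        merged = torch.tanh(compose(pairs[i]))
        nodes[i:i + 2] = [merged]    # replace the pair with its parent
    return nodes[0]                  # sentence representation

sentence = [torch.randn(d) for _ in range(5)]
print(build(sentence).shape)  # torch.Size([8])
```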
EMNLP, (2017): 606-616
We design an active learning algorithm as a policy trained with deep reinforcement learning
Cited by 83 · Views 46
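Casting active learning as a sequential decision problem can be sketched as a stream where a policy decides, per example, whether to spend budget on a label; the random policy below stands in for the learned deep Q-network:

```python
import random

def stream_active_learning(stream, policy, budget):
    labeled = []
    for features in stream:
        if budget == 0:
            break
        if policy(features, budget):          # action: query label or skip
            labeled.append(features)          # oracle labeling happens here
            budget -= 1
    return labeled

stream = [[random.random() for _ in range(4)] for _ in range(50)]
chosen = stream_active_learning(stream, lambda x, b: random.random() < 0.3, budget=5)
print(len(chosen))  # at most 5
```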
NeurIPS, (2016): 2137-2145
We analysed the communication protocol discovered by differentiable inter-agent learning for n = 3 by sampling 1K episodes, for which Figure 4(c) shows a decision tree corresponding to an optimal strategy
Cited by 518 · Views 130
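The differentiable channel at the heart of this approach can be sketched in a few lines: one agent's real-valued message enters another agent's computation, so gradients flow across agents during training. Shapes are illustrative, and discretization of messages at execution time is omitted:

```python
import torch
import torch.nn as nn

obs_dim, msg_dim = 4, 2
speaker = nn.Linear(obs_dim, msg_dim)                  # produces the message
listener = nn.Linear(obs_dim + msg_dim, 1)             # consumes it

obs1, obs2 = torch.randn(1, obs_dim), torch.randn(1, obs_dim)
message = torch.tanh(speaker(obs1))                    # continuous channel
value = listener(torch.cat([obs2, message], dim=-1))
value.sum().backward()                                 # gradient reaches speaker
print(speaker.weight.grad is not None)                 # True
```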
ACL, (2016): 1621-1630
In this paper we develop a deep reinforcement relevance network, a novel DNN architecture for handling actions described by natural language in decision-making tasks such as text games
Cited by 143 · Views 101
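A minimal sketch of the relevance-network idea: the state text and each candidate action text are embedded by separate networks, and Q(s, a) is their inner product, so arbitrary textual actions can be ranked. The encoders here are bag-of-embedding stubs:

```python
import torch
import torch.nn as nn

class DRRN(nn.Module):
    def __init__(self, vocab=100, d=16):
        super().__init__()
        self.emb = nn.Embedding(vocab, d)
        self.state_net = nn.Linear(d, d)
        self.action_net = nn.Linear(d, d)

    def q_values(self, state_ids, action_ids_list):
        s = self.state_net(self.emb(state_ids).mean(0))
        qs = [torch.dot(s, self.action_net(self.emb(a).mean(0)))
              for a in action_ids_list]
        return torch.stack(qs)

model = DRRN()
state = torch.tensor([1, 7, 42])                       # tokenized observation
actions = [torch.tensor([3, 9]), torch.tensor([5])]    # two textual actions
print(model.q_values(state, actions))                  # one Q-value per action
```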