Reinforcement learning in CV

In recent years, the emergence of advanced methods in both deep learning and reinforcement learning has made it possible to combine the two; the result is deep reinforcement learning. Deep reinforcement learning inherits the strong generalization and automatic feature-extraction abilities of deep learning, while, like classical reinforcement learning, it lets an intelligent system learn a policy for solving a specific task in a given environment through trial and error. Reinforcement learning has also found many applications in computer vision, especially in robotics.
Xin Chen, Guannan Qu, Yujie Tang, Steven Low, Na Li
A number of works have been devoted to applying reinforcement learning to the power system field, but many key problems remain unsolved and there is still a substantial distance from practical implementation
CVPR, pp.11154-11163, (2020)
We have presented the Reinforcement learning-CycleGAN to address the visual simulation-to-real gap, and showed that it significantly improves real-world vision-based robotic grasping in two different grasping setups
ICLR, (2019)
We introduced language-conditioned reward learning, an algorithm for scalable training of language-conditioned reward functions represented by neural networks
Katie Kang, Suneel Belkhale,Gregory Kahn,Pieter Abbeel,Sergey Levine
ICRA, (2019): 6008-6014
Our experiments evaluate the design decisions of our method and show that our approach enables a nano aerial vehicle to fly through novel, complex hallway environments
Jonás Kulhánek, Erik Derner, Tim de Bruin,Robert Babuska
ECMR, pp.1-8, (2019)
It is based on a compact deep neural network capable of fast learning over multiple realistic environments, using the batched A2C algorithm extended with novel auxiliary tasks
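The A2C-with-auxiliary-tasks idea above combines the usual actor and critic terms with extra self-supervised regression targets in one loss. A minimal sketch of such a combined per-transition loss (the function name, coefficients, and auxiliary target are illustrative assumptions, not taken from the paper):

```python
import numpy as np

def a2c_loss_with_aux(log_prob, value, ret, aux_pred, aux_target,
                      value_coef=0.5, aux_coef=0.1, entropy=0.0, ent_coef=0.01):
    """Combined loss for one transition: a policy-gradient term weighted by
    the advantage (return minus value estimate), a value-regression term,
    and an auxiliary-task regression term. Coefficients are illustrative."""
    advantage = ret - value
    policy_loss = -log_prob * advantage            # actor term (advantage treated as a constant)
    value_loss = value_coef * advantage ** 2       # critic regression term
    aux_loss = aux_coef * np.mean((aux_pred - aux_target) ** 2)  # auxiliary head
    return policy_loss + value_loss + aux_loss - ent_coef * entropy
```

In a batched setting this loss would be averaged over all environments and timesteps before backpropagation; the auxiliary term is what distinguishes it from plain A2C.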
Dmitry Kalashnikov,Alex Irpan,Peter Pastor, Julian Ibarz, Alexander Herzog,Eric Jang, Deirdre Quillen, Ethan Holly,Mrinal Kalakrishnan,Vincent Vanhoucke,Sergey Levine
CoRL, pp.651-673, (2018)
We study the problem of learning vision-based dynamic manipulation skills using a scalable reinforcement learning approach
Xiaodan Liang, Tairui Wang, Luona Yang,Eric Xing
ECCV, (2018)
Our Controllable Imitative Reinforcement Learning incorporates controllable imitation learning with Deep Deterministic Policy Gradient policy learning to resolve the sample inefficiency issue that is well known in reinforcement learning research
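The Deep Deterministic Policy Gradient component mentioned above updates a deterministic actor by ascending the critic's gradient with respect to the action. A toy one-dimensional sketch of that update rule, with an analytic critic standing in for a learned one (the problem, learning rate, and step count are illustrative assumptions):

```python
import numpy as np

rng = np.random.default_rng(0)

# Toy 1-D problem: Q(s, a) = -(a - s)^2, so the optimal deterministic
# policy is a = s. The actor is linear: a = w * s.
w = 0.0        # actor parameter
lr = 0.05
for _ in range(200):
    s = rng.uniform(-1.0, 1.0)
    a = w * s
    # Deterministic policy gradient: dQ/da * da/dw
    dq_da = -2.0 * (a - s)    # analytic critic gradient (stand-in for a learned critic)
    da_dw = s
    w += lr * dq_da * da_dw   # ascend Q along the actor parameters

# w converges toward 1.0, i.e. the policy a = s that maximizes Q
```

Full DDPG adds a replay buffer, target networks, and exploration noise; the core actor update is this chain-rule step through the critic.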
ECCV, (2018): 38-55
We plan to explore the potential of model-based reinforcement learning to transfer across different tasks, e.g. Vision-and-Language Navigation, Embodied Question Answering, etc.
Ricson Cheng, Arpit Agarwal,Katerina Fragkiadaki
CoRL, pp.422-431, (2018)
We present modular actor-critic network architectures for action and perception in which only part of the state is exposed to the gripper controller, and where object detector modules are used to localize the object in the selected camera viewpoints
Horia Porav,Paul Newman
ITSC, pp.958-964, (2018)
We observe a significant increase in the number of collisions avoided as compared to the baseline, especially for Time to Collision
Yuxi Li
arXiv: Learning, (2017)
We present a list of topics not reviewed yet in Section 6, give a brief summary in Section 8, and close with discussions in Section 9
CoRR, (2015)
In the real-world experiment using synthetic images as inputs, the agent achieved a success rate consistent with that in simulation. These two different results show that the failure in the real-world experiment with camera images was caused by the input image differences between real...
ICML, pp.593-600, (2005)
A vision system trained on computer graphics was able to give reasonable depth estimates on real image data, and a control policy trained in a graphical simulator worked well for real autonomous driving
Proceedings of the 1995 IEEE International Conference on Robotics and Automation, (1995): 146-153, vol. 1
We adopted the Learning from Easy Missions algorithm, similar to a "shaping" technique in animal learning, in order to speed up learning instead of relying on task decomposition
IROS, (1994): 917-924
Although it is time-consuming, the learning method that obtains a new policy was the best one, because the simple-sum and switching methods do not continue learning to cope with new situations