Zero-Shot Visual Imitation

Parsa Mahmoudieh
Guanghao Luo
Pulkit Agrawal
Dian Chen
Yide Shentu

CVPR Workshops, 2018.

Abstract:

The current dominant paradigm for imitation learning relies on strong supervision of expert actions to learn both 'what' and 'how' to imitate. We pursue an alternative paradigm wherein an agent first explores the world without any expert supervision and then distills its experience into a goal-conditioned skill policy with a novel forward consistency loss.

Introduction
  • Imitating expert demonstration is a powerful mechanism for learning to perform tasks from raw sensory observations.
  • The authors follow [1, 13, 18] in pursuing an alternative paradigm, where an agent explores the environment without any expert supervision and distills this exploration data into goal-directed skills.
  • These skills can be used to imitate the visual demonstration provided by the expert [15].
  • By skill the authors mean a function that predicts the sequence of actions to take the agent from the current observation to the goal.
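A minimal sketch of what such a goal-conditioned skill might look like as code: a network that maps the current observation and the goal observation to an action. Everything here (class name, layer sizes, discrete action head) is an illustrative assumption, not the paper's actual architecture, which uses convolutional encoders over images.

    # Hypothetical sketch of a goal-conditioned skill policy (GSP) interface.
    # Layer sizes and the discrete action head are placeholder assumptions.
    import torch
    import torch.nn as nn

    class GSPolicy(nn.Module):
        def __init__(self, obs_dim=64, feat_dim=256, n_actions=4):
            super().__init__()
            # Shared encoder: a conv-net over images in the paper; an MLP stands in here.
            self.encoder = nn.Sequential(nn.Linear(obs_dim, feat_dim), nn.ReLU())
            # Action head conditioned on both current and goal features.
            self.action_head = nn.Linear(2 * feat_dim, n_actions)

        def forward(self, x_t, x_goal):
            f_t, f_g = self.encoder(x_t), self.encoder(x_goal)
            return self.action_head(torch.cat([f_t, f_g], dim=-1))  # action logits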
Highlights
  • Imitating expert demonstration is a powerful mechanism for learning to perform tasks from raw sensory observations
  • To account for the varying number of steps required to reach different goals, we propose to jointly optimize the goal-conditioned skill policy with a goal recognizer that determines whether the current goal has been satisfied
  • Forward consistency loss: instead of penalizing the actions predicted by the goal-conditioned skill policy to match the ground truth, we propose to learn the policy's parameters by minimizing the distance between the observation x̂t+1 obtained by executing the predicted action ât = π(xt, xg) and the observation xt+1 obtained by executing the ground-truth action at used to train the goal-conditioned skill policy (a rough sketch follows this list)
  • We find that our goal-conditioned skill policy model outperforms the baseline models in reaching the target location
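A rough sketch of the forward consistency idea from the bullet above, under stated assumptions: discrete actions, a learned forward model forward_model(x, a) that predicts the next observation (or its features), a Gumbel-softmax relaxation to keep the predicted action differentiable, and an illustrative weight lam. None of these choices are claimed to match the paper's exact implementation.

    # Sketch of a forward consistency loss; names and the Gumbel-softmax
    # relaxation are illustrative assumptions, not the paper's exact recipe.
    import torch.nn.functional as F

    def forward_consistency_loss(policy, forward_model, x_t, x_goal, a_t, x_tp1, lam=1.0):
        logits = policy(x_t, x_goal)                         # GSP action logits
        a_hat = F.gumbel_softmax(logits, hard=True)          # differentiable predicted action
        a_true = F.one_hot(a_t, logits.shape[-1]).float()    # ground-truth action, one-hot

        x_hat_pred = forward_model(x_t, a_hat)               # outcome of the predicted action
        x_hat_true = forward_model(x_t, a_true)              # outcome of the ground-truth action

        consistency = F.mse_loss(x_hat_pred, x_tp1)          # reach the same next observation
        grounding = F.mse_loss(x_hat_true, x_tp1)            # keep the forward model accurate
        action_ce = F.cross_entropy(logits, a_t)             # optional direct action supervision
        return consistency + grounding + lam * action_ce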
Results
  • The authors call this function a goal-conditioned skill policy (GSP).
  • The GSP is learned in a self-supervised way by re-labeling the states visited during the agent's exploration of the environment as goals and the actions executed by the agent as the prediction targets, similar to [1, 2] (a data-relabeling sketch follows this list).
  • This is a workshop version of the ICLR 2018 paper, available here: https://pathak22.github.io/zeroshot-imitation/
  • One critical challenge in learning the GSP is that, in general, there are multiple possible ways of going from one state to another: that is, the distribution of trajectories between states is multi-modal.
  • The authors address this issue with a novel forward consistency loss, based on the intuition that, for most tasks, reaching the goal is more important than how it is reached; details follow in the method section.
  • To account for the varying number of steps required to reach different goals, the authors propose to jointly optimize the GSP with a goal recognizer that determines whether the current goal has been satisfied.
  • See Figure 1 for a schematic illustration.
  • The authors call the method zero-shot because the agent never has access to expert actions, either during training of the GSP or during task demonstration at inference.
  • Most recent work on one-shot imitation learning requires full knowledge of actions and a wealth of expert demonstrations during training [6, 7].
  • The authors propose a forward-consistent GSP; the schematic compares (a) an inverse model, (b) a multi-step GSP, (c) a forward-regularized GSP, and (d) the forward-consistent GSP.
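A minimal sketch of the self-supervised re-labeling step referenced above, assuming exploration trajectories are stored as lists of (observation, action) pairs; max_horizon and the random goal sampling are illustrative choices, not values from the paper.

    # Turn exploration trajectories into GSP training tuples by re-labeling a
    # later state of the same trajectory as the goal (hindsight-style).
    import random

    def make_gsp_dataset(trajectories, max_horizon=10):
        dataset = []
        for traj in trajectories:                        # traj: [(x_0, a_0), (x_1, a_1), ...]
            for t in range(len(traj) - 1):
                # Pick a future state from the same trajectory and call it the goal.
                k = random.randint(t + 1, min(t + max_horizon, len(traj) - 1))
                x_t, a_t = traj[t]
                x_goal = traj[k][0]
                dataset.append((x_t, x_goal, a_t))       # supervise the first action toward the goal
        return dataset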
Conclusion
Tables
  • Table 1: Quantitative evaluation of various methods on the task of navigating to a single goal image in an unseen environment. Our full GSP model outperforms the baselines significantly.
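For intuition, here is a sketch of how such a policy could be rolled out at test time against a visual demonstration: the demonstration is only a sequence of images, and the goal recognizer decides when to advance to the next one. env, goal_recognizer, and max_steps are hypothetical stand-ins, not the paper's code.

    # Zero-shot imitation at inference time (illustrative pseudo-setup): chase
    # each demonstration image as a goal until the goal recognizer fires.
    def follow_demonstration(env, policy, goal_recognizer, demo_images, max_steps=100):
        x_t = env.reset()
        for x_goal in demo_images:                 # visit demonstration frames in order
            for _ in range(max_steps):
                if goal_recognizer(x_t, x_goal):   # current goal reached; move to the next frame
                    break
                action = policy(x_t, x_goal)       # GSP proposes the next action
                x_t = env.step(action)             # assumed to return the next observation
        return x_t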
Related work
  • Nair et al [15] observe a sequence of images from an expert demonstration to perform rope manipulation. Sermanet et al [22] imitate humans with robots via self-supervised learning but require expert supervision at training time. Third-person imitation learning [23] and the concurrent work on imitation-from-observation [14] learn to translate expert observations into agent observations so that policy optimization can minimize the distance between the agent's trajectory and the translated demonstration, but they require demonstrations for learning. Visual servoing is a standard problem in robotics [5, 10, 11, 12, 24, 26] that seeks to take actions that align the agent's observation with carefully designed visual features or raw pixel intensities. Jordan et al [9], Wolpert et al [25], Agrawal et al [1], and Pathak et al [17] jointly learn forward and inverse dynamics models but do not optimize for consistency between the forward and inverse dynamics. We empirically show that learning models with our forward consistency loss significantly improves task performance.

    Figure: (a) TPS-RPM error for 'S' shape manipulation; (b) success rate for knot-tying.
References
  • [1] P. Agrawal, A. Nair, P. Abbeel, J. Malik, and S. Levine. Learning to poke by poking: Experiential learning of intuitive physics. NIPS, 2016.
  • [2] M. Andrychowicz, F. Wolski, A. Ray, J. Schneider, R. Fong, P. Welinder, B. McGrew, J. Tobin, P. Abbeel, and W. Zaremba. Hindsight experience replay. NIPS, 2017.
  • [3] B. D. Argall, S. Chernova, M. Veloso, and B. Browning. A survey of robot learning from demonstration. Robotics and Autonomous Systems, 2009.
  • [4] A. Bandura and R. H. Walters. Social learning theory, volume 1. Prentice-Hall, Englewood Cliffs, NJ, 1977.
  • [5] G. Caron, E. Marchand, and E. M. Mouaddib. Photometric visual servoing for omnidirectional cameras. Autonomous Robots, 35(2-3):177-193, 2013.
  • [6] Y. Duan, M. Andrychowicz, B. Stadie, J. Ho, J. Schneider, I. Sutskever, P. Abbeel, and W. Zaremba. One-shot imitation learning. NIPS, 2017.
  • [7] C. Finn, T. Yu, T. Zhang, P. Abbeel, and S. Levine. One-shot visual imitation learning via meta-learning. CoRL, 2017.
  • [8] D. Foster and P. Dayan. Structure in the space of value functions. Machine Learning, 2002.
  • [9] M. I. Jordan and D. E. Rumelhart. Forward models: Supervised learning with a distal teacher. Cognitive Science, 1992.
  • [10] K. Hashimoto. Visual servoing: real-time control of robot manipulators based on visual sensory feedback, volume 7. World Scientific, 1993.
  • [11] T. Lampe and M. Riedmiller. Acquiring visual servoing reaching and grasping skills using neural reinforcement learning. IJCNN, pages 1-8, 2013.
  • [12] A. X. Lee, S. Levine, and P. Abbeel. Learning visual servoing with deep features and fitted Q-iteration. arXiv preprint arXiv:1703.11000, 2017.
  • [13] S. Levine, P. Pastor, A. Krizhevsky, and D. Quillen. Learning hand-eye coordination for robotic grasping with large-scale data collection. ISER, 2016.
  • [14] Y. Liu, A. Gupta, P. Abbeel, and S. Levine. Imitation from observation: Learning to imitate behaviors from raw video via context translation. ICRA, 2018.
  • [15] A. Nair, D. Chen, P. Agrawal, P. Isola, P. Abbeel, J. Malik, and S. Levine. Combining self-supervised learning and imitation for vision-based rope manipulation. ICRA, 2017.
  • [16] A. Y. Ng and S. J. Russell. Algorithms for inverse reinforcement learning. ICML, pages 663-670, 2000.
  • [17] D. Pathak, P. Agrawal, A. A. Efros, and T. Darrell. Curiosity-driven exploration by self-supervised prediction. ICML, 2017.
  • [18] L. Pinto and A. Gupta. Supersizing self-supervision: Learning to grasp from 50k tries and 700 robot hours. ICRA, 2016.
  • [19] D. A. Pomerleau. ALVINN: An autonomous land vehicle in a neural network. NIPS, 1989.
  • [20] S. Schaal. Is imitation learning the route to humanoid robots? Trends in Cognitive Sciences, 1999.
  • [21] T. Schaul, D. Horgan, K. Gregor, and D. Silver. Universal value function approximators. ICML, 2015.
  • [22] P. Sermanet, C. Lynch, Y. Chebotar, J. Hsu, E. Jang, S. Schaal, and S. Levine. Time-contrastive networks: Self-supervised learning from video. ICRA, 2018.
  • [23] B. C. Stadie, P. Abbeel, and I. Sutskever. Third-person imitation learning. ICLR, 2017.
  • [24] W. J. Wilson, C. W. Hulls, and G. S. Bell. Relative end-effector control using cartesian position based visual servoing. IEEE Transactions on Robotics and Automation, 12(5):684-696, 1996.
  • [25] D. M. Wolpert, Z. Ghahramani, and M. I. Jordan. An internal model for sensorimotor integration. Science, 1995.
  • [26] B. H. Yoshimi and P. K. Allen. Active, uncalibrated visual servoing. IEEE International Conference on Robotics and Automation, pages 156-161, 1994.