Explaining How a Deep Neural Network Trained with End-to-End Learning Steers a Car

arXiv: Computer Vision and Pattern Recognition, abs/1704.07911, 2017.


Abstract:

As part of a complete software stack for autonomous driving, NVIDIA has created a neural-network-based system, known as PilotNet, which outputs steering angles given images of the road ahead. PilotNet is trained using road images paired with the steering angles generated by a human driving a data-collection car. It derives the necessary domain knowledge by observing human drivers.
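
For a concrete picture of the system being explained, the following is a minimal PyTorch sketch of a PilotNet-style regression network: convolutional layers followed by fully connected layers that map a single road image to one scalar steering command, the inverse turning radius 1/r. The layer sizes and the 66x200 input crop loosely follow the description in [1] and should be read as illustrative assumptions rather than NVIDIA's production configuration.

```python
# Minimal PilotNet-style steering regressor (illustrative sketch only, not
# NVIDIA's production model). Input: a single RGB road image; output: one
# scalar steering command expressed as the inverse turning radius 1/r.
import torch
import torch.nn as nn

class PilotNetSketch(nn.Module):
    def __init__(self):
        super().__init__()
        self.features = nn.Sequential(
            # Strided 5x5 convolutions reduce spatial resolution early.
            nn.Conv2d(3, 24, kernel_size=5, stride=2), nn.ReLU(),
            nn.Conv2d(24, 36, kernel_size=5, stride=2), nn.ReLU(),
            nn.Conv2d(36, 48, kernel_size=5, stride=2), nn.ReLU(),
            # Non-strided 3x3 convolutions extract higher-level features.
            nn.Conv2d(48, 64, kernel_size=3), nn.ReLU(),
            nn.Conv2d(64, 64, kernel_size=3), nn.ReLU(),
        )
        self.regressor = nn.Sequential(
            nn.Flatten(),
            nn.LazyLinear(100), nn.ReLU(),
            nn.Linear(100, 50), nn.ReLU(),
            nn.Linear(50, 10), nn.ReLU(),
            nn.Linear(10, 1),  # predicted steering command, 1/r
        )

    def forward(self, x):
        return self.regressor(self.features(x))

# Training pairs are (image, 1/r) samples; mean-squared error is a natural loss.
model = PilotNetSketch()
images = torch.randn(4, 3, 66, 200)   # 66x200 crops as described in [1] (assumed here)
targets = torch.randn(4, 1)           # human steering labels, 1/r
loss = nn.functional.mse_loss(model(images), targets)
loss.backward()
```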

Introduction
  • A previous report [1] described an end-to-end learning system for self-driving cars in which a convolutional neural network (CNN) [2] was trained to output steering angles given input images of the road ahead.
  • PilotNet training data consist of single images sampled from video recorded by a front-facing camera in the car, paired with the corresponding steering command (1/r), where r is the turning radius of the vehicle.
  • In Step 4 of the mask-construction procedure, the intermediate mask is scaled up to the size of the maps of the layer below in the same way as described in Step 2 (a simplified sketch of this mask propagation follows this list).
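
As a rough illustration of the mask construction referenced above: at each layer the feature maps are averaged over channels, the running mask is scaled up to the resolution of the layer below and multiplied point-wise with that layer's averaged map, and the final mask is normalized to the range 0.0 to 1.0. The sketch below substitutes plain bilinear interpolation for the deconvolution-based up-scaling used by the paper's method (VisualBackProp [3]); names and shapes are illustrative.

```python
# Sketch of the averaged-feature-map back-propagation used to build the
# visualization mask. Up-scaling here is plain bilinear interpolation; the
# paper's method uses deconvolutions matched to each layer's stride/kernel.
import torch
import torch.nn.functional as F

def visualization_mask(feature_maps):
    """feature_maps: list of activation tensors from the trained network,
    ordered from the first convolutional layer (highest resolution) to the
    deepest one. Each tensor has shape (channels, height, width)."""
    # Average the feature maps of each layer over the channel dimension.
    averaged = [fmap.mean(dim=0, keepdim=True) for fmap in feature_maps]

    # Start from the deepest layer and walk back toward the input.
    mask = averaged[-1]
    for lower in reversed(averaged[:-1]):
        # Scale the intermediate mask up to the size of the maps of the
        # layer below (Step 4 in the text), then multiply point-wise with
        # that layer's averaged map.
        mask = F.interpolate(mask.unsqueeze(0), size=lower.shape[-2:],
                             mode="bilinear", align_corners=False).squeeze(0)
        mask = mask * lower

    # Normalize the final mask to the range 0.0 .. 1.0.
    mask = mask - mask.min()
    return mask / (mask.max() + 1e-8)
```
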
Highlights
  • A previous report [1] described an end-to-end learning system for self-driving cars in which a convolutional neural network (CNN) [2] was trained to output steering angles given input images of the road ahead.
  • The training data were images from a front-facing camera in a data-collection car, coupled with the time-synchronized steering angle recorded from a human driver.
  • The last mask, which is of the size of the input image, is normalized to the range 0.0 to 1.0 and becomes the final visualization mask. This visualization mask shows which regions of the input image contribute most to the output of the network.
  • The visualization mask is overlaid on the input image to highlight the pixels in the original camera image that correspond to the salient objects (a sketch of one possible overlay follows this list).
  • We describe a method for finding the regions in input images by which PilotNet makes its steering decisions, i.e., the salient objects.
  • PilotNet learns to recognize subtle features that would be hard for human engineers to anticipate and program, such as bushes lining the edge of the road and atypical vehicle classes.
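
One plausible way to render the overlay described above is to resize the normalized mask to the camera image resolution and tint the pixels where it exceeds a threshold; the sketch below is an assumed rendering, not necessarily the paper's exact one.

```python
# Overlay a normalized visualization mask on the original camera image.
# One plausible rendering (green tint on salient pixels), not necessarily
# the exact visualization used in the paper.
import numpy as np
from PIL import Image

def overlay_mask(image_rgb: np.ndarray, mask: np.ndarray,
                 threshold: float = 0.5) -> np.ndarray:
    """image_rgb: (H, W, 3) uint8 camera frame.
    mask: 2-D array already normalized to the range 0.0 .. 1.0."""
    # Resize the mask to the camera image resolution.
    h, w = image_rgb.shape[:2]
    mask_img = Image.fromarray((mask * 255).astype(np.uint8))
    mask_full = np.asarray(mask_img.resize((w, h), Image.BILINEAR),
                           dtype=np.float32) / 255.0

    # Brighten the green channel where the mask exceeds the threshold,
    # proportionally to the mask value.
    out = image_rgb.astype(np.float32)
    green = out[..., 1]
    out[..., 1] = np.where(mask_full > threshold,
                           np.minimum(255.0, green + 255.0 * mask_full),
                           green)
    return out.astype(np.uint8)
```
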
Results
  • This visualization mask shows which regions of the input image contribute most to the output of the network.
  • The visualization mask is overlaid on the input image to highlight the pixels in the original camera image that correspond to the salient objects.
  • Class 1 is meant to include all the regions that have a significant effect on the steering angle output by PilotNet. These regions include all the pixels that correspond to locations where the visualization mask is above a threshold.
  • These regions are dilated by 30 pixels to counteract the increasing span of the higher-level feature map layers with respect to the input image.
  • If the objects found by the method dominate control of the output steering angle, the authors would expect the following: creating an image in which only the Class 1 pixels are uniformly translated, while the Class 2 pixels keep their original positions, and using this new image as input to PilotNet should produce a significant change in the steering angle output (see the sketch after this list).
  • The image shows highlighted salient regions that were identified using the method of Section 3.
  • Figure 8 shows plots of PilotNet steering output as a function of pixel shift in the input image.
  • The blue line shows the results when the authors shift the pixels that include the salient objects (Class 1).
  • The red line shows the results when the authors shift the pixels not included in the salient objects (Class 2).
  • Shifting the salient objects results in a linear change in steering angle that is nearly as large as the change that occurs when the entire image is shifted.
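
The shift experiment can be sketched as follows, assuming a `predict_steering` callable that wraps a trained PilotNet and a saliency mask `mask_full` already normalized and resized to the input resolution. The threshold-plus-30-pixel-dilation construction of Class 1 follows the description above, while the rule for filling pixels vacated by the translation is an implementation assumption.

```python
# Sketch of the shift experiment: translate only Class 1 (salient) or only
# Class 2 (non-salient) pixels and record PilotNet's steering output.
# `predict_steering` and the fill rule for vacated pixels are assumptions;
# the threshold and 30-pixel dilation follow the description in the text.
import numpy as np
from scipy.ndimage import binary_dilation

def build_classes(mask_full: np.ndarray, threshold: float = 0.5):
    # Class 1: pixels where the visualization mask exceeds the threshold,
    # dilated by 30 pixels; Class 2: everything else.
    class1 = binary_dilation(mask_full > threshold, iterations=30)
    return class1, ~class1

def shift_region(image: np.ndarray, region: np.ndarray, shift_px: int):
    # Translate only the pixels in `region` horizontally by `shift_px`,
    # keeping the remaining pixels in place (fill rule is an assumption).
    rolled = np.roll(image, shift_px, axis=1)
    rolled_region = np.roll(region, shift_px, axis=1)
    out = image.copy()
    out[rolled_region] = rolled[rolled_region]
    return out

def steering_vs_shift(image, region, predict_steering, max_shift=40):
    # predict_steering: callable mapping an image to a scalar steering value
    # (e.g. a wrapper around a trained PilotNet); assumed, not provided here.
    shifts = np.arange(-max_shift, max_shift + 1, 4)
    return shifts, [predict_steering(shift_region(image, region, int(s)))
                    for s in shifts]
```
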
Conclusion
  • The authors describe a method for finding the regions in input images by which PilotNet makes its steering decisions, i.e., the salient objects.
  • Examination of the salient objects shows that PilotNet learns features that “make sense” to a human, while ignoring structures in the camera images that are not relevant to driving.
  • PilotNet learns to recognize subtle features that would be hard for human engineers to anticipate and program, such as bushes lining the edge of the road and atypical vehicle classes.
References
  • Mariusz Bojarski, Davide Del Testa, Daniel Dworakowski, Bernhard Firner, Beat Flepp, Prasoon Goyal, Lawrence D. Jackel, Mathew Monfort, Urs Muller, Jiakai Zhang, Xin Zhang, Jake Zhao, and Karol Zieba. End to end learning for self-driving cars. arXiv:1604.07316, April 25, 2016. URL: http://arxiv.org/abs/1604.07316.
  • Y. LeCun, B. Boser, J. S. Denker, D. Henderson, R. E. Howard, W. Hubbard, and L. D. Jackel. Backpropagation applied to handwritten zip code recognition. Neural Computation, 1(4):541–551, Winter 1989. URL: http://yann.lecun.org/exdb/publis/pdf/lecun-89e.pdf.
  • Mariusz Bojarski, Anna Choromanska, Krzysztof Choromanski, Bernhard Firner, Larry Jackel, Urs Muller, and Karol Zieba. VisualBackProp: visualizing CNNs for autonomous driving. arXiv:1611.05418, November 16, 2016. URL: https://arxiv.org/abs/1611.05418.
  • D. Baehrens, T. Schroeter, S. Harmeling, M. Kawanabe, K. Hansen, and K.-R. Muller. How to explain individual classification decisions. J. Mach. Learn. Res., 11:1803–1831, 2010.
  • K. Simonyan, A. Vedaldi, and A. Zisserman. Deep inside convolutional networks: Visualising image classification models and saliency maps. In Workshop Proc. ICLR, 2014.
  • P. M. Rasmussen, T. Schmah, K. H. Madsen, T. E. Lund, S. C. Strother, and L. K. Hansen. Visualization of nonlinear classification models in neuroimaging - signed sensitivity maps. BIOSIGNALS, pages 254–263, 2012.
  • M. D. Zeiler, G. W. Taylor, and R. Fergus. Adaptive deconvolutional networks for mid and high level feature learning. In ICCV, 2011.
  • M. D. Zeiler and R. Fergus. Visualizing and understanding convolutional networks. In ECCV, 2014.
  • S. Bach, A. Binder, G. Montavon, F. Klauschen, K.-R. Muller, and W. Samek. On pixel-wise explanations for non-linear classifier decisions by layer-wise relevance propagation. PLOS ONE, 10(7):e0130140, 2015. URL: http://dx.doi.org/10.1371/journal.pone.0130140.