Remembering for the Right Reasons: Explanations Reduce Catastrophic Forgetting

Sayna Ebrahimi
Suzanne Petryk
Akash Gokul
William Gan

ICLR 2021.

Other Links: arxiv.org
This work introduces a connection between continual learning and model explainability by regularizing saliency maps to avoid forgetting, and shows its effect on memory- and regularization-based continual learning approaches.

Abstract:

The goal of continual learning (CL) is to learn a sequence of tasks without suffering from the phenomenon of catastrophic forgetting. Previous work has shown that leveraging memory in the form of a replay buffer can reduce performance degradation on prior tasks. We hypothesize that forgetting can be further reduced when the model is encouraged to remember the evidence for previously made decisions. […]

Introduction
  • Humans are capable of continuously learning novel tasks by leveraging their lifetime knowledge and expanding it when they encounter new experiences.
  • Vanilla backpropagation (Zeiler & Fergus, 2014): the simplest way to understand and visualize which pixels are most salient in an image is to look at the gradients.
  • This is typically done by making a forward pass through the model and taking the gradient of the given output class with respect to the input (see the sketch after this list).
  • To store a saliency map for an RGB image of size 3 × W × H, only a single channel of W × H values is needed.
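A minimal sketch of this vanilla-gradient saliency computation in PyTorch is given below; `model`, `image`, and `target_class` are placeholder names for illustration, not the authors' code.

```python
import torch

def vanilla_saliency(model, image, target_class):
    """Single-channel saliency map from input gradients (vanilla backpropagation).

    `model` is any differentiable classifier and `image` a 3 x W x H tensor;
    both names are placeholders for illustration.
    """
    model.eval()
    x = image.unsqueeze(0).requires_grad_(True)   # add batch dim, track input gradients
    score = model(x)[0, target_class]             # score of the class to explain
    score.backward()                              # d(score) / d(input pixels)
    # Collapse the three colour channels so only W x H values need to be stored.
    return x.grad.detach().abs().max(dim=1).values.squeeze(0)
```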
Highlights
  • Humans are capable of continuously learning novel tasks by leveraging their lifetime knowledge and expanding it when they encounter new experiences.
  • We propose Remembering for the Right Reasons (RRR), a training strategy guided by model explanations generated by any white-box differentiable explanation method; RRR adds an explanation loss to continual learning (a sketch of such a loss follows this list).
  • We empirically show the effect of RRR in standard and few-shot class incremental learning (CIL) scenarios on popular benchmark datasets including CIFAR100, ImageNet100, and Caltech-UCSD Birds 200, using different network architectures, where RRR improves overall accuracy and forgetting over experience replay and other memory-based methods.
  • We evaluate these metrics once immediately after learning each task, denoted as Pr_{i,i} and Re_{i,i}, respectively, and again at the end of the learning process of the final task T, denoted as Pr_{T,i} and Re_{T,i}, where the first subscript refers to the model ID and the second subscript is the test dataset ID on which the model is evaluated.
  • We proposed the use of model explanations with continual learning algorithms to enhance knowledge transfer as well as recall of previous tasks.
  • We advocate for the use of explainable AI as a tool to improve model performance, rather than as an artifact or interpretation of the model itself. We demonstrate that models which incorporate a “right for the right reasons” constraint as part of a continual learning process can be both interpretable and more accurate.
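A minimal sketch of how such an explanation loss can sit on top of experience replay is shown below. The L1 penalty between each buffered example's stored saliency map and its current saliency, the triple structure of `replay_buffer`, and the weight `lambda_rrr` are illustrative assumptions rather than the authors' exact formulation.

```python
import torch
import torch.nn.functional as F

def differentiable_saliency(model, img, target_class):
    """Input-gradient saliency kept on the autograd graph (create_graph=True),
    so that a loss defined on the saliency can itself be backpropagated."""
    x = img.unsqueeze(0).requires_grad_(True)
    score = model(x)[0, target_class]
    grad, = torch.autograd.grad(score, x, create_graph=True)
    return grad.abs().max(dim=1).values.squeeze(0)          # W x H map

def rrr_replay_step(model, optimizer, batch, replay_buffer, lambda_rrr=1.0):
    """One experience-replay step with an added explanation (saliency) loss.

    `replay_buffer` is assumed to hold (image, label, saved_saliency) triples,
    with `saved_saliency` recorded when the corresponding task was learned and
    `label` stored as a 0-dim long tensor.
    """
    x, y = batch
    loss = F.cross_entropy(model(x), y)                      # current-task loss
    for img, label, saved_sal in replay_buffer:
        logits = model(img.unsqueeze(0))
        loss = loss + F.cross_entropy(logits, label.view(1))           # replay loss
        cur_sal = differentiable_saliency(model, img, int(label))      # re-explain
        loss = loss + lambda_rrr * F.l1_loss(cur_sal, saved_sal)       # explanation loss
    optimizer.zero_grad()
    loss.backward()
    optimizer.step()
    return float(loss)
```

Computing the current saliency with `create_graph=True` keeps it differentiable, so the penalty on explanation drift can be backpropagated together with the classification losses.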
Methods
  • To measure how close the baselines can get to this ideal model when they are combined with L_RRR, the authors measure precision as tp/(tp+fp) and recall as tp/(tp+fn) (a sketch of this computation follows the list).
  • The authors evaluate these metrics once immediately after learning each task, denoted as Pr_{i,i} and Re_{i,i}, respectively, and again at the end of the learning process of the final task T, denoted as Pr_{T,i} and Re_{T,i}, where the first subscript refers to the model ID and the second subscript is the test dataset ID on which the model is evaluated.
  • The authors show the evaluation of these metrics in Table 1b for ER and TOPIC with and without L_RRR on CUB200, where L_RRR increases both precision and recall across all methods, demonstrating that the approach continually makes better predictions because it finds the right evidence for its decisions.
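A small sketch of how such per-image precision and recall could be computed is given below, assuming the saliency map is binarized against an annotated object mask; the threshold and the pixel-level definitions of tp/fp/fn are illustrative assumptions rather than the paper's exact pointing-game protocol.

```python
import torch

def saliency_precision_recall(saliency, object_mask, threshold=0.5):
    """Precision and recall of a saliency map against a binary object mask.

    `saliency` is a W x H map normalised to [0, 1]; `object_mask` is a W x H
    {0, 1} tensor marking the annotated object region.
    """
    pred = (saliency >= threshold).float()       # pixels the model points at
    tp = (pred * object_mask).sum()              # salient AND on the object
    fp = (pred * (1 - object_mask)).sum()        # salient but off the object
    fn = ((1 - pred) * object_mask).sum()        # object pixels missed
    precision = tp / (tp + fp + 1e-8)
    recall = tp / (tp + fn + 1e-8)
    return precision.item(), recall.item()
```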
Results
  • The authors have tabulated the results shown in Figure 2 and Figure 3, with means and standard deviations averaged over 3 runs, with and without L_RRR using different backbone architectures and saliency map techniques.
Conclusion
  • The authors proposed the use of model explanations with continual learning algorithms to enhance knowledge transfer as well as recall of previous tasks.
  • The intuition behind the method is that encouraging a model to remember its evidence will increase the generalisability and rationality of recalled predictions and help retrieve the relevant aspects of each task.
  • The authors advocate for the use of explainable AI as a tool to improve model performance, rather than as an artifact or interpretation of the model itself. The authors demonstrate that models which incorporate a “right for the right reasons” constraint as part of a continual learning process can be both interpretable and more accurate.
  • The authors empirically demonstrated the effectiveness of the approach in a variety of settings and provided an analysis of improved performance and explainability.
Summary
  • Objectives:

    The authors' goal is to investigate whether augmenting experience replay with explanation replay reduces forgetting, and how enforcing the model to remember its explanations affects the explanations themselves.
  • In this work the authors aim to go one step further and investigate the role of explanations in continual learning, in mitigating both forgetting and drift in model explanations.
  • The authors aim to achieve this by using memory to enhance knowledge transfer as well as to better avoid catastrophic forgetting.
  • The authors' goal is to visualize whether adding the loss term L_RRR prevents explanations from drifting.
Tables
  • Table1: PG experiment results on few-shot CIL CUB200 measuring (a) PG-ACC (%) and PG-BWT (%) and (b) precision and recall averaged over all tasks. Pr_{i,i} and Re_{i,i} evaluate the pointing game on each task t_i directly after the model has been trained on t_i. Pr_{T,i} and Re_{T,i} are obtained by evaluating task t_i using the model trained on all T tasks.
  • Table2: Target layer names and activation map sizes for saliencies generated by different network architectures in Grad-CAM.
  • Table3: Classification accuracy of few-shot CIL on CUB200 at the end of 11 tasks for ER.
  • Table4: Performance of the state-of-the-art existing approaches with and without L_RRR on CUB200, including TOPIC (Tao et al., 2020), EEIL (Castro et al., 2018), and iCaRL (Rebuffi et al., 2017). Results for baselines are obtained using their original implementations. Results are averaged over 3 runs.
  • Table5: Performance of the state-of-the-art existing approaches with and without L_RRR on CIFAR100 in 10 tasks. Results for iTAML (Rajasegaran et al., 2020), BiC (Wu et al., 2019), and EEIL (Castro et al., 2018) are produced with their original implementations, while EWC (Kirkpatrick et al., 2017) and LwF (Li & Hoiem, 2016) are re-implemented by us. Results are averaged over 3 runs.
  • Table6: Performance of the state-of-the-art existing approaches with and without L_RRR on CIFAR100 in 20 tasks. Results for iTAML (Rajasegaran et al., 2020), BiC (Wu et al., 2019), and EEIL (Castro et al., 2018) are produced with their original implementations, while EWC (Kirkpatrick et al., 2017) and LwF (Li & Hoiem, 2016) are re-implemented by us. Results are averaged over 3 runs.
  • Table7: Performance of the state-of-the-art existing approaches with and without L_RRR on ImageNet100 in 10 tasks. Results for iTAML (Rajasegaran et al., 2020), BiC (Wu et al., 2019), and EEIL (Castro et al., 2018) are produced with their original implementations, while EWC (Kirkpatrick et al., 2017) and LwF (Li & Hoiem, 2016) are re-implemented by us. Results are averaged over 3 runs.
Related work
  • Continual learning: Past work in CL has generally made use of memory, model structure, or regularization to prevent catastrophic forgetting. Memory-based methods store some form of past experience in a replay buffer. However, the definition of “experience” varies between methods. Rehearsal-based methods use episodic memories as raw samples (Robins, 1995; Rebuffi et al., 2017; Riemer et al., 2018) or their gradients (Lopez-Paz et al., 2017; Chaudhry et al., 2019) for the model to revisit. Incremental Classifier and Representation Learning (iCaRL) (Rebuffi et al., 2017) is a class-incremental learner that uses a nearest-exemplar algorithm for classification and prevents catastrophic forgetting by using an episodic memory. iTAML (Rajasegaran et al., 2020) is a task-agnostic meta-learning algorithm that uses a momentum-based strategy for the meta-update and, in addition to the object classification task, predicts task labels during inference. An end-to-end incremental learning framework (EEIL) (Castro et al., 2018) also uses an exemplar set, along with data augmentation and balanced fine-tuning, to alleviate the imbalance between the old and new classes. Bias Correction Method (BiC) (Wu et al., 2019) is another class-incremental learning algorithm; it addresses the prediction bias toward new classes caused by this imbalance.
Study subjects and analysis
samples: 3000
We use C classes and K training samples per class as the C-way K-shot few-shot class-incremental learning setting, where we have a set of b base classes to learn as the first task while the remaining classes are learned with only a few randomly selected samples. In order to provide a direct comparison to the state-of-the-art work of Tao et al. (2020), we precisely followed their setup and used the same Caltech-UCSD Birds dataset (Wah et al., 2011), divided into 11 disjoint tasks in a 10-way 5-shot setting, where the first task contains b = 100 base classes, resulting in 3000 samples for training and 2834 images for testing.

samples: 5
The remaining 100 classes are divided into 10 tasks where 5 samples per class are randomly selected as the training set, while the test set is kept intact, containing nearly 300 images per task. The images in CUB200 are resized to 256 × 256 and then cropped to 224 × 224 for training. A sketch of this class-to-task split is given below.
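A small sketch of how such a split could be laid out: 100 base classes form the first task with all of their training images, and the remaining 100 classes are grouped into ten 10-way tasks with 5 training samples per class. The helper and variable names (`build_fewshot_cil_split`, `samples_by_class`) are illustrative, not the authors' code.

```python
import random

def build_fewshot_cil_split(samples_by_class, base_classes=100,
                            ways=10, shots=5, seed=0):
    """Partition class ids 0..199 into 1 base task plus 10 few-shot tasks.

    `samples_by_class` maps a class id to the list of its training images.
    Returns a list of tasks, each a dict {class_id: selected training samples}.
    """
    rng = random.Random(seed)
    tasks = []
    # Task 0: all training images of the 100 base classes.
    tasks.append({c: list(samples_by_class[c]) for c in range(base_classes)})
    # Tasks 1..10: 10 new classes each, 5 randomly chosen samples per class.
    novel = list(range(base_classes, base_classes + 100))
    for t in range(0, len(novel), ways):
        chunk = novel[t:t + ways]
        tasks.append({c: rng.sample(samples_by_class[c], shots) for c in chunk})
    return tasks
```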

Reference
  • Julius Adebayo, Justin Gilmer, Michael Muelly, Ian Goodfellow, Moritz Hardt, and Been Kim. Sanity checks for saliency maps. In Advances in Neural Information Processing Systems, pp. 9505–9515, 2018.
  • David Baehrens, Timon Schroeter, Stefan Harmeling, Motoaki Kawanabe, Katja Hansen, and Klaus-Robert Müller. How to explain individual classification decisions. Journal of Machine Learning Research, 11(Jun):1803–1831, 2010.
  • Francisco M Castro, Manuel J Marín-Jiménez, Nicolás Guil, Cordelia Schmid, and Karteek Alahari. End-to-end incremental learning. In Proceedings of the European Conference on Computer Vision (ECCV), pp. 233–248, 2018.
  • Aditya Chattopadhay, Anirban Sarkar, Prantik Howlader, and Vineeth N Balasubramanian. Grad-CAM++: Generalized gradient-based visual explanations for deep convolutional networks. In 2018 IEEE Winter Conference on Applications of Computer Vision (WACV), pp. 839–847. IEEE, 2018.
  • Arslan Chaudhry, Marc’Aurelio Ranzato, Marcus Rohrbach, and Mohamed Elhoseiny. Efficient lifelong learning with A-GEM. In International Conference on Learning Representations, 2019.
  • Prithviraj Dhar, Rajat Vikram Singh, Kuan-Chuan Peng, Ziyan Wu, and Rama Chellappa. Learning without memorizing. In Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, pp. 5138–5146, 2019.
  • Sayna Ebrahimi, Mohamed Elhoseiny, Trevor Darrell, and Marcus Rohrbach. Uncertainty-guided continual learning with Bayesian neural networks. In International Conference on Learning Representations, 2020a.
  • Sayna Ebrahimi, Franziska Meier, Roberto Calandra, Trevor Darrell, and Marcus Rohrbach. Adversarial continual learning. In European Conference on Computer Vision (ECCV), 2020b.
  • Chrisantha Fernando, Dylan Banarse, Charles Blundell, Yori Zwols, David Ha, Andrei A Rusu, Alexander Pritzel, and Daan Wierstra. PathNet: Evolution channels gradient descent in super neural networks. arXiv preprint arXiv:1701.08734, 2017.
  • Tyler L Hayes, Nathan D Cahill, and Christopher Kanan. Memory efficient experience replay for streaming learning. In 2019 International Conference on Robotics and Automation (ICRA), pp. 9769–9776. IEEE, 2019.
  • Kevin Jarrett, Koray Kavukcuoglu, Marc’Aurelio Ranzato, and Yann LeCun. What is the best multi-stage architecture for object recognition? In 2009 IEEE 12th international conference on computer vision, pp. 2146–2153. IEEE, 2009.
  • Nitin Kamra, Umang Gupta, and Yan Liu. Deep generative dual memory network for continual learning. arXiv preprint arXiv:1710.10368, 2017.
  • Ronald Kemker and Christopher Kanan. FearNet: Brain-inspired model for incremental learning. arXiv preprint arXiv:1711.10563, 2017.
  • James Kirkpatrick, Razvan Pascanu, Neil Rabinowitz, Joel Veness, Guillaume Desjardins, Andrei A Rusu, Kieran Milan, John Quan, Tiago Ramalho, Agnieszka Grabska-Barwinska, et al. Overcoming catastrophic forgetting in neural networks. Proceedings of the National Academy of Sciences, pp. 201611835, 2017.
  • Alex Krizhevsky and Geoffrey Hinton. Learning multiple layers of features from tiny images. Technical report, 2009.
  • Liyuan Liu, Haoming Jiang, Pengcheng He, Weizhu Chen, Xiaodong Liu, Jianfeng Gao, and Jiawei Han. On the variance of the adaptive learning rate and beyond. arXiv preprint arXiv:1908.03265, 2019.
  • David Lopez-Paz et al. Gradient episodic memory for continual learning. In Advances in Neural Information Processing Systems, pp. 6467–6476, 2017.
  • Arun Mallya and Svetlana Lazebnik. PackNet: Adding multiple tasks to a single network by iterative pruning. In IEEE Conference on Computer Vision and Pattern Recognition (CVPR), 2018.
  • James L McClelland, Bruce L McNaughton, and Randall C O’Reilly. Why there are complementary learning systems in the hippocampus and neocortex: insights from the successes and failures of connectionist models of learning and memory. Psychological Review, 102(3):419, 1995.
  • Adam Paszke, Sam Gross, Soumith Chintala, Gregory Chanan, Edward Yang, Zachary DeVito, Zeming Lin, Alban Desmaison, Luca Antiga, and Adam Lerer. Automatic differentiation in PyTorch. In NIPS-W, 2017.
  • Vitali Petsiuk, Abir Das, and Kate Saenko. Rise: Randomized input sampling for explanation of black-box models. In Proceedings of the British Machine Vision Conference (BMVC), 2018.
  • Jathushan Rajasegaran, Salman Khan, Munawar Hayat, Fahad Shahbaz Khan, and Mubarak Shah. iTAML: An incremental task-agnostic meta-learning approach. In The IEEE Conference on Computer Vision and Pattern Recognition (CVPR), June 2020.
  • Amal Rannen, Rahaf Aljundi, Matthew B Blaschko, and Tinne Tuytelaars. Encoder based lifelong learning. In Proceedings of the IEEE International Conference on Computer Vision, pp. 1320–1328, 2017.
  • Sylvestre-Alvise Rebuffi, Alexander Kolesnikov, Georg Sperl, and Christoph H Lampert. iCaRL: Incremental classifier and representation learning. In CVPR, 2017.
  • Marco Tulio Ribeiro, Sameer Singh, and Carlos Guestrin. “Why should I trust you?” Explaining the predictions of any classifier. In Proceedings of the 22nd ACM SIGKDD International Conference on Knowledge Discovery and Data Mining, pp. 1135–1144, 2016.
  • Matthew Riemer, Ignacio Cases, Robert Ajemian, Miao Liu, Irina Rish, Yuhai Tu, and Gerald Tesauro. Learning to learn without forgetting by maximizing transfer and minimizing interference. arXiv preprint arXiv:1810.11910, 2018.
  • Anthony Robins. Catastrophic forgetting, rehearsal and pseudorehearsal. Connection Science, 7(2):123–146, 1995.
  • Andrei A Rusu, Neil C Rabinowitz, Guillaume Desjardins, Hubert Soyer, James Kirkpatrick, Koray Kavukcuoglu, Razvan Pascanu, and Raia Hadsell. Progressive neural networks. arXiv preprint arXiv:1606.04671, 2016.
  • Ramprasaath R Selvaraju, Abhishek Das, Ramakrishna Vedantam, Michael Cogswell, Devi Parikh, and Dhruv Batra. Grad-CAM: Why did you say that? arXiv preprint arXiv:1611.07450, 2016.
  • Ramprasaath R Selvaraju, Michael Cogswell, Abhishek Das, Ramakrishna Vedantam, Devi Parikh, and Dhruv Batra. Grad-CAM: Visual explanations from deep networks via gradient-based localization. In Proceedings of the IEEE International Conference on Computer Vision, pp. 618–626, 2017.
  • Dasom Seo, Kanghan Oh, and Il-Seok Oh. Regional multi-scale approach for visually pleasing explanations of deep neural networks. arXiv preprint arXiv:1807.11720, 2018.
  • Joan Serra, Didac Suris, Marius Miron, and Alexandros Karatzoglou. Overcoming catastrophic forgetting with hard attention to the task. In Jennifer Dy and Andreas Krause (eds.), Proceedings of the 35th International Conference on Machine Learning, volume 80 of Proceedings of Machine Learning Research, pp. 4548–4557. PMLR, 2018.
  • Hanul Shin, Jung Kwon Lee, Jaehong Kim, and Jiwon Kim. Continual learning with deep generative replay. In Advances in Neural Information Processing Systems, pp. 2990–2999, 2017.
  • Avanti Shrikumar, Peyton Greenside, Anna Shcherbina, and Anshul Kundaje. Not just a black box: Learning important features through propagating activation differences. arXiv preprint arXiv:1605.01713, 2016.
  • Avanti Shrikumar, Peyton Greenside, and Anshul Kundaje. Learning important features through propagating activation differences. In Proceedings of the 34th International Conference on Machine Learning, Volume 70, pp. 3145–3153. JMLR.org, 2017.
  • Daniel L Silver, Qiang Yang, and Lianghao Li. Lifelong machine learning systems: Beyond learning algorithms. In 2013 AAAI Spring Symposium Series, 2013.
  • Karen Simonyan, Andrea Vedaldi, and Andrew Zisserman. Deep inside convolutional networks: Visualising image classification models and saliency maps. arXiv preprint arXiv:1312.6034, 2013.
  • Daniel Smilkov, Nikhil Thorat, Been Kim, Fernanda Viegas, and Martin Wattenberg. SmoothGrad: removing noise by adding noise. arXiv preprint arXiv:1706.03825, 2017.
  • Jost Tobias Springenberg, Alexey Dosovitskiy, Thomas Brox, and Martin Riedmiller. Striving for simplicity: The all convolutional net. arXiv preprint arXiv:1412.6806, 2014.
  • Mukund Sundararajan, Ankur Taly, and Qiqi Yan. Axiomatic attribution for deep networks. In Proceedings of the 34th International Conference on Machine Learning, Volume 70, pp. 3319–3328. JMLR.org, 2017a.
  • Mukund Sundararajan, Ankur Taly, and Qiqi Yan. Axiomatic attribution for deep networks. arXiv preprint arXiv:1703.01365, 2017b.
  • Xiaoyu Tao, Xiaopeng Hong, Xinyuan Chang, Songlin Dong, Xing Wei, and Yihong Gong. Few-shot class-incremental learning. In Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, pp. 12183–12192, 2020.
  • Sebastian Thrun and Tom M Mitchell. Lifelong robot learning. Robotics and Autonomous Systems, 15(1-2):25–46, 1995.
  • C. Wah, S. Branson, P. Welinder, P. Perona, and S. Belongie. The Caltech-UCSD Birds-200-2011 Dataset. Technical Report CNS-TR-2011-001, California Institute of Technology, 2011.
  • Yue Wu, Yinpeng Chen, Lijuan Wang, Yuancheng Ye, Zicheng Liu, Yandong Guo, and Yun Fu. Large scale incremental learning. In Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, pp. 374–382, 2019.
  • Xin Yao, Tianchi Huang, Chenglei Wu, Rui-Xiao Zhang, and Lifeng Sun. Adversarial feature alignment: Avoid catastrophic forgetting in incremental task lifelong learning. Neural Computation, 31(11):2266–2291, 2019.
  • Jaehong Yoon, Eunho Yang, Jeongtae Lee, and Sung Ju Hwang. Lifelong learning with dynamically expandable networks. In International Conference on Learning Representations, 2018.
  • Friedemann Zenke, Ben Poole, and Surya Ganguli. Continual learning through synaptic intelligence. In Doina Precup and Yee Whye Teh (eds.), Proceedings of the 34th International Conference on Machine Learning, volume 70 of Proceedings of Machine Learning Research, pp. 3987–3995. PMLR, 2017.
  • Jianming Zhang, Sarah Adel Bargal, Zhe Lin, Jonathan Brandt, Xiaohui Shen, and Stan Sclaroff. Top-down neural attention by excitation backprop. International Journal of Computer Vision, 126(10):1084–1102, 2018.
  • Bolei Zhou, Aditya Khosla, Agata Lapedriza, Aude Oliva, and Antonio Torralba. Object detectors emerge in deep scene CNNs. arXiv preprint arXiv:1412.6856, 2014.
  • Bolei Zhou, Aditya Khosla, Agata Lapedriza, Aude Oliva, and Antonio Torralba. Learning deep features for discriminative localization. In Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, pp. 2921–2929, 2016.
  • Table 2 shows the target layer names used in Grad-CAM for different network architectures according to their standard PyTorch (Paszke et al., 2017) implementations. The saliency map size is equal to the activation map size of the target layer (see the sketch below).
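As a companion to the note above, a minimal Grad-CAM sketch taken from one such target layer is given below. It follows the standard formulation of Selvaraju et al. (2017); the hook-based implementation and the layer choice are illustrative assumptions rather than the authors' code.

```python
import torch
import torch.nn.functional as F

def grad_cam(model, image, target_class, target_layer):
    """Grad-CAM map taken from `target_layer` (e.g. the last convolutional block).

    The returned map has the spatial size of that layer's activation, which is
    why Table 2 reports the activation-map size per architecture.
    """
    acts, grads = {}, {}
    h1 = target_layer.register_forward_hook(lambda m, i, o: acts.update(a=o))
    h2 = target_layer.register_full_backward_hook(lambda m, gi, go: grads.update(g=go[0]))
    try:
        model.eval()
        score = model(image.unsqueeze(0))[0, target_class]   # class score to explain
        model.zero_grad()
        score.backward()
    finally:
        h1.remove()
        h2.remove()
    weights = grads["g"].mean(dim=(2, 3), keepdim=True)      # GAP over gradients
    cam = F.relu((weights * acts["a"]).sum(dim=1)).squeeze(0)
    return cam / (cam.max() + 1e-8)                          # normalised saliency map
```

For a torchvision ResNet, for instance, `grad_cam(model, image, cls, model.layer4)` would yield a 7 × 7 map for a 224 × 224 input.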