Recall and Learn: Fine-tuning Deep Pretrained Language Models with Less Forgetting

EMNLP 2020, pp. 7870–7881

Abstract

Deep pretrained language models have achieved great success in the way of pretraining first and then fine-tuning. But such a sequential transfer learning paradigm often confronts the catastrophic forgetting problem and leads to sub-optimal performance. To fine-tune with less forgetting, we propose a recall and learn mechanism, which adopts the idea of multi-task learning and jointly learns pretraining tasks and downstream tasks.

Introduction
  • Deep Pretrained Language Models (LMs), such as ELMo (Peters et al., 2018) and BERT (Devlin et al., 2019), have significantly altered the landscape of Natural Language Processing (NLP), and a wide range of NLP tasks have benefited from these pretrained language models
  • These successes are mainly achieved through Sequential Transfer Learning (Ruder, 2019): pretrain a language model on large-scale unlabeled data, then adapt it to downstream tasks.
  • Two adaptation strategies are common: fine-tuning updates all the parameters of the pretrained model, while feature extraction regards the pretrained model as a fixed feature extractor during the adaptation phase (a minimal sketch contrasting the two follows this list)
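
The following is a minimal PyTorch-style sketch of the two adaptation strategies named above. The toy encoder, its dimensions, and the hyperparameters are placeholders chosen for illustration, not the actual ELMo/BERT setup.

    import torch
    from torch import nn

    # Toy encoder standing in for a pretrained LM (placeholder, not the real architecture).
    encoder = nn.Sequential(nn.Embedding(1000, 64), nn.Linear(64, 64), nn.ReLU())
    classifier = nn.Linear(64, 2)  # task-specific head

    # Feature extraction: keep the pretrained encoder fixed, train only the task head.
    for p in encoder.parameters():
        p.requires_grad = False
    feature_extraction_params = list(classifier.parameters())

    # Fine-tuning: update all parameters of the pretrained model together with the head.
    for p in encoder.parameters():
        p.requires_grad = True
    fine_tuning_params = list(encoder.parameters()) + list(classifier.parameters())

    optimizer = torch.optim.Adam(fine_tuning_params, lr=2e-5)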
Highlights
  • Deep Pretrained Language Models (LMs), such as ELMo (Peters et al., 2018) and BERT (Devlin et al., 2019), have significantly altered the landscape of Natural Language Processing (NLP), and a wide range of NLP tasks have benefited from these pretrained language models
  • Thanks to the reduced catastrophic forgetting achieved by Recall Adam (RecAdam), we can obtain comparable overall performance with far fewer parameters of the pretrained model
  • Similar to the results with the BERT-base model, we find that our improvements mostly come from the tasks with smaller training data (<10k examples), and we can improve the ALBERT-xxlarge model's median performance on these tasks by +1.5% on average
  • Model Initialization: with our RecAdam method, based on Pretraining Simulation and Objective Shifting, the model can be initialized with random values and still recall the knowledge of the pretraining tasks while learning the new tasks
  • We propose the Objective Shifting mechanism to better balance the learning of the pretraining and downstream tasks
  • Experiments show that our method achieves state-of-the-art performance on the General Language Understanding Evaluation (GLUE) benchmark
  • Experiments demonstrate the superiority of our method in transferring deep pretrained language models, and we provide the open-source RecAdam optimizer, which integrates the proposed mechanisms into the Adam optimizer, to facilitate better use of deep pretrained language models (an illustrative update-step sketch follows these highlights)
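
The highlights above describe folding a quadratic "recall" penalty toward the pretrained weights and an annealed objective into Adam. Below is a minimal illustrative sketch of one such update step, not the authors' released RecAdam implementation: the schedule parameters (k, t0), the penalty coefficient gamma, and the placement of the penalty inside the update are assumptions, and the released optimizer may treat the penalty differently (for example, decoupled from the adaptive moments, in the spirit of AdamW).

    import numpy as np

    def anneal(t, k=0.1, t0=250):
        # Sigmoid schedule: close to 0 early (recall pretraining), close to 1 late (learn the new task).
        return 1.0 / (1.0 + np.exp(-k * (t - t0)))

    def recadam_like_step(theta, theta_pre, grad_target, state, t,
                          lr=1e-3, betas=(0.9, 0.999), eps=1e-8, gamma=1.0):
        # Blend the target-task gradient with the gradient of the simulated
        # pretraining loss (gamma / 2) * ||theta - theta_pre||^2.
        lam = anneal(t)
        grad = lam * grad_target + (1.0 - lam) * gamma * (theta - theta_pre)
        # Standard Adam moment updates with bias correction (t starts at 1).
        m = betas[0] * state["m"] + (1 - betas[0]) * grad
        v = betas[1] * state["v"] + (1 - betas[1]) * grad ** 2
        state["m"], state["v"] = m, v
        m_hat = m / (1 - betas[0] ** t)
        v_hat = v / (1 - betas[1] ** t)
        return theta - lr * m_hat / (np.sqrt(v_hat) + eps), state

    # Toy usage on a 3-parameter "model".
    theta_pre = np.array([0.5, -1.0, 2.0])            # pretrained weights to recall
    theta = theta_pre.copy()
    state = {"m": np.zeros_like(theta), "v": np.zeros_like(theta)}
    for t in range(1, 501):
        grad_target = 2 * (theta - np.array([1.0, 0.0, 1.0]))  # toy downstream gradient
        theta, state = recadam_like_step(theta, theta_pre, grad_target, state, t)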
Methods
  • Model Initialization: two initialization strategies are compared for RecAdam, pretrained initialization (RecAdam + PI) and random initialization (RecAdam + RI). With random initialization, the model benefits from a larger parameter search space; in contrast, with pretrained initialization, the search space is limited to the region around the pretrained model, making it harder for the model to learn the new tasks.
  • Forgetting Analysis: as introduced in § 3.2, the authors realize multi-task fine-tuning with the Objective Shifting technique, which allows the model's learning objective to shift gradually from the source tasks to the target tasks.
  • As discussed in § 3.2, fine-tuning and multi-task learning can be regarded as special cases (k → ∞ and k → 0, respectively) of the method; the corresponding limits are written out after this list.
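
One way to write down the shifted objective described above, with θ* the pretrained parameters, γ the coefficient of the simulated (quadratic) pretraining loss, t the training step, and t0 the midpoint of the schedule; this notation is reconstructed from the summary and may differ from the paper:

    \mathcal{L}(\theta; t) \;=\; \lambda(t)\,\mathcal{L}_{\mathrm{target}}(\theta)
        \;+\; \bigl(1 - \lambda(t)\bigr)\,\frac{\gamma}{2}\,\lVert \theta - \theta^{*} \rVert^{2},
    \qquad
    \lambda(t) \;=\; \frac{1}{1 + \exp\bigl(-k\,(t - t_{0})\bigr)}

As k → ∞, λ(t) approaches a step function at t0 and the objective reduces to sequential fine-tuning; as k → 0, λ(t) approaches a constant and the objective reduces to a fixed multi-task weighting of the two losses.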
Results
  • Table 1 shows the single-task, single-model results of the RecAdam fine-tuning method compared with the vanilla fine-tuning method, using the BERT-base and ALBERT-xxlarge models on the dev set of the GLUE benchmark.
  • BERT-base: with the BERT-base model, the authors outperform the vanilla fine-tuning method on 7 out of 8 tasks of the GLUE benchmark and achieve a 1.1% improvement in average median performance.
  • Table 2 shows the performance comparison of different initialization strategies for RecAdam obtained with the BERT-base model
  • It shows that RecAdam with both initialization strategies can outperform the vanilla fine-tuning method on all four tasks.
  • This is because the model benefits from a larger parameter search space with random initialization.
Conclusion
  • The authors address catastrophic forgetting in transferring deep pretrained language models by bridging two transfer learning paradigms: sequential fine-tuning and multi-task learning.
  • To cope with the absence of pretraining data during the joint learning of the pretraining tasks, the authors propose a Pretraining Simulation mechanism to learn the pretraining tasks without data.
  • The authors propose the Objective Shifting mechanism to better balance the learning of the pretraining and downstream tasks.
  • Experiments demonstrate the superiority of the method in transferring deep pretrained language models, and the authors provide the open-source RecAdam optimizer, which integrates the proposed mechanisms into the Adam optimizer, to facilitate better use of deep pretrained language models
Tables
  • Table 1: State-of-the-art single-task, single-model results on the dev set of the GLUE benchmark. The number below each task refers to the number of training examples. The average scores of the tasks with large training data (>10k), the tasks with small training data (<10k), and all the tasks are reported separately. We rerun the baseline of vanilla fine-tuning without further pretraining on MNLI. We report the median and maximum over 5 runs
  • Table 2: Comparison of different model initialization strategies: pretrained initialization (PI) and random initialization (RI). We report the median over 5 runs
Funding
  • Experiments show that our method achieves state-of-the-art performance on the GLUE benchmark
  • Experiments on the GLUE benchmark with the BERT-base model show that the proposed method can significantly outperform the vanilla fine-tuning method
  • We achieve state-of-the-art performance on the GLUE benchmark with the ALBERT-xxlarge model
References
  • Rahaf Aljundi, Francesca Babiloni, Mohamed Elhoseiny, Marcus Rohrbach, and Tinne Tuytelaars. 2018. Memory aware synapses: Learning what (not) to forget. In Computer Vision - ECCV 2018 - 15th European Conference, Munich, Germany, September 8-14, 2018, Proceedings, Part III, volume 11207 of Lecture Notes in Computer Science, pages 144–161. Springer.
  • Rahaf Aljundi, Min Lin, Baptiste Goujaud, and Yoshua Bengio. 2019. Online continual learning with no task boundaries. CoRR, abs/1903.08671.
  • Gaurav Arora, Afshin Rahimi, and Timothy Baldwin. 2019. Does an LSTM forget more than a CNN? An empirical study of catastrophic forgetting in NLP. In Proceedings of the 17th Annual Workshop of the Australasian Language Technology Association, ALTA 2019, Sydney, Australia, December 4-6, 2019, pages 77–86. Australasian Language Technology Association.
  • Antonio Valerio Miceli Barone, Barry Haddow, Ulrich Germann, and Rico Sennrich. 2017. Regularization techniques for fine-tuning in neural machine translation. In Proceedings of the 2017 Conference on Empirical Methods in Natural Language Processing, EMNLP 2017, Copenhagen, Denmark, September 9-11, 2017, pages 1489–1494. Association for Computational Linguistics.
  • Luisa Bentivogli, Bernardo Magnini, Ido Dagan, Hoa Trang Dang, and Danilo Giampiccolo. 2009. The fifth PASCAL recognizing textual entailment challenge. In Proceedings of the Second Text Analysis Conference, TAC 2009, Gaithersburg, Maryland, USA, November 16-17, 2009. NIST.
  • Samuel R. Bowman, Luke Vilnis, Oriol Vinyals, Andrew M. Dai, Rafal Jozefowicz, and Samy Bengio. 2016. Generating sentences from a continuous space. In Proceedings of the 20th SIGNLL Conference on Computational Natural Language Learning, CoNLL 2016, Berlin, Germany, August 11-12, 2016, pages 10–21. ACL.
  • Rich Caruana. 1997. Multitask learning. Mach. Learn., 28(1):41–75.
  • Daniel M. Cer, Mona T. Diab, Eneko Agirre, Inigo Lopez-Gazpio, and Lucia Specia. 2017. SemEval-2017 task 1: Semantic textual similarity multilingual and crosslingual focused evaluation. In Proceedings of the 11th International Workshop on Semantic Evaluation, SemEval@ACL 2017, Vancouver, Canada, August 3-4, 2017, pages 1–14. Association for Computational Linguistics.
  • Arslan Chaudhry, Puneet Kumar Dokania, Thalaiyasingam Ajanthan, and Philip H. S. Torr. 2018. Riemannian walk for incremental learning: Understanding forgetting and intransigence. In Computer Vision - ECCV 2018 - 15th European Conference, Munich, Germany, September 8-14, 2018, Proceedings, Part XI, volume 11215 of Lecture Notes in Computer Science, pages 556–572. Springer.
  • Alexandra Chronopoulou, Christos Baziotis, and Alexandros Potamianos. 2019. An embarrassingly simple approach for transfer learning from pretrained language models. In Proceedings of the 2019 Conference of the North American Chapter of the Association for Computational Linguistics: Human Language Technologies, NAACL-HLT 2019, Minneapolis, MN, USA, June 2-7, 2019, Volume 1 (Long and Short Papers), pages 2089–2095. Association for Computational Linguistics.
  • Ido Dagan, Oren Glickman, and Bernardo Magnini. 2005. The PASCAL recognising textual entailment challenge. In Machine Learning Challenges Workshop, pages 177–190. Springer.
  • Jacob Devlin, Ming-Wei Chang, Kenton Lee, and Kristina Toutanova. 2019. BERT: Pre-training of deep bidirectional transformers for language understanding. In Proceedings of the 2019 Conference of the North American Chapter of the Association for Computational Linguistics: Human Language Technologies, NAACL-HLT 2019, Minneapolis, MN, USA, June 2-7, 2019, Volume 1 (Long and Short Papers), pages 4171–4186. Association for Computational Linguistics.
  • William B. Dolan and Chris Brockett. 2005. Automatically constructing a corpus of sentential paraphrases. In Proceedings of the Third International Workshop on Paraphrasing, IWP@IJCNLP 2005, Jeju Island, Korea, October 2005. Asian Federation of Natural Language Processing.
  • Bjarke Felbo, Alan Mislove, Anders Søgaard, Iyad Rahwan, and Sune Lehmann. 2017. Using millions of emoji occurrences to learn any-domain representations for detecting sentiment, emotion and sarcasm. In Proceedings of the 2017 Conference on Empirical Methods in Natural Language Processing, pages 1615–1625, Copenhagen, Denmark. Association for Computational Linguistics.
  • Robert M. French. 1999. Catastrophic forgetting in connectionist networks. Trends in Cognitive Sciences, 3(4):128–135.
  • Danilo Giampiccolo, Bernardo Magnini, Ido Dagan, and Bill Dolan. 2007. The third PASCAL recognizing textual entailment challenge. In Proceedings of the ACL-PASCAL Workshop on Textual Entailment and Paraphrasing, pages 1–9. Association for Computational Linguistics.
  • Ian J. Goodfellow, Mehdi Mirza, Da Xiao, Aaron Courville, and Yoshua Bengio. 2013. An empirical investigation of catastrophic forgetting in gradient-based neural networks. arXiv preprint arXiv:1312.6211.
  • Geoffrey E. Hinton, Oriol Vinyals, and Jeffrey Dean. 2015. Distilling the knowledge in a neural network. CoRR, abs/1503.02531.
  • Yutai Hou, Zhihan Zhou, Yijia Liu, Ning Wang, Wanxiang Che, Han Liu, and Ting Liu. 2019. Few-shot sequence labeling with label dependency transfer. arXiv preprint arXiv:1906.08711.
  • Jeremy Howard and Sebastian Ruder. 2018. Universal language model fine-tuning for text classification. arXiv preprint arXiv:1801.06146.
  • Ferenc Huszar. 2017. On quadratic penalties in elastic weight consolidation. arXiv preprint arXiv:1712.03847.
  • Heechul Jung, Jeongwoo Ju, Minju Jung, and Junmo Kim. 2016. Less-forgetting learning in deep neural networks. CoRR, abs/1607.00122.
  • Diederik P. Kingma and Jimmy Ba. 2015. Adam: A method for stochastic optimization. In 3rd International Conference on Learning Representations, ICLR 2015, San Diego, CA, USA, May 7-9, 2015, Conference Track Proceedings.
  • James Kirkpatrick, Razvan Pascanu, Neil Rabinowitz, Joel Veness, Guillaume Desjardins, Andrei A. Rusu, Kieran Milan, John Quan, Tiago Ramalho, Agnieszka Grabska-Barwinska, et al. 2017. Overcoming catastrophic forgetting in neural networks. Proceedings of the National Academy of Sciences, 114(13):3521–3526.
  • Zhenzhong Lan, Mingda Chen, Sebastian Goodman, Kevin Gimpel, Piyush Sharma, and Radu Soricut. 2019. ALBERT: A lite BERT for self-supervised learning of language representations. CoRR, abs/1909.11942.
  • Matthias De Lange, Rahaf Aljundi, Marc Masana, Sarah Parisot, Xu Jia, Ales Leonardis, Gregory G. Slabaugh, and Tinne Tuytelaars. 2019. Continual learning: A comparative study on how to defy forgetting in classification tasks. CoRR, abs/1909.08383.
  • Cheolhyoung Lee, Kyunghyun Cho, and Wanmo Kang. 2019. Mixout: Effective regularization to finetune large-scale pretrained language models. arXiv preprint arXiv:1909.11299.
  • Sang-Woo Lee, Jin-Hwa Kim, Jaehyun Jun, Jung-Woo Ha, and Byoung-Tak Zhang. 2017. Overcoming catastrophic forgetting by incremental moment matching. In Advances in Neural Information Processing Systems 30: Annual Conference on Neural Information Processing Systems 2017, 4-9 December 2017, Long Beach, CA, USA, pages 4652–4662.
  • Hector J. Levesque, Ernest Davis, and Leora Morgenstern. 2012. The Winograd schema challenge. In Principles of Knowledge Representation and Reasoning: Proceedings of the Thirteenth International Conference, KR 2012, Rome, Italy, June 10-14, 2012. AAAI Press.
  • Xuhong Li, Yves Grandvalet, and Franck Davoine. 2018. Explicit inductive bias for transfer learning with convolutional networks. arXiv preprint arXiv:1802.01483.
  • Zhizhong Li and Derek Hoiem. 2018. Learning without forgetting. IEEE Trans. Pattern Anal. Mach. Intell., 40(12):2935–2947.
  • Xialei Liu, Marc Masana, Luis Herranz, Joost van de Weijer, Antonio M. Lopez, and Andrew D. Bagdanov. 2018. Rotate your networks: Better weight consolidation and less catastrophic forgetting. In 24th International Conference on Pattern Recognition, ICPR 2018, Beijing, China, August 20-24, 2018, pages 2262–2268. IEEE Computer Society.
  • Yinhan Liu, Myle Ott, Naman Goyal, Jingfei Du, Mandar Joshi, Danqi Chen, Omer Levy, Mike Lewis, Luke Zettlemoyer, and Veselin Stoyanov. 2019. RoBERTa: A robustly optimized BERT pretraining approach. CoRR, abs/1907.11692.
  • David Lopez-Paz and Marc'Aurelio Ranzato. 2017. Gradient episodic memory for continual learning. In Advances in Neural Information Processing Systems 30: Annual Conference on Neural Information Processing Systems 2017, 4-9 December 2017, Long Beach, CA, USA, pages 6467–6476.
  • Ilya Loshchilov and Frank Hutter. 2019. Decoupled weight decay regularization. In 7th International Conference on Learning Representations, ICLR 2019, New Orleans, LA, USA, May 6-9, 2019. OpenReview.net.
  • David J. C. MacKay. 2003. Information Theory, Inference, and Learning Algorithms. Cambridge University Press.
  • Arun Mallya and Svetlana Lazebnik. 2018. PackNet: Adding multiple tasks to a single network by iterative pruning. In 2018 IEEE Conference on Computer Vision and Pattern Recognition, CVPR 2018, Salt Lake City, UT, USA, June 18-22, 2018, pages 7765–7773. IEEE Computer Society.
  • James Martens. 2014. New perspectives on the natural gradient method. CoRR, abs/1412.1193.
  • Michael McCloskey and Neal J. Cohen. 1989. Catastrophic interference in connectionist networks: The sequential learning problem. In Psychology of Learning and Motivation, volume 24, pages 109–165. Elsevier.
  • Lili Mou, Zhao Meng, Rui Yan, Ge Li, Yan Xu, Lu Zhang, and Zhi Jin. 2016. How transferable are neural networks in NLP applications? In Proceedings of the 2016 Conference on Empirical Methods in Natural Language Processing, EMNLP 2016, Austin, Texas, USA, November 1-4, 2016, pages 479–489. The Association for Computational Linguistics.
  • Matthew E. Peters, Mark Neumann, Mohit Iyyer, Matt Gardner, Christopher Clark, Kenton Lee, and Luke Zettlemoyer. 2018. Deep contextualized word representations. arXiv preprint arXiv:1802.05365.
  • Matthew E. Peters, Sebastian Ruder, and Noah A. Smith. 2019. To tune or not to tune? Adapting pretrained representations to diverse tasks. In Proceedings of the 4th Workshop on Representation Learning for NLP, RepL4NLP@ACL 2019, Florence, Italy, August 2, 2019, pages 7–14. Association for Computational Linguistics.
  • Jason Phang, Thibault Fevry, and Samuel R. Bowman. 2018. Sentence encoders on STILTs: Supplementary training on intermediate labeled-data tasks. arXiv preprint arXiv:1811.01088.
  • Pranav Rajpurkar, Jian Zhang, Konstantin Lopyrev, and Percy Liang. 2016. SQuAD: 100,000+ questions for machine comprehension of text. In Proceedings of the 2016 Conference on Empirical Methods in Natural Language Processing, EMNLP 2016, Austin, Texas, USA, November 1-4, 2016, pages 2383–2392. The Association for Computational Linguistics.
  • Sylvestre-Alvise Rebuffi, Alexander Kolesnikov, Georg Sperl, and Christoph H. Lampert. 2017. iCaRL: Incremental classifier and representation learning. In 2017 IEEE Conference on Computer Vision and Pattern Recognition, CVPR 2017, Honolulu, HI, USA, July 21-26, 2017, pages 5533–5542. IEEE Computer Society.
  • Joao Ribeiro, Francisco S. Melo, and Joao Dias. 2019. Multi-task learning and catastrophic forgetting in continual reinforcement learning. arXiv preprint arXiv:1909.10008.
  • Amir Rosenfeld and John K. Tsotsos. 2020. Incremental learning through deep adaptation. IEEE Trans. Pattern Anal. Mach. Intell., 42(3):651–663.
  • Roy Bar-Haim, Ido Dagan, Bill Dolan, Lisa Ferro, Danilo Giampiccolo, Bernardo Magnini, and Idan Szpektor. 2006. The second PASCAL recognising textual entailment challenge. In Proceedings of the Second PASCAL Challenges Workshop on Recognising Textual Entailment, page 64.
  • Sebastian Ruder. 2017. An overview of multi-task learning in deep neural networks. CoRR, abs/1706.05098.
  • Sebastian Ruder. 2019. Neural transfer learning for natural language processing. Ph.D. thesis, NUI Galway.
  • Andrei A. Rusu, Neil C. Rabinowitz, Guillaume Desjardins, Hubert Soyer, James Kirkpatrick, Koray Kavukcuoglu, Razvan Pascanu, and Raia Hadsell. 2016. Progressive neural networks. CoRR, abs/1606.04671.
  • Jonathan Schwarz, Wojciech Czarnecki, Jelena Luketina, Agnieszka Grabska-Barwinska, Yee Whye Teh, Razvan Pascanu, and Raia Hadsell. 2018. Progress & compress: A scalable framework for continual learning. In Proceedings of the 35th International Conference on Machine Learning, ICML 2018, Stockholmsmassan, Stockholm, Sweden, July 10-15, 2018, volume 80 of Proceedings of Machine Learning Research, pages 4535–4544. PMLR.
  • Joan Serra, Didac Suris, Marius Miron, and Alexandros Karatzoglou. 2018. Overcoming catastrophic forgetting with hard attention to the task. In Proceedings of the 35th International Conference on Machine Learning, ICML 2018, Stockholmsmassan, Stockholm, Sweden, July 10-15, 2018, volume 80 of Proceedings of Machine Learning Research, pages 4555–4564. PMLR.
  • Shankar Iyer, Nikhil Dandekar, and Kornél Csernai. 2017. First Quora dataset release: Question pairs. https://www.quora.com/q/quoradata/First-Quora-Dataset-Release-Question-Pairs.
  • Hanul Shin, Jung Kwon Lee, Jaehong Kim, and Jiwon Kim. 2017. Continual learning with deep generative replay. In Advances in Neural Information Processing Systems 30: Annual Conference on Neural Information Processing Systems 2017, 4-9 December 2017, Long Beach, CA, USA, pages 2990–2999.
  • Richard Socher, Alex Perelygin, Jean Wu, Jason Chuang, Christopher D. Manning, Andrew Y. Ng, and Christopher Potts. 2013. Recursive deep models for semantic compositionality over a sentiment treebank. In Proceedings of the 2013 Conference on Empirical Methods in Natural Language Processing, EMNLP 2013, 18-21 October 2013, Grand Hyatt Seattle, Seattle, Washington, USA, pages 1631–1642. ACL.
  • Chi Sun, Xipeng Qiu, Yige Xu, and Xuanjing Huang. 2019. How to fine-tune BERT for text classification? In China National Conference on Chinese Computational Linguistics, pages 194–206. Springer.
  • Brian Thompson, Jeremy Gwinnup, Huda Khayrallah, Kevin Duh, and Philipp Koehn. 2019. Overcoming catastrophic forgetting during domain adaptation of neural machine translation. In Proceedings of the 2019 Conference of the North American Chapter of the Association for Computational Linguistics: Human Language Technologies, Volume 1 (Long and Short Papers), pages 2062–2068.
  • Amal Rannen Triki, Rahaf Aljundi, Matthew B. Blaschko, and Tinne Tuytelaars. 2017. Encoder based lifelong learning. In IEEE International Conference on Computer Vision, ICCV 2017, Venice, Italy, October 22-29, 2017, pages 1329–1337. IEEE Computer Society.
  • Alex Wang, Amanpreet Singh, Julian Michael, Felix Hill, Omer Levy, and Samuel R. Bowman. 2019. GLUE: A multi-task benchmark and analysis platform for natural language understanding. In 7th International Conference on Learning Representations, ICLR 2019, New Orleans, LA, USA, May 6-9, 2019. OpenReview.net.
  • Alex Warstadt, Amanpreet Singh, and Samuel R. Bowman. 2019. Neural network acceptability judgments. TACL, 7:625–641.
  • Adina Williams, Nikita Nangia, and Samuel R. Bowman. 2018. A broad-coverage challenge corpus for sentence understanding through inference. In Proceedings of the 2018 Conference of the North American Chapter of the Association for Computational Linguistics: Human Language Technologies, NAACL-HLT 2018, New Orleans, Louisiana, USA, June 1-6, 2018, Volume 1 (Long Papers), pages 1112–1122. Association for Computational Linguistics.
  • Ju Xu and Zhanxing Zhu. 2018. Reinforced continual learning. In Advances in Neural Information Processing Systems 31: Annual Conference on Neural Information Processing Systems 2018, NeurIPS 2018, 3-8 December 2018, Montreal, Canada, pages 907–916.
  • Ying Xu, Xu Zhong, Antonio Jose Jimeno Yepes, and Jey Han Lau. 2019. Forget me not: Reducing catastrophic forgetting for domain adaptation in reading comprehension. arXiv preprint arXiv:1911.00202.
  • Jiabin Xue, Jiqing Han, Tieran Zheng, Xiang Gao, and Jiaxing Guo. 2019. A multi-task learning framework for overcoming the catastrophic forgetting in automatic speech recognition. arXiv preprint arXiv:1904.08039.
  • Zhilin Yang, Zihang Dai, Yiming Yang, Jaime G. Carbonell, Ruslan Salakhutdinov, and Quoc V. Le. 2019. XLNet: Generalized autoregressive pretraining for language understanding. In Advances in Neural Information Processing Systems 32: Annual Conference on Neural Information Processing Systems 2019, NeurIPS 2019, 8-14 December 2019, Vancouver, BC, Canada, pages 5754–5764.
  • Friedemann Zenke, Ben Poole, and Surya Ganguli. 2017. Continual learning through synaptic intelligence. In Proceedings of the 34th International Conference on Machine Learning, ICML 2017, Sydney, NSW, Australia, 6-11 August 2017, volume 70 of Proceedings of Machine Learning Research, pages 3987–3995. PMLR.
  • Jeffrey O. Zhang, Alexander Sax, Amir Roshan Zamir, Leonidas J. Guibas, and Jitendra Malik. 2019a. Side-tuning: Network adaptation via additive side networks. CoRR, abs/1912.13503.
  • Junting Zhang, Jie Zhang, Shalini Ghosh, Dawei Li, Serafettin Tasci, Larry P. Heck, Heming Zhang, and C.-C. Jay Kuo. 2019b. Class-incremental learning via deep model consolidation. CoRR, abs/1903.07864.