Curriculum Learning by Dynamic Instance Hardness

NeurIPS 2020


Abstract

A good teacher can adjust the curriculum based on students’ learning history. By analogy, in this paper, we study the dynamics of a deep neural network’s (DNN) performance on individual samples during its learning process. The observed properties allow us to develop an adaptive curriculum that leads to faster learning of more accurate models…

Introduction
  • A curriculum plays an important role in human learning.
  • In machine learning, instead of training the model with a random sequence of data, recent work in curriculum learning (CL) [4, 25, 17, 52, 13] shows that manipulating the sequence of training data can improve both training efficiency and model accuracy.
  • Inspired by human learning curricula, a schedule of training samples is constructed, sometimes in combination with other criteria.
  • As exhibited in previous work, CL can help avoid local minima, improve training efficiency, and lead to better generalization performance
Highlights
  • A curriculum plays an important role in human learning
  • Inspired by how humans learn from their learning history, we study a novel metric, “dynamic instance hardness (DIH)”, which evaluates the hardness of a sample using a running mean of an instantaneous hardness metric over the training history (a minimal code sketch of this running mean follows this list)
  • We find that DIH is a powerful tool to study the learning dynamics of deep neural networks (DNNs) and reveals several interesting properties of DNNs on individual samples during the course of training
  • We develop DIH guided curriculum learning (DIHCL) in order to improve both the efficiency and final test-set performance without introducing notable extra costs, since DIH only needs to be lazily updated using by-products of training
  • We propose DIH guided curriculum learning as a general framework to improve both the training efficiency and the final performance of machine learning models
  • As DIHCL is inspired by human learning, our results on machine learning models can perhaps return the favor and be inspiring for those studying mechanisms behind true human learning
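A minimal sketch of the running-mean update behind DIH, assuming an exponential moving average with decay factor gamma; the function and variable names and the default gamma are illustrative, not the paper's exact formulation:

```python
import numpy as np

def update_dih(dih, idx, inst_hardness, gamma=0.95):
    """Update dynamic instance hardness (DIH) for the samples visited this step.

    dih           -- per-sample DIH values (running means), shape (n,)
    idx           -- indices of the samples in the current batch
    inst_hardness -- their instantaneous hardness, e.g. per-sample loss
    gamma         -- decay factor of the running mean (illustrative default)
    """
    dih[idx] = gamma * dih[idx] + (1.0 - gamma) * inst_hardness
    return dih

# Toy usage: 8 training samples, two of them visited in this step.
dih = np.zeros(8)
dih = update_dih(dih, idx=np.array([1, 5]), inst_hardness=np.array([2.3, 0.4]))
print(dih)
```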
Methods
  • The authors train different DNNs by using variants of DIHCL, and compare them with three baselines: vanilla random mini-batch SGD, self-paced learning (SPL) [25], and minimax curriculum learning (MCL) [52] on 11 image classification datasets, i.e., (A) WideResNet-28-10 [50] on CIFAR10 and CIFAR100 [24]; (B) ResNeXt50-32x4d [49] on Food-101 [6], FGVC Aircraft (Aircraft) [30], Stanford Cars [23], and Birdsnap [5]; (C) ResNet50 [14] on ImageNet [11]; (D) WideResNet-16-8 on Fashion-MNIST (FMNIST) [48] and Kuzushiji-MNIST (KMNIST) [8]; (E) PreActResNet34 [14] on STL10 [9] and SVHN [34].
  • The authors use T0 = 5, a decay factor of 0.95, and a shrink factor of 0.85 for all DIHCL variants, and gradually reduce the selected subset size k from n to 0.2n.
  • For DIHCL variants that further reduce St by solving Eq (3), the authors use hyperparameter values of 1.0, 0.8, and k0 = 0.4, and employ the “facility location” submodular function [10] G(S) = Σ_{j ∈ St} max_{i ∈ S} ωi,j, where ωi,j represents the similarity between sample xi and xj.
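A minimal sketch of evaluating the facility-location function above on a precomputed similarity matrix; the function and variable names are illustrative, and the paper maximizes this objective with a lazier-than-lazy greedy algorithm [32] rather than by exhaustive evaluation:

```python
import numpy as np

def facility_location(S, W, ground_set):
    """G(S) = sum over j in the ground set of max_{i in S} W[i, j]."""
    if len(S) == 0:
        return 0.0
    return W[np.asarray(S)[:, None], np.asarray(ground_set)].max(axis=0).sum()

# Toy usage with a random similarity matrix over 6 samples.
rng = np.random.default_rng(0)
Z = rng.normal(size=(6, 4))                            # feature vectors
W = np.exp(-np.square(Z[:, None] - Z[None, :]).sum(-1))
print(facility_location([0, 3], W, ground_set=range(6)))
```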
Results
  • The authors further report and compare the dynamics in four scenarios using plots analogous to Figure 2: (1) under 100% label noise (Figure 9); (2) under 40% label noise (Figure 14); (3) training a smaller DNN (Figure 10); and (4) using an exponentially decaying learning rate across episodes (Figure 13).
Conclusion
  • Inspired by how humans learn from their learning history, the authors study a novel metric, “dynamic instance hardness (DIH)”, which evaluates the hardness of a sample using a running mean of an instantaneous hardness metric over the training history.
  • The authors find that DIH is a powerful tool to study the learning dynamics of DNNs and reveals several interesting properties of DNNs on individual samples during the course of training
  • Based on these properties, the authors develop DIH guided curriculum learning (DIHCL) in order to improve both the efficiency and final test-set performance without introducing notable extra costs, since DIH only needs to be lazily updated using by-products of training.
  • The authors may use a metric similar to DIH to select human learning materials, to test whether better human learning efficiency can be achieved
Tables
  • Table1: The test accuracy (%) achieved by different methods training DNNs on 11 datasets (without pre-training). We use “Loss, dLoss, Flip” to denote the 3 choices of DIH metrics based on (A), (B), and (C) respectively. In all DIHCL variants, we apply lazier-than-lazy greedy [32] for Eq (3) on all datasets except Food-101, Birdsnap, Aircraft (FGVC Aircraft), Cars (Stanford Cars), and ImageNet. For each dataset, the best accuracy is shown in blue, the second best in red, and the third best in green
Related work
  • Early curriculum learning (CL) [20, 3, 40] work shows that feeding an optimized sequence of training sets (i.e., a curriculum), which can be designed by a human expert [4], into the training algorithms can improve the models’ performance. Self-paced learning (SPL) [25, 42, 41, 43] chooses the curriculum based on hardness (e.g., per-sample loss) during training. SPL selects samples with smaller loss, and gradually increases the subset size over time to cover all the training data. Self-paced curriculum learning [17] combines the human-expert curriculum design of CL with the loss-based adaptation of SPL. SPL with diversity (SPLD) [16] adds a negative group sparse regularization term to SPL and increases its weight to increase selection diversity. Machine teaching [20, 55, 35] aims to find the optimal and smallest training subset that leads to performance similar to training on all the data. Minimax curriculum learning (MCL) [52] argues that the diversity of samples [47, 19, 46] is more critical in early learning since it encourages exploration, while difficulty becomes more useful later. It also uses a form of instantaneous instance hardness (loss) but is not dynamic like DIH, and it formulates optimization as a minimax problem. Compared to the above methods, DIHCL has the following advantages: (1) DIHCL improves the efficiency of CL since extra inference on the entire training set per step is not required; and (2) DIHCL uses DIH as the hardness metric, which is a more stable measure than instantaneous hardness.
Funding
  • This research is based upon work supported by the National Science Foundation under Grant No. IIS-1162606, the National Institutes of Health under award R01GM103544, and by a Google, a Microsoft, and an Intel research award. It is also supported by the CONIX Research Center, one of six centers in JUMP, a Semiconductor Research Corporation (SRC) program sponsored by DARPA. Some GPUs used to produce the experimental results were donated by NVIDIA.
Study subjects and analysis
datasets: 11
Compared to existing CL methods: (1) DIH is more stable over time than using only instantaneous hardness, which is noisy due to stochastic training and DNN’s non-smoothness; (2) DIHCL is computationally inexpensive since it uses only a byproduct of back-propagation and thus does not require extra inference. On 11 datasets, DIHCL significantly outperforms random mini-batch SGD and recent CL methods in terms of efficiency and final performance. We train different DNNs by using variants of DIHCL, and compare them with three baselines: vanilla random mini-batch SGD, self-paced learning (SPL) [25], and minimax curriculum learning (MCL) [52] on 11 image classification datasets (without pre-training), i.e., (A) WideResNet-28-10 [50] on CIFAR10 and CIFAR100 [24]; (B) ResNeXt50-32x4d [49] on Food-101 [6], FGVC Aircraft (Aircraft) [30], Stanford Cars [23], and Birdsnap [5]; (C) ResNet50 [14] on ImageNet [11]; (D) WideResNet-16-8 on Fashion-MNIST (FMNIST) [48] and Kuzushiji-MNIST (KMNIST) [8]; (E) PreActResNet34 [14] on STL10 [9] and SVHN [34]

image classification datasets: 11
On 11 datasets, DIHCL significantly outperforms random mini-batch SGD and recent CL methods in terms of efficiency and final performance. We train different DNNs by using variants of DIHCL, and compare them with three baselines: vanilla random mini-batch SGD, self-paced learning (SPL) [25], and minimax curriculum learning (MCL) [52] on 11 image classification datasets (without pre-training), i.e., (A) WideResNet-28-10 [50] on CIFAR10 and CIFAR100 [24]; (B) ResNeXt50-32x4d [49] on Food-101 [6], FGVC Aircraft (Aircraft) [30], Stanford Cars [23], and Birdsnap [5]; (C) ResNet50 [14] on ImageNet [11]; (D) WideResNet-16-8 on Fashion-MNIST (FMNIST) [48] and Kuzushiji-MNIST (KMNIST) [8]; (E) PreActResNet34 [14] on STL10 [9] and SVHN [34]. We use mini-batch SGD with momentum of 0.9 and a cyclic cosine annealing learning rate schedule [29] (multiple episodes, with the starting/target learning rates decayed by a multiplicative factor of 0.85)
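A minimal sketch of a cyclic cosine-annealing schedule whose endpoints are decayed by a factor of 0.85 at each episode restart, as described above; the episode length and base learning rates are illustrative assumptions, not values taken from the paper:

```python
import math

def cyclic_cosine_lr(epoch, episode_len=40, lr_start=0.1, lr_target=1e-4, decay=0.85):
    """Learning rate under cyclic cosine annealing with decayed restarts.

    Each episode anneals from lr_start down to lr_target along a half cosine;
    both endpoints are multiplied by `decay` at every episode restart.
    """
    episode, t = divmod(epoch, episode_len)
    lr_start *= decay ** episode
    lr_target *= decay ** episode
    return lr_target + 0.5 * (lr_start - lr_target) * (1 + math.cos(math.pi * t / episode_len))

# Example: the learning rate at the first epoch of the second episode.
print(cyclic_cosine_lr(40))   # 0.085 = 0.85 * 0.1
```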

datasets: 11
Compared to existing CL methods: (1) DIH is more stable over time than using only instantaneous hardness, which is noisy due to stochastic training and DNN’s non-smoothness; (2) DIHCL is computationally inexpensive since it uses only a byproduct of back-propagation and thus does not require extra inference. On 11 datasets, DIHCL significantly outperforms random mini-batch SGD and recent CL methods in terms of efficiency and final performance. A curriculum plays an important role in human learning

datasets: 11
We provide several options for weighted sampling, which introduce different types of randomness, and we integrate subset diversity into the selection criteria as well. Empirically, we evaluate several variants of DIHCL and compare them against random mini-batch SGD as well as recent curriculum learning algorithms on 11 datasets. DIHCL shows an advantage over the other baselines in terms of both time/sample efficiency and test-set accuracy

randomly selected samples: 50
Our observations also suggest that learning simple patterns [1] happens mainly amongst the easily memorable samples early during training. Our problem is distinct from catastrophic forgetting [21], which considers sequential learning of multiple tasks, where later learned tasks make the model forget what has been learned from earlier tasks. In our work, we consider single-task learning. (Figure 1: Top: DIH (running mean of loss) vs. Bottom: instantaneous loss of 50 randomly selected samples from CIFAR10 on WideResNet-28-10.)

samples with the largest r40: 10000
Instead of visualizing rt(i) for all i ∈ [50000] training samples, we use rt(i) (with at(i) being prediction flips) to categorize them into three groups, and we do this at epochs 10 (early training), 40 (middle), and 210 (later training). At epoch 40, the 10,000 samples with the largest r40(i) comprise the first group, the 10,000 samples with the smallest r40(i) comprise the next group, and the remaining 30,000 samples comprise the final group. We will show that the training dynamics of the three groups have different characteristics
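A minimal sketch of that three-way grouping, assuming an array r40 that holds the DIH value of each of the 50,000 training samples at epoch 40 (the array and function names are illustrative):

```python
import numpy as np

def split_by_dih(r40, k=10000):
    """Partition sample indices into the k hardest, k easiest, and remaining samples."""
    order = np.argsort(r40)      # ascending DIH
    easiest = order[:k]          # k samples with the smallest r40(i)
    hardest = order[-k:]         # k samples with the largest r40(i)
    middle = order[k:-k]         # everything else
    return hardest, easiest, middle

# Toy usage with random DIH values in place of the real ones.
r40 = np.random.rand(50000)
hard, easy, mid = split_by_dih(r40)
print(len(hard), len(easy), len(mid))   # 10000 10000 30000
```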

samples with small DIH at Epoch 40: 10
We empirically verify that the samples with large/small DIH in the future can be predicted by only using the DIH during early epochs. In Figure 3, we show the overlap rate of hard/easy samples between any two epochs as two upper-triangular matrices

samples with small DIH at Epoch 40: 10
For example, given Ui, the 10k samples with the largest DIH in epoch 15i, and Uj for any j > i, the matrix A in the left plot has entries Ai,j = |Ui ∩ Uj|/10000. Similarly, the matrix in the right plot measures the overlap rate

samples with small DIH at Epoch 40: 10
Similarly, the matrix in the right plot measures the overlap rate between epoch 15i and epoch 15j for the 10k samples with the smallest DIH
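A minimal sketch of computing such an upper-triangular overlap-rate matrix, assuming dih_checkpoints[e] stores every sample's DIH at the e-th 15-epoch checkpoint (names and data layout are illustrative):

```python
import numpy as np

def overlap_matrix(dih_checkpoints, k=10000, hardest=True):
    """A[i, j] = |U_i & U_j| / k, where U_e holds the k samples with the
    largest (hardest=True) or smallest DIH at checkpoint e."""
    sets = []
    for dih in dih_checkpoints:
        order = np.argsort(dih)
        sets.append(set(order[-k:] if hardest else order[:k]))
    m = len(sets)
    A = np.zeros((m, m))
    for i in range(m):
        for j in range(i, m):            # fill only the upper triangle
            A[i, j] = len(sets[i] & sets[j]) / k
    return A

# Toy usage: 5 checkpoints of random DIH values over 50,000 samples.
dih_checkpoints = [np.random.rand(50000) for _ in range(5)]
print(np.round(overlap_matrix(dih_checkpoints)[0], 2))
```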

samples: 10
They show that after a few early epochs, DIH can accurately predict the hard and easy samples in the future. This verifies our statement in the last paragraph. (Figure 4: the three strategies for DIH on 10 hard and 10 easy samples, each randomly sampled.)

image classification datasets: 11
The Beta distribution encourages exploration when the difference between rt(i) and c − rt(i) is small. We train different DNNs by using variants of DIHCL, and compare them with three baselines: vanilla random mini-batch SGD, self-paced learning (SPL) [25], and minimax curriculum learning (MCL) [52] on 11 image classification datasets (without pre-training), i.e., (A) WideResNet-28-10 [50] on CIFAR10 and CIFAR100 [24]; (B) ResNeXt50-32x4d [49] on Food-101 [6], FGVC Aircraft (Aircraft) [30], Stanford Cars [23], and Birdsnap [5]; (C) ResNet50 [14] on ImageNet [11]; (D) WideResNet-16-8 on Fashion-MNIST (FMNIST) [48] and Kuzushiji-MNIST (KMNIST) [8]; (E) PreActResNet34 [14] on STL10 [9] and SVHN [34]. We use mini-batch SGD with momentum of 0.9 and a cyclic cosine annealing learning rate schedule [29] (multiple episodes, with the starting/target learning rates decayed by a multiplicative factor of 0.85)
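A minimal sketch of Beta-based sample selection in this spirit, drawing one score per sample from a Beta distribution parameterized by rt(i) and c − rt(i) and keeping the samples with the largest draws; the scaling, the cap c, and the small constants are illustrative assumptions rather than the paper's exact parameterization:

```python
import numpy as np

def dihcl_beta_select(r, k, scale=1.0, rng=None):
    """Select k sample indices by Thompson-style sampling from Beta distributions."""
    rng = np.random.default_rng() if rng is None else rng
    c = r.max() + 1e-8                 # illustrative upper bound on DIH
    a = scale * r + 1e-8
    b = scale * (c - r) + 1e-8
    # The variance of Beta(a, b) is largest when a and b are close, i.e. when
    # r_t(i) is near c - r_t(i), which is where exploration is encouraged.
    scores = rng.beta(a, b)
    return np.argsort(scores)[-k:]     # indices of the k largest sampled scores

# Toy usage: pick 64 of 1,000 samples given random DIH values.
r = np.random.rand(1000)
print(dihcl_beta_select(r, k=64)[:5])
```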

datasets: 4
We utilize a Gaussian kernel for similarity using neural net features (e.g., the inputs to the last fully connected layer in our experiments) z(x) for each x, i.e., ωi,j = exp(−‖z(xi) − z(xj)‖²/(2σ²)), where σ is the mean value of all the k(k−1)/2 pairwise distances. In Figure 6, we show how the test set accuracy changes when increasing the number of training batches in each curriculum learning method on 4 datasets. In Figure 7, we report wall-clock training time on 2 datasets
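A minimal sketch of building that similarity matrix from penultimate-layer features, with σ set to the mean pairwise distance as described above (feature dimensions and names are illustrative):

```python
import numpy as np

def gaussian_similarity(Z):
    """omega[i, j] = exp(-||z_i - z_j||^2 / (2 * sigma^2)) for feature rows Z."""
    d2 = np.square(Z[:, None, :] - Z[None, :, :]).sum(-1)   # squared distances
    d = np.sqrt(d2)
    k = Z.shape[0]
    sigma = d[np.triu_indices(k, 1)].mean()   # mean of the k(k-1)/2 pairwise distances
    return np.exp(-d2 / (2.0 * sigma ** 2))

# Toy usage with random features standing in for the last fully connected layer's inputs.
Z = np.random.randn(16, 128)
W = gaussian_similarity(Z)
print(W.shape, W[0, 0])   # (16, 16) 1.0
```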

datasets: 11
Although every variant of DIHCL achieves the best accuracy among all the evaluated methods on some datasets, DIHCL-Exp using loss and DIHCL-Beta using prediction flips as the instantaneous hardness exhibit advantages over the other DIHCL variants. In particular, DIHCL-Exp with dLoss (metric (B)) is the best variant across datasets (achieving top-2 performance on 8 out of 11 datasets). We conduct an ablation study comparing several possible variants of DIHCL, with results reported in Figure 8

datasets: 11
Based on these properties, we develop DIH guided curriculum learning (DIHCL) in order to improve both the efficiency and final test-set performance without introducing notable extra costs, since DIH only needs to be lazily updated using by-products of training. We demonstrate DIHCL’s advantages over several recent CL methods and a random baseline on 11 datasets. We propose DIH guided curriculum learning as a general framework to improve both the training efficiency and the final performance of machine learning models

Reference
  • Devansh Arpit, Stanisław Jastrzebski, Nicolas Ballas, David Krueger, Emmanuel Bengio, Maxinder S. Kanwal, Tegan Maharaj, Asja Fischer, Aaron Courville, Yoshua Bengio, and Simon Lacoste-Julien. A closer look at memorization in deep networks. In ICML, volume 70, pages 233–242, 2017.
  • Peter Auer, Nicolò Cesa-Bianchi, Yoav Freund, and Robert E. Schapire. The nonstochastic multiarmed bandit problem. SIAM J. Comput., 32(1):48–77, 2003.
  • Sumit Basu and Janara Christensen. Teaching classification boundaries to humans. In AAAI, pages 109–115, 2013.
  • Yoshua Bengio, Jérôme Louradour, Ronan Collobert, and Jason Weston. Curriculum learning. In ICML, pages 41–48, 2009.
  • Thomas Berg, Jiongxin Liu, Seung Woo Lee, Michelle L. Alexander, David W. Jacobs, and Peter N. Belhumeur. Birdsnap: Large-scale fine-grained visual categorization of birds. In Proc. Conf. Computer Vision and Pattern Recognition (CVPR), 2014.
  • Lukas Bossard, Matthieu Guillaumin, and Luc Van Gool. Food-101 – mining discriminative components with random forests. In ECCV, 2014.
  • Haw-Shiuan Chang, Erik Learned-Miller, and Andrew McCallum. Active bias: Training more accurate neural networks by emphasizing high variance samples. In Advances in Neural Information Processing Systems 30, pages 1002–1012. 2017.
  • Tarin Clanuwat, Mikel Bober-Irizar, Asanobu Kitamoto, Alex Lamb, Kazuaki Yamamoto, and David Ha. Deep learning for classical japanese literature, 2018.
  • Adam Coates, Honglak Lee, and Andrew Y. Ng. An analysis of single-layer networks in unsupervised feature learning. In AISTATS, pages 215–223, 2011.
  • G. Cornuéjols, M. Fisher, and G.L. Nemhauser. On the uncapacitated location problem. Annals of Discrete Mathematics, 1:163–177, 1977.
  • Jia Deng, Wei Dong, Richard Socher, Li-Jia Li, Kai Li, and Li Fei-Fei. ImageNet: A large-scale hierarchical image database. In CVPR, 2009.
  • Satoru Fujishige. Submodular functions and optimization. Annals of discrete mathematics. Elsevier, 2005.
  • Guy Hacohen and Daphna Weinshall. On the power of curriculum learning in training deep networks. In ICML, 2019.
  • Kaiming He, Xiangyu Zhang, Shaoqing Ren, and Jian Sun. Deep residual learning for image recognition. 2016 IEEE Conference on Computer Vision and Pattern Recognition (CVPR), pages 770–778, 2016.
  • Angela H. Jiang, Daniel L. K. Wong, Giulio Zhou, David G. Andersen, Jeffrey Dean, Gregory R. Ganger, Gauri Joshi, Michael Kaminksy, Michael Kozuch, Zachary C. Lipton, and Padmanabhan Pillai. Accelerating deep learning by focusing on the biggest losers, 2019.
  • Lu Jiang, Deyu Meng, Shoou-I Yu, Zhenzhong Lan, Shiguang Shan, and Alexander G. Hauptmann. Self-paced learning with diversity. In NeurIPS, pages 2078–2086, 2014.
  • Lu Jiang, Deyu Meng, Qian Zhao, Shiguang Shan, and Alexander G. Hauptmann. Self-paced curriculum learning. In AAAI, pages 2694–2700, 2015.
  • Lu Jiang, Zhengyuan Zhou, Thomas Leung, Li-Jia Li, and Li Fei-Fei. Mentornet: Learning data-driven curriculum for very deep neural networks on corrupted labels. In International Conference on Machine Learning, pages 2304–2313, 2018.
  • Joseph K J, R Vamshi Teja, Singh Krishnakant, and N Balasubramanian Vineeth. Submodular batch selection for training deep neural networks. In Proceedings of the Twenty-Eighth International Joint Conference on Artificial Intelligence, 2019.
  • Faisal Khan, Xiaojin (Jerry) Zhu, and Bilge Mutlu. How do humans teach: On curriculum learning and teaching dimension. In NeurIPS, pages 1449–1457, 2011.
  • James Kirkpatrick, Razvan Pascanu, Neil Rabinowitz, Joel Veness, Guillaume Desjardins, Andrei A. Rusu, Kieran Milan, John Quan, Tiago Ramalho, Agnieszka Grabska-Barwinska, Demis Hassabis, Claudia Clopath, Dharshan Kumaran, and Raia Hadsell. Overcoming catastrophic forgetting in neural networks. Proceedings of the National Academy of Sciences (PNAS), 114(13):3521–3526, 2017.
  • Andreas Krause and Cheng S Ong. Contextual gaussian process bandit optimization. In Advances in Neural Information Processing Systems, pages 2447–2455, 2011.
  • Jonathan Krause, Michael Stark, Jia Deng, and Li Fei-Fei. 3d object representations for fine-grained categorization. In 4th International IEEE Workshop on 3D Representation and Recognition (3dRR-13), Sydney, Australia, 2013.
  • Alex Krizhevsky and Geoffrey Hinton. Learning multiple layers of features from tiny images. Technical report, University of Toronto, 2009.
  • M. Pawan Kumar, Benjamin Packer, and Daphne Koller. Self-paced learning for latent variable models. In NeurIPS, pages 1189–1197, 2010.
  • Sebastian Leitner. Leitner system, 1970.
  • Junwei Liang, Lu Jiang, Deyu Meng, and Alexander Hauptmann. Learning to detect concepts from webly-labeled video data. In Proceedings of the Twenty-Fifth International Joint Conference on Artificial Intelligence, IJCAI’16, pages 1746–1752, 2016.
  • I. Loshchilov and F. Hutter. Online batch selection for faster training of neural networks. In International Conference on Learning Representations (ICLR) 2016 Workshop Track, May 2016.
  • I. Loshchilov and F. Hutter. Sgdr: Stochastic gradient descent with warm restarts. In ICLR, 2017.
  • S. Maji, J. Kannala, E. Rahtu, M. Blaschko, and A. Vedaldi. Fine-grained visual classification of aircraft. Technical report, 2013.
  • Michel Minoux. Accelerated greedy algorithms for maximizing submodular set functions. In Optimization Techniques, volume 7 of Lecture Notes in Control and Information Sciences, chapter 27, pages 234–243. Springer Berlin Heidelberg, 1978.
  • Baharan Mirzasoleiman, Ashwinkumar Badanidiyuru, Amin Karbasi, Jan Vondrák, and Andreas Krause. Lazier than lazy greedy. In Proceedings of the Twenty-Ninth AAAI Conference on Artificial Intelligence, pages 1812–1818, 2015.
  • G. L. Nemhauser, L. A. Wolsey, and M. L. Fisher. An analysis of approximations for maximizing submodular set functions. Mathematical Programming, 14(1):265–294, 1978.
  • Yuval Netzer, Tao Wang, Adam Coates, Alessandro Bissacco, Bo Wu, and Andrew Y. Ng. Reading digits in natural images with unsupervised feature learning. In NeurIPS Workshop on Deep Learning and Unsupervised Feature Learning, 2011.
  • Kaustubh R Patil, Xiaojin Zhu, Łukasz Kopec, and Bradley C Love. Optimal teaching for limited-capacity human learners. In NeurIPS, pages 2465–2473, 2014.
  • Ricardo B. C. Prudencio, Jose Hernandez-Orallo, and Adolfo Martinez-Uso. Analysis of instance hardness in machine learning using item response theory. In 2nd International Workshop on Learning over Multiple Contexts (LMCE 2015), 2015.
  • Robert E. Schapire. The strength of weak learnability. Machine Learning, 5(2):197–227, 1990.
  • Michael R. Smith and Tony Martinez. A comparative evaluation of curriculum learning with filtering and boosting in supervised classification problems. Comput. Intell., 32(2):167–195, 2016.
  • Michael R. Smith, Tony Martinez, and Christophe Giraud-Carrier. An instance level analysis of data complexity. Machine Learning, 95(2):225–256, 2014.
  • Valentin I. Spitkovsky, Hiyan Alshawi, and Daniel Jurafsky. Baby Steps: How “Less is More” in unsupervised dependency parsing. In NeurIPS 2009 Workshop on Grammar Induction, Representation of Language and Language Learning, 2009.
  • James Steven Supancic III and Deva Ramanan. Self-paced learning for long-term tracking. In CVPR, pages 2379–2386, 2013.
  • Kevin Tang, Vignesh Ramanathan, Li Fei-fei, and Daphne Koller. Shifting weights: Adapting object detectors from image to video. In NeurIPS, pages 638–646, 2012.
  • Ye Tang, Yu-Bin Yang, and Yang Gao. Self-paced dictionary learning for image classification. In MM, pages 833–836, 2012.
  • William R Thompson. On the Likelihood that One Unknown Probability Exceeds Another in View of the Evidence of Two Samples. Biometrika, 25(3-4):285–294, 1933.
  • Mariya Toneva, Alessandro Sordoni, Remi Tachet des Combes, Adam Trischler, Yoshua Bengio, and Geoffrey J. Gordon. An empirical study of example forgetting during deep neural network learning. In International Conference on Learning Representations (ICLR), 2019.
  • Shengjie Wang, Wenruo Bai, Chandrashekhar Lavania, and Jeff Bilmes. Fixing mini-batch sequences with hierarchical robust partitioning. volume 89 of Proceedings of Machine Learning Research, pages 3352–3361, 2019.
  • Kai Wei, Rishabh Iyer, and Jeff Bilmes. Submodularity in data subset selection and active learning. In ICML, 2015.
  • Han Xiao, Kashif Rasul, and Roland Vollgraf. Fashion-mnist: a novel image dataset for benchmarking machine learning algorithms, 2017.
  • Saining Xie, Ross Girshick, Piotr Dollár, Zhuowen Tu, and Kaiming He. Aggregated residual transformations for deep neural networks. In CVPR, 2017.
  • Sergey Zagoruyko and Nikos Komodakis. Wide residual networks. In BMVC, 2016.
  • Chiyuan Zhang, Samy Bengio, Moritz Hardt, Benjamin Recht, and Oriol Vinyals. Understanding deep learning requires rethinking generalization. In ICLR, 2017.
  • Tianyi Zhou and Jeff Bilmes. Minimax curriculum learning: Machine teaching with desirable difficulties and scheduled diversity. In ICLR, 2018.
  • Tianyi Zhou, Shengjie Wang, and Jeff Bilmes. Time-consistent self-supervision for semisupervised learning. In International Conference on Machine Learning (ICML), 2020.
  • Tianyi Zhou, Shengjie Wang, and Jeff A Bilmes. Diverse ensemble evolution: Curriculum datamodel marriage. In Advances in Neural Information Processing Systems 31, pages 5905–5916. 2018.
  • Xiaojin Zhu. Machine teaching: An inverse problem to machine learning and an approach toward optimal education. In AAAI, pages 4083–4087, 2015.
Author
Shengjie Wang