Coherent Hierarchical Multi-Label Classification Networks

NIPS 2020, (2020)

Abstract

Hierarchical multi-label classification (HMC) is a challenging classification task extending standard multi-label classification problems by imposing a hierarchy constraint on the classes. In this paper, we propose C-HMCNN(h), a novel approach for HMC problems, which, given a network h for the underlying multi-label classification problem...

Introduction
  • Multi-label classification is a standard machine learning problem in which an object can be associated with multiple labels.
  • HMC problems naturally arise in many domains, such as image classification [12,13,14], text categorization [17, 20, 27], and functional genomics [1, 9, 32]
  • They are very challenging for two main reasons: (i) they are normally characterized by severe class imbalance, because the number of datapoints per class is usually much smaller at deeper levels of the hierarchy, and (ii) the predictions must be coherent with the hierarchy constraint.
  • Most of the state-of-the-art models based on neural networks belong to the second category
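The hierarchy constraint in point (ii) means that whenever a class is predicted, every ancestor of that class must be predicted as well. A minimal coherence check in plain Python (the class names and hierarchy below are hypothetical, chosen only for illustration):

```python
# Hierarchy given as child -> parent; a prediction is a set of classes.
# Coherence: every ancestor of a predicted class must also be predicted.

def is_coherent(predicted, parent):
    """Return True iff `predicted` respects the hierarchy constraint."""
    for c in predicted:
        p = parent.get(c)
        while p is not None:
            if p not in predicted:
                return False
            p = parent.get(p)
    return True

# Hypothetical 3-level hierarchy: animal > dog > puppy
parent = {"dog": "animal", "puppy": "dog"}

print(is_coherent({"animal", "dog", "puppy"}, parent))  # True
print(is_coherent({"puppy"}, parent))  # False: ancestors are missing
```

Models whose raw outputs can violate this check either need a post-processing step or, as in C-HMCNN(h), an architecture that makes violations impossible by construction.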
Highlights
  • We propose C-HMCNN(h), a novel approach for HMC problems, which, given a network h for the underlying multi-label classification problem, exploits the hierarchy information to produce predictions coherent with the hierarchy constraint and improve performance
  • We proposed a new model for HMC problems, called C-HMCNN(h), which is able to (i) leverage the hierarchical information to learn when to delegate the prediction on a superclass to one of its subclasses, (ii) produce predictions that are coherent by construction, and (iii) outperform current state-of-the-art models on 20 commonly used real-world HMC benchmarks
  • We will use as h an interpretable model, and study how max constraint module (MCM) and max constraint loss (MCLoss) can be modified to improve the interpretability of C-HMCNN(h)
  • We proposed a novel model that is shown to outperform the current state-of-the-art models on commonly used HMC benchmarks
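The max constraint module (MCM) mentioned above replaces each class score with the maximum of h's raw scores over the class and all of its descendants. Since a superclass score can then never be below a descendant's score, thresholding always yields coherent predictions. A minimal plain-Python sketch (the hierarchy, descendant map, and scores are hypothetical; in the paper MCM sits on top of a neural network h):

```python
def mcm(scores, descendants):
    """Max Constraint Module: each class score becomes the max of its own
    raw score and the raw scores of all its descendants, so after
    thresholding a predicted subclass always implies its superclasses."""
    return {
        c: max(scores[d] for d in descendants[c] | {c})
        for c in scores
    }

# Hypothetical hierarchy: animal > dog > puppy
descendants = {"animal": {"dog", "puppy"}, "dog": {"puppy"}, "puppy": set()}
raw = {"animal": 0.2, "dog": 0.4, "puppy": 0.9}  # raw outputs of h

coherent = mcm(raw, descendants)
# coherent["animal"] == 0.9 and coherent["dog"] == 0.9, so thresholding
# at 0.5 yields {animal, dog, puppy}, which respects the hierarchy.
```

This also illustrates the delegation idea: the superclass prediction can come entirely from whichever subclass output h is most confident about.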
Results
  • The dropout rate was set to 70% and the batch size to 4.
  • The authors proposed a novel model that is shown to outperform the current state-of-the-art models on commonly used HMC benchmarks
Conclusion
  • The authors proposed a new model for HMC problems, called C-HMCNN(h), which is able to (i) leverage the hierarchical information to learn when to delegate the prediction on a superclass to one of its subclasses, (ii) produce predictions that are coherent by construction, and (iii) outperform current state-of-the-art models on 20 commonly used real-world HMC benchmarks.
  • The authors proposed a novel model that is shown to outperform the current state-of-the-art models on commonly used HMC benchmarks.
  • The authors focus on functional genomics, which is the application domain most benchmarks come from
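The delegation behaviour in point (i) above comes from the constraint-aware loss (MCLoss). The exact formulation is in the paper; the following is only a hedged sketch of the idea, assuming a binary cross-entropy base loss and a small hypothetical hierarchy: for a positive class the positive term is evaluated on the maximum of h over that class and its positive descendants, so the gradient can flow through whichever subclass output the network relies on, while the negative term penalises the MCM output that the model actually predicts.

```python
import math

def mcloss(h, y, descendants, eps=1e-12):
    """Sketch of a max-constraint-style loss (binary cross-entropy base).
    h: raw sigmoid outputs of the underlying network, per class.
    y: ground-truth labels (0/1), per class.
    For a positive class, the positive term uses the max of h over the
    class and its *positive* descendants (delegation); for a negative
    class, the term uses the max over all descendants (the MCM output)."""
    loss = 0.0
    for c in h:
        group = descendants[c] | {c}
        mcm_c = max(h[d] for d in group)                 # what is predicted
        if y[c] == 1:
            pos = max(h[d] for d in group if y[d] == 1)  # delegate
            loss -= math.log(pos + eps)
        else:
            loss -= math.log(1.0 - mcm_c + eps)
    return loss

# Hypothetical: animal > dog, datapoint labelled {animal, dog}
descendants = {"animal": {"dog"}, "dog": set()}
h = {"animal": 0.1, "dog": 0.9}  # h captured the subclass, not the superclass
y = {"animal": 1, "dog": 1}
# The positive term for "animal" uses max(0.1, 0.9) = 0.9, so the loss is
# small: the superclass prediction is delegated to the subclass output.
```

With a plain per-class cross-entropy, the low raw score for "animal" would be heavily penalised even though the coherent prediction is already correct; the max-based positive term avoids that conflict.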
Objectives
  • The authors' goal is to leverage standard neural network approaches for multi-label classification problems and exploit the hierarchy constraint in order to produce coherent predictions and improve performance.
Tables
  • Table1: Summary of the 20 real-world datasets. Number of features (D), number of classes (n), and number of datapoints for each dataset split
  • Table2: Comparison of C-HMCNN(h) with the other state-of-the-art models. The performance of each system is measured as the AU(PRC) obtained on the test set. The best results are in bold
  • Table3: Impact of MCM and MCM+MCLoss on the performance measured as AU(PRC) and on the total number of epochs for the validation set of the FunCat datasets
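The tables report AU(PRC), the area under the precision-recall curve, pooled over all (datapoint, class) pairs as is common for HMC benchmarks. A plain-Python sketch of one standard step-wise construction (equivalent to average precision; the exact interpolation used in the benchmarks may differ):

```python
def au_prc(y_true, y_score):
    """Area under the precision-recall curve, pooling all (datapoint,
    class) pairs. Step-wise (right-sided) integration, in the spirit of
    average precision. Assumes at least one positive in y_true."""
    pairs = sorted(zip(y_score, y_true), key=lambda t: -t[0])
    tp = fp = 0
    total_pos = sum(y_true)
    area, prev_recall = 0.0, 0.0
    i, n = 0, len(pairs)
    while i < n:
        # Treat all pairs sharing the same score as a single threshold.
        s = pairs[i][0]
        while i < n and pairs[i][0] == s:
            tp += pairs[i][1]
            fp += 1 - pairs[i][1]
            i += 1
        precision = tp / (tp + fp)
        recall = tp / total_pos
        area += (recall - prev_recall) * precision
        prev_recall = recall
    return area

print(au_prc([1, 0, 1], [0.9, 0.8, 0.7]))  # 0.8333...: one mis-ranked pair
```

A single threshold-free scalar like this is convenient for HMC because per-class thresholds are hard to tune under severe class imbalance.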
Related work
  • HMC problems are a generalization of hierarchical classification problems, where the labels are hierarchically organized, and each datapoint can be assigned to one path in the hierarchy (e.g., [10, 26, 30]). In HMC problems, by contrast, each datapoint can be assigned multiple paths in the hierarchy.

    In the literature, HMC methods are traditionally divided into local and global approaches [29]. Local approaches decompose the problem into smaller classification problems, whose solutions are then combined to solve the main task. Local approaches can be further divided based on the strategy they deploy to decompose the main task. If a method trains a different classifier for each level of the hierarchy, then we have a local classifier per level, as in [5,6,7, 21, 35]. The works [5,6,7] are extended by [33], where HMCN-R and HMCN-F are presented. Since HMCN-R and HMCN-F are trained with both a local loss and a global loss, they are considered hybrid local-global approaches.

    If a method trains a classifier for each node of the hierarchy, then we have a local classifier per node. In [8], a linear classifier is trained for each node with a loss function that captures the hierarchy structure. In [15], on the other hand, one multi-layer perceptron is deployed for each node. A different approach is proposed in [3], where kernel dependency estimation is employed to project each label to a low-dimensional vector. To preserve the hierarchy structure, a generalized condensing sort and select algorithm is developed, and each vector is then learned individually using ridge regression.

    Finally, if a method trains a different classifier per parent node in the hierarchy, then we have a local classifier per parent node. For example, [18] proposes to train a model for each sub-ontology of the Gene Ontology, combining features automatically learned from the sequences with features based on protein interactions. In [34], instead, the authors address the overfitting problem typical of local models by representing the correlation among the labels as a label distribution, and then training each local model to map datapoints to label distributions.

    Global methods consist of single models able to map objects to their corresponding classes in the hierarchy as a whole.
A well-known global method is CLUS-HMC [32], consisting of a single predictive clustering tree for the entire hierarchy. This work is extended in [28], where Clus-Ens, an ensemble of CLUS-HMC, is proposed. In [22], a neural network incorporating the structure of the hierarchy in its architecture is proposed. While this network makes predictions that are coherent with the hierarchy, it also makes the assumption that each parent class is the union of the children. In [4], the authors propose a “competitive neural network”, whose architecture replicates the hierarchy.
Funding
  • Eleonora Giunchiglia is supported by the EPSRC under the grant EP/N509711/1 and by an Oxford-DeepMind Graduate Scholarship
  • This work was also supported by the Alan Turing Institute under the EPSRC grant EP/N510129/1 and by the AXA Research Fund
  • We also acknowledge the use of the EPSRC-funded Tier 2 facility JADE (EP/P020275/1) and GPU computing support by Scan Computers International Ltd
Study subjects and analysis
real-world datasets: 20
In this section, we first discuss how to effectively implement C-HMCNN(h), leveraging GPU architectures. Then, we present the experimental results of C-HMCNN(h), first considering two synthetic experiments, and then on 20 real-world datasets for which we compare with current state-of-the-art models for HMC problems. Finally, ablation studies highlight the positive impact of both MCM and MCLoss on C-HMCNN(h)'s performance.

5.4 Comparison with the state of the art. We tested our model on 20 real-world datasets commonly used to compare HMC systems (see, e.g., [3, 23, 32, 33]): 16 are functional genomics datasets [9], 2 contain medical images [13], 1 contains images of microalgae [14], and 1 is a text categorization dataset [17]. The characteristics of these datasets are summarized in Table 1

References
  • Zafer Barutçuoglu, Robert E. Schapire, and Olga G. Troyanskaya. Hierarchical multi-label prediction of gene function. Bioinform., 22(7):830–836, 2006.
  • Alessio Benavoli, Giorgio Corani, and Francesca Mangili. Should we really use post-hoc tests based on mean-ranks? J. Mach. Learn. Res., 17(5):1–10, 2016.
  • Wei Bi and James T. Kwok. Multilabel classification on tree- and DAG-structured hierarchies. In Proc. of ICML, pages 17–24, 2011.
  • Helyane Bronoski Borges and Júlio C. Nievola. Multi-label hierarchical classification using a competitive neural network for protein function prediction. In Proc. of IJCNN, pages 1–8, 2012.
  • Ricardo Cerri, Rodrigo C. Barros, and André Carlos Ponce de Leon Ferreira de Carvalho. Hierarchical multi-label classification for protein function prediction: A local approach based on neural networks. In Proc. of ISDA, pages 337–343, 2011.
  • Ricardo Cerri, Rodrigo C. Barros, and André Carlos Ponce de Leon Ferreira de Carvalho. Hierarchical multi-label classification using local neural networks. J. Comput. Syst. Sci., 80(1):39–56, 2014.
  • Ricardo Cerri, Rodrigo C. Barros, André Carlos Ponce de Leon Ferreira de Carvalho, and Yaochu Jin. Reduction strategies for hierarchical multi-label classification in protein function prediction. BMC Bioinform., 17:373, 2016.
  • Nicolò Cesa-Bianchi, Claudio Gentile, and Luca Zaniboni. Incremental algorithms for hierarchical classification. J. Mach. Learn. Res., 7:31–54, 2006.
  • Amanda Clare. Machine Learning and Data Mining for Yeast Functional Genomics. PhD thesis, University of Wales, 2003.
  • Ofer Dekel, Joseph Keshet, and Yoram Singer. Large margin hierarchical classification. In Carla E. Brodley, editor, Proc. of ICML, volume 69. ACM, 2004.
  • Janez Demsar. Statistical comparisons of classifiers over multiple data sets. J. Mach. Learn. Res., 7:1–30, 2006.
  • Jia Deng, Wei Dong, Richard Socher, Li-Jia Li, Kai Li, and Fei-Fei Li. ImageNet: A large-scale hierarchical image database. In Proc. of CVPR, pages 248–255, 2009.
  • Ivica Dimitrovski, Dragi Kocev, Suzana Loskovska, and Sašo Džeroski. Hierarchical annotation of medical images. In Proc. of IS, pages 174–181. IJS, Ljubljana, 2008.
  • Ivica Dimitrovski, Dragi Kocev, Suzana Loskovska, and Saso Dzeroski. Hierarchical classification of diatom images using ensembles of predictive clustering trees. Ecol. Informatics, 7(1):19–29, 2012.
  • Shou Feng, Ping Fu, and Wenbin Zheng. A hierarchical multi-label classification method based on neural networks for gene function prediction. Biotechnology and Biotechnological Equipment, 32:1613–1621, 2018.
  • Diederik P. Kingma and Jimmy Ba. Adam: A method for stochastic optimization. In Proc. of ICLR, 2015.
  • Bryan Klimt and Yiming Yang. The Enron Corpus: A new dataset for email classification research. In Proc. of ECML, pages 217–226, 2004.
  • Maxat Kulmanov, Mohammad Asif Khan, and Robert Hoehndorf. DeepGO: Predicting protein functions from sequence and interactions using a deep ontology-aware classifier. Bioinform., 34(4):660–668, 2018.
  • Tao Lei, Regina Barzilay, and Tommi S. Jaakkola. Rationalizing neural predictions. In Proc. of EMNLP, pages 107–117, 2016.
  • David D. Lewis, Yiming Yang, Tony G. Rose, and Fan Li. RCV1: A new benchmark collection for text categorization research. J. Mach. Learn. Res., 5:361–397, 2004.
  • Yu Li, Sheng Wang, Ramzan Umarov, Bingqing Xie, Ming Fan, Lihua Li, and Xin Gao. DEEPre: Sequence-based enzyme EC number prediction by deep learning. Bioinform., 34(5):760–769, 2018.
  • Luca Masera and Enrico Blanzieri. AWX: An integrated approach to hierarchical-multilabel classification. In Proc. of ECML-PKDD, pages 322–336, 2018.
  • Felipe Kenji Nakano, Mathias Lietaert, and Celine Vens. Machine learning for discovering missing or wrong protein function annotations — A comparison using updated benchmark datasets. BMC Bioinform., 20(1):485:1–485:32, 2019.
  • Guillaume Obozinski, Gert R. G. Lanckriet, Charles E. Grant, Michael I. Jordan, and William Stafford Noble. Consistent probabilistic outputs for protein function prediction. Genome Biology, 9:S6, 2008.
  • Predrag Radivojac et al. A large-scale evaluation of computational protein function prediction. Nature Methods, 10(3):221–227, 2013.
  • Harish Ramaswamy, Ambuj Tewari, and Shivani Agarwal. Convex calibrated surrogates for hierarchical classification. In Proceedings of Machine Learning Research, volume 37, pages 1852–1860. PMLR, 2015.
  • Juho Rousu, Craig Saunders, Sándor Szedmák, and John Shawe-Taylor. Kernel-based learning of hierarchical multilabel classification models. J. Mach. Learn. Res., 7:1601–1626, 2006.
  • Leander Schietgat, Celine Vens, Jan Struyf, Hendrik Blockeel, Dragi Kocev, and Saso Dzeroski. Predicting gene function using hierarchical multi-label decision tree ensembles. BMC Bioinform., 11:2, 2010.
  • Carlos Nascimento Jr. Silla and Alex Alves Freitas. A survey of hierarchical classification across different application domains. Data Min. Knowl. Discov., 22(1-2):31–72, 2011.
  • Aixin Sun and Ee-Peng Lim. Hierarchical text classification and evaluation. In Proc. of ICDM, pages 521–528, 2001.
  • Giorgio Valentini. True path rule hierarchical ensembles for genome-wide gene function prediction. IEEE/ACM Trans. Comput. Biology Bioinform., 8(3):832–847, 2011.
  • Celine Vens, Jan Struyf, Leander Schietgat, Saso Dzeroski, and Hendrik Blockeel. Decision trees for hierarchical multi-label classification. Mach. Learn., 73(2):185–214, 2008.
  • Jonatas Wehrmann, Ricardo Cerri, and Rodrigo C. Barros. Hierarchical multi-label classification networks. In Proc. of ICML, pages 5225–5234, 2018.
  • Changdong Xu and Xin Geng. Hierarchical classification based on label distribution learning. In Proc. of AAAI, pages 5533–5540, 2019.
  • Zhenzhen Zou, Shuye Tian, Xin Gao, and Yu Li. mlDEEPre: Multi-functional enzyme function prediction with hierarchical multi-label deep learning. Frontiers in Genetics, 9, 2019.
Author
Eleonora Giunchiglia