A Representer Theorem for Deep Kernel Learning
Journal of Machine Learning Research (JMLR), 2019. CCF A; SCI Zone 3.
- Pretraining has recently greatly advanced natural language processing (NLP)
- We show that M6 outperforms the baselines on multimodal downstream tasks, and that the large M6 with 10 billion parameters reaches even better performance
- We propose a method called M6 that can process information from multiple modalities and perform both single-modal and cross-modal understanding and generation
- The model is scaled to 10 billion parameters with sophisticated deployment, and the 10-billion-parameter M6-large is the largest pretrained model in Chinese
- Experimental results show that our proposed M6 outperforms the baselines on a number of downstream tasks involving both single and multiple modalities. We will continue pretraining extremely large models on more data to explore the limits of their performance

- Representer Point Selection for Explaining Deep Neural Networks. (Cited by 301)
- Learning Rates for the Kernel Regularized Regression with a Differentiable Strongly Convex Loss (Cited by 6)
- A Kernel Perspective for the Decision Boundary of Deep Neural Networks. (Cited by 2)
- Structured Deep Kernel Networks for Data-Driven Closure Terms of Turbulent Flows (Cited by 2)
- Application of Deep Kernel Models for Certified and Adaptive RB-ML-ROM Surrogate Modeling (Cited by 2)
- Do Ideas Have Shape? Plato's Theory of Forms As the Continuous Limit of Artificial Neural Networks. (Cited by 9)
- Be Greedy and Learn: Efficient and Certified Algorithms for Parametrized Optimal Control Problems (Cited by 2)
- Kernel-based Linear System Identification: When Does the Representer Theorem Hold? (Cited by 0)
- Artificial Intelligence or Human: When and Why Consumers Prefer AI Recommendations (Cited by 3)