K-BERT: Enabling Language Representation With Knowledge Graph

Weijie Liu, Peng Zhou, Zhe Zhao, Qi Ju, Ping Wang

AAAI Conference on Artificial Intelligence (AAAI), 2020.

Abstract:

Pre-trained language representation models, such as BERT, capture a general language representation from large-scale corpora but lack domain-specific knowledge. When reading a domain text, experts make inferences with relevant knowledge. For machines to achieve this capability, we propose a knowledge-enabled language representation model (K-BERT)...

Introduction
  • Unsupervised pre-trained Language Representation (LR) models like BERT (Devlin et al. 2018) have achieved promising results in multiple NLP tasks.
  • Publicly available models such as BERT, GPT (Radford et al. 2018), and XLNet (Yang et al. 2019), which were pre-trained on open-domain corpora, behave much like an ordinary person without domain expertise.
  • Even though they can refresh the state of the art on the GLUE benchmark (Wang et al. 2018) by learning from open-domain corpora, they may fail on domain-specific tasks, because there is little knowledge overlap between specific domains and the open domain.
  • Pre-training is time-consuming and computationally expensive, making it impractical for most users.
Highlights
  • Unsupervised pre-trained Language Representation (LR) models like BERT (Devlin et al. 2018) have achieved promising results in multiple NLP tasks.
  • This paper proposes a knowledge-enabled Language Representation model, K-BERT, which is compatible with BERT and can incorporate domain knowledge without the Heterogeneous Embedding Space (HES) and Knowledge Noise (KN) issues.
  • We propose K-BERT to enable language representation with knowledge graphs, giving the model access to commonsense or domain knowledge.
  • Soft-position indices and a visible matrix are adopted to control the scope of injected knowledge, preventing the sentence from deviating from its original meaning (a sketch follows this list).
  • Empirical results demonstrate that KGs are especially helpful for knowledge-driven, domain-specific tasks and can be used to solve problems that require domain experts.
  • K-BERT is compatible with the model parameters of BERT, which means that users can directly adopt available pre-trained BERT parameters (e.g., Google BERT, Baidu-ERNIE, etc.) for K-BERT without pre-training by themselves.
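The soft-position/visible-matrix mechanism mentioned above can be made concrete with a minimal sketch, assuming a toy tokenization and already-queried triples; the function name and example inputs are illustrative and not taken from the authors' released implementation. It flattens a sentence tree into a token sequence, assigns soft-position indices so that injected relation/tail tokens continue numbering from the entity they attach to, and builds a visible matrix in which trunk tokens see each other while branch tokens see only their own branch and its entity.

```python
# Illustrative sketch (not the authors' code): soft positions and visible matrix
# for a sentence with injected KG triples. Branch tokens share the entity's
# position context but stay invisible to the rest of the sentence.
import numpy as np

def soft_positions_and_visible_matrix(tokens, branches):
    """
    tokens:   trunk (main-sentence) tokens, e.g. ["Tim Cook", "is", "visiting", "Beijing", "now"]
    branches: injected relation/tail tokens per trunk index, e.g. {0: ["CEO", "Apple"], 3: ["capital", "China"]}
    Returns the flattened token sequence, its soft-position indices, and the visible matrix.
    """
    flat, soft_pos, anchor = [], [], []          # anchor[k]: flat index of the trunk token a branch hangs on
    for i, tok in enumerate(tokens):
        trunk_idx = len(flat)
        flat.append(tok); soft_pos.append(i); anchor.append(trunk_idx)
        for j, b_tok in enumerate(branches.get(i, [])):
            flat.append(b_tok)
            soft_pos.append(i + 1 + j)           # branch positions continue from the entity's soft position
            anchor.append(trunk_idx)             # branch tokens are anchored to that entity

    is_trunk = [anchor[k] == k for k in range(len(flat))]
    n = len(flat)
    visible = np.zeros((n, n), dtype=int)
    for a in range(n):
        for b in range(n):
            same_trunk = is_trunk[a] and is_trunk[b]     # both tokens in the original sentence
            same_branch = anchor[a] == anchor[b]         # same branch (including the entity it hangs on)
            visible[a, b] = int(same_trunk or same_branch)
    return flat, soft_pos, visible

flat, soft_pos, visible = soft_positions_and_visible_matrix(
    ["Tim Cook", "is", "visiting", "Beijing", "now"],
    {0: ["CEO", "Apple"], 3: ["capital", "China"]},
)
print(list(zip(flat, soft_pos)))   # e.g. ('CEO', 1) and ('Apple', 2) share soft positions with 'is' and 'visiting'
```

Feeding the soft positions as position indices and the visible matrix as the self-attention mask is what keeps injected triples from altering the representation of unrelated parts of the sentence.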
Methods
  • The authors detail the implementation of K-BERT, whose overall framework is presented in Figure 1.
  • The authors denote a sentence s = {w0, w1, w2, ..., wn} as a sequence of tokens, where n is the length of the sentence.
  • Each token wi is included in the vocabulary V, i.e., wi ∈ V.
  • A KG, denoted K, is a collection of triples ε = (wi, rj, wk), where wi and wk are names of entities and rj ∈ V is the relation between them.
  • All the triples are drawn from the KG, i.e., ε ∈ K (see the sketch below for this notation in code).
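To make the notation above concrete, here is a small assumed sketch (the toy triples and the helper name `query_triples` are illustrative, not the authors' API): a sentence as a token sequence, the KG K as a set of triples (wi, rj, wk), and the lookup step that finds the triples whose head entity appears in the sentence before they are injected.

```python
# Illustrative sketch of the notation above (names are assumptions, not the authors' API):
# a sentence s = {w0, ..., wn} is a token list, and the KG K is a set of triples (wi, rj, wk).
from collections import defaultdict

K = {
    ("Tim Cook", "CEO", "Apple"),
    ("Beijing", "capital", "China"),
    ("Beijing", "is_a", "City"),
}

# Index the triples by head entity so they can be looked up per sentence token.
by_head = defaultdict(list)
for head, relation, tail in K:
    by_head[head].append((relation, tail))

def query_triples(sentence_tokens):
    """Return {token index: [(relation, tail), ...]} for tokens that name an entity in K."""
    return {i: by_head[tok] for i, tok in enumerate(sentence_tokens) if tok in by_head}

s = ["Tim Cook", "is", "visiting", "Beijing", "now"]
print(query_triples(s))   # e.g. {0: [('CEO', 'Apple')], 3: [('capital', 'China'), ('is_a', 'City')]}
```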
Conclusion
  • The authors propose K-BERT to enable language representation with knowledge graphs, giving the model access to commonsense or domain knowledge.
  • Soft-position indices and a visible matrix are adopted to control the scope of injected knowledge, preventing the sentence from deviating from its original meaning.
  • Despite the challenges of HES and KN, the investigation reveals promising results on twelve open- and specific-domain NLP tasks.
  • Empirical results demonstrate that KGs are especially helpful for knowledge-driven, domain-specific tasks and can be used to solve problems that require domain experts.
  • K-BERT is compatible with the model parameters of BERT, which means that users can directly adopt available pre-trained BERT parameters (e.g., Google BERT, Baidu-ERNIE, etc.) for K-BERT without pre-training by themselves (a sketch of this compatibility follows below).
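As a hedged illustration of this compatibility claim, using the HuggingFace transformers library rather than the authors' own implementation: because K-BERT introduces no new weights, the soft-position indices and visible matrix produced by the knowledge layer can be supplied to an unmodified pre-trained BERT as position_ids and a (batch, seq, seq) attention mask.

```python
# Hedged sketch of the compatibility claim (HuggingFace `transformers`, not the authors' code):
# K-BERT adds no new weights, so soft positions and the visible matrix can be supplied to an
# unmodified pre-trained BERT as `position_ids` and a (batch, seq, seq) attention mask.
import torch
from transformers import BertModel, BertTokenizerFast

model = BertModel.from_pretrained("bert-base-chinese")        # pre-trained weights loaded as-is
tokenizer = BertTokenizerFast.from_pretrained("bert-base-chinese")

enc = tokenizer("北京是中国的首都", return_tensors="pt")
seq_len = enc["input_ids"].shape[1]

# In K-BERT these two come from the knowledge layer; with no injected triples they
# reduce to ordinary positions and an all-ones mask, i.e. plain BERT behaviour.
soft_positions = torch.arange(seq_len).unsqueeze(0)           # shape (1, seq_len)
visible_matrix = torch.ones(1, seq_len, seq_len)              # token-to-token visibility

outputs = model(input_ids=enc["input_ids"],
                position_ids=soft_positions,
                attention_mask=visible_matrix)
print(outputs.last_hidden_state.shape)                        # (1, seq_len, 768)
```

In the degenerate case shown (no injected triples), the inputs reduce to ordinary positions and an all-ones mask, so the output matches standard BERT, which is the sense in which pre-trained BERT parameters transfer without further pre-training.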
Tables
  • Table 1: Results of various models on open-domain sentence classification tasks (Acc. %)
  • Table 2: Results of various models on NLPCC-DBQA (MRR %) and MSRA-NER (F1 %)
  • Table 3: Results of various models on specific-domain tasks (%)
Related work
  • Since Google Inc. launched BERT in 2018, many endeavors have been made for further optimization, basically focusing on the pre-training process and the encoder.

    In optimizing the pre-training process, Baidu-ERNIE (Sun et al. 2019) and BERT-WWM (Cui et al. 2019) adopt whole-word masking rather than single-character masking for pre-training BERT on Chinese corpora. SpanBERT (Joshi et al. 2019) extended BERT by masking contiguous random spans and proposed a span-boundary objective. RoBERTa (Liu et al. 2019) optimized the pre-training of BERT in three ways: removing the next-sentence-prediction objective, dynamically changing the masking strategy, and using more and longer sentences for training. In optimizing the encoder of BERT, XLNet (Yang et al. 2019) replaced the Transformer in BERT with Transformer-XL (Dai et al. 2019) to improve its ability to process long sentences, and THU-ERNIE (Zhang et al. 2019) modified the encoder of BERT into an aggregator for the mutual integration of words and entities.
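Since several of these variants hinge on how tokens are masked during pre-training, the following minimal sketch (an assumption for illustration, not code from the cited papers) contrasts whole-word masking with single-character masking for Chinese text.

```python
# Illustrative sketch (an assumption, not the cited papers' code): whole-word masking draws
# the masking decision once per segmented word and applies it to all of its characters.
import random

def whole_word_mask(words, mask_rate=0.15, mask_token="[MASK]"):
    """words: a pre-segmented sentence, e.g. ["语言", "模型", "很", "有用"]."""
    chars, targets = [], []
    for word in words:
        masked = random.random() < mask_rate         # one decision per whole word ...
        for ch in word:                              # ... applied to every character in it
            chars.append(mask_token if masked else ch)
            targets.append(ch if masked else None)   # prediction targets only where masked
    return chars, targets

print(whole_word_mask(["语言", "模型", "很", "有用"], mask_rate=0.5))
# Single-character masking would instead sample `masked` independently per character, letting the
# model recover a masked character from the rest of its own word — the leakage that WWM removes.
```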
Funding
  • This work is funded by the 2019 Tencent Rhino-Bird Elite Training Program.
Study subjects and analysis
Positive samples: 60,000
  • Shopping is an online shopping review dataset that contains 40,000 reviews, including 21,111 positive and 18,889 negative reviews.
  • Weibo is a dataset with emotional annotations from Sina Weibo, including 60,000 positive and 60,000 negative samples.
  • XNLI (Conneau et al. 2018) and LCQMC (Liu et al. 2018) are two-sentence classification tasks, NLPCC-DBQA is a Q&A matching task, and MSRA-NER (Levow 2006) is a Named Entity Recognition (NER) task.
  • Footnote links: https://github.com/google-research/bert, https://embedding.github.io/evaluation/, https://github.com/pengming617/bert_classification, https://share.weiyun.com/5xxYiig, https://share.weiyun.com/5lEsv0w, https://book.douban.com/, http://tcci.ccf.org.cn/conference/2016/dldoc/evagline2.pdf

Datasets: 2
Domain Q&A: We crawl about 770,000 and 36,000 Q&A samples from Baidu Zhidao in the financial and legal domains, respectively, including questions, netizen answers, and best answers. Based on these, we built two datasets, Finance Q&A and Law Q&A. The task is to choose the best answer to a question from the netizen answers.

Financial news articles: 3,000
Domain NER: Finance NER is a dataset of 3,000 financial news articles manually labeled with over 65,000 named entities (people, locations, and organizations). Medicine NER is the Clinical Named Entity Recognition (CNER) task released at CCKS 2017.

References
  • Bodenreider, O. 2008. Biomedical ontologies in action: Role in knowledge management, data integration and decision support. Yearbook of Medical Informatics 17(01):67–79.
  • Bordes, A.; Usunier, N.; Garcia-Duran, A.; Weston, J.; and Yakhnenko, O. 2013. Translating embeddings for modeling multi-relational data. In Advances in Neural Information Processing Systems, 2787–2795.
  • Bosselut, A.; Rashkin, H.; Sap, M.; Malaviya, C.; Celikyilmaz, A.; and Choi, Y. 2019. COMET: Commonsense transformers for automatic knowledge graph construction. arXiv preprint arXiv:1906.05317.
  • Cao, Y.; Hou, L.; Li, J.; Liu, Z.; Li, C.; Chen, X.; and Dong, T. 2018. Joint representation learning of cross-lingual words and entities via attentive distant supervision. In Proceedings of the 2018 Conference on Empirical Methods in Natural Language Processing.
  • Conneau, A.; Lample, G.; Rinott, R.; Williams, A.; Bowman, S. R.; Schwenk, H.; and Stoyanov, V. 2018. XNLI: Evaluating cross-lingual sentence representations. In Proceedings of the 2018 Conference on Empirical Methods in Natural Language Processing.
  • Cui, Y.; Che, W.; Liu, T.; Qin, B.; Yang, Z.; Wang, S.; and Hu, G. 2019. Pre-training with whole word masking for Chinese BERT. arXiv preprint arXiv:1906.08101.
  • Dai, Z.; Yang, Z.; Yang, Y.; Cohen, W. W.; Carbonell, J.; Le, Q. V.; and Salakhutdinov, R. 2019. Transformer-XL: Attentive language models beyond a fixed-length context. arXiv preprint arXiv:1901.02860.
  • Devlin, J.; Chang, M.-W.; Lee, K.; and Toutanova, K. 2018. BERT: Pre-training of deep bidirectional transformers for language understanding. arXiv preprint arXiv:1810.04805.
  • Dong, Z.; Dong, Q.; and Hao, C. 2006. HowNet and the computation of meaning. Citeseer.
  • Han, X.; Liu, Z.; and Sun, M. 2016. Joint representation learning of text and knowledge for knowledge graph completion. arXiv preprint arXiv:1611.04125.
  • Joshi, M.; Chen, D.; Liu, Y.; Weld, D. S.; Zettlemoyer, L.; and Levy, O. 2019. SpanBERT: Improving pre-training by representing and predicting spans. arXiv preprint arXiv:1907.10529.
  • Levow, G. A. 2006. The third international Chinese language processing bakeoff. In SIGHAN Workshop on Chinese Language Processing.
  • Liu, X.; Chen, Q.; Deng, C.; Zeng, H.; Chen, J.; Li, D.; and Tang, B. 2018. LCQMC: A large-scale Chinese question matching corpus. In International Conference on Computational Linguistics, 1952–1962.
  • Liu, Y.; Ott, M.; Goyal, N.; Du, J.; Joshi, M.; Chen, D.; Levy, O.; Lewis, M.; Zettlemoyer, L.; and Stoyanov, V. 2019. RoBERTa: A robustly optimized BERT pretraining approach. arXiv preprint arXiv:1907.11692.
  • Mikolov, T.; Chen, K.; Corrado, G.; and Dean, J. 2013. Efficient estimation of word representations in vector space. arXiv preprint arXiv:1301.3781.
  • Peters, M. E.; Neumann, M.; Iyyer, M.; Gardner, M.; Clark, C.; Lee, K.; and Zettlemoyer, L. 2018. Deep contextualized word representations. arXiv preprint arXiv:1802.05365.
  • Radford, A.; Narasimhan, K.; Salimans, T.; and Sutskever, I. 2018. Improving language understanding by generative pre-training. Technical report, OpenAI.
  • Sun, Y.; Wang, S.; Li, Y.; Feng, S.; Chen, X.; Zhang, H.; Tian, X.; Zhu, D.; Tian, H.; and Wu, H. 2019. ERNIE: Enhanced representation through knowledge integration. arXiv preprint arXiv:1904.09223.
  • Toutanova, K.; Chen, D.; Pantel, P.; Poon, H.; Choudhury, P.; and Gamon, M. 2015. Representing text for joint embedding of text and knowledge bases. In Proceedings of the 2015 Conference on Empirical Methods in Natural Language Processing, 1499–1509.
  • Vaswani, A.; Shazeer, N.; Parmar, N.; Uszkoreit, J.; Jones, L.; Gomez, A. N.; Kaiser, Ł.; and Polosukhin, I. 2017. Attention is all you need. In Advances in Neural Information Processing Systems, 5998–6008.
  • Wang, Z.; Zhang, J.; Feng, J.; and Chen, Z. 2014. Knowledge graph and text jointly embedding. In Proceedings of the 2014 Conference on Empirical Methods in Natural Language Processing (EMNLP), 1591–1601.
  • Wang, A.; Singh, A.; Michael, J.; Hill, F.; Levy, O.; and Bowman, S. R. 2018. GLUE: A multi-task benchmark and analysis platform for natural language understanding. arXiv preprint arXiv:1804.07461.
  • Xu, B.; Xu, Y.; Liang, J.; Xie, C.; Liang, B.; Cui, W.; and Xiao, Y. 2017. CN-DBpedia: A never-ending Chinese knowledge extraction system. In International Conference on Industrial, Engineering and Other Applications of Applied Intelligent Systems, 428–438.
  • Yang, Z.; Dai, Z.; Yang, Y.; Carbonell, J.; Salakhutdinov, R.; and Le, Q. V. 2019. XLNet: Generalized autoregressive pretraining for language understanding. arXiv preprint arXiv:1906.08237.
  • Zhang, Z.; Han, X.; Liu, Z.; Jiang, X.; Sun, M.; and Liu, Q. 2019. ERNIE: Enhanced language representation with informative entities. arXiv preprint arXiv:1905.07129.