Lightweight, Dynamic Graph Convolutional Networks for AMR-to-Text Generation

EMNLP 2020, pp. 2162–2172.

Other Links: arxiv.org|academic.microsoft.com

Abstract:

AMR-to-text generation is used to transduce Abstract Meaning Representation structures (AMR) into text. A key challenge in this task is to efficiently learn effective graph representations. Previously, Graph Convolutional Networks (GCNs) were used to encode input AMRs; however, vanilla GCNs are not able to capture non-local information and ...

Introduction
  • Graph structures play a pivotal role in NLP because they are able to capture rich structural information.
  • [Figure: an example AMR graph encoded by (a) vanilla GCNs, (b) LDGCNs, (c) SANs, and (d) structured SANs]
  • Within the realm of work on AMR, the authors focus in this paper on the problem of AMR-to-text generation, i.e. transducing AMR graphs into text that conveys the information in the AMR structure.
  • Graph Convolutional Networks: The authors' LDGCN model is closely related to GCNs (Kipf and Welling, 2017), which restrict filters to operate on a first-order neighborhood.
  • A is the adjacency matrix, where A_{uv} = 1 if there exists a relation that goes from concept u to concept v.
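To make the first-order propagation concrete, the following minimal sketch (not the authors' code) builds the adjacency matrix of a toy AMR graph and applies one GCN-style layer. The toy graph, the self-loops, and the row normalization are illustrative assumptions rather than details taken from the paper.

```python
import numpy as np

# Toy AMR graph in PENMAN notation (a standard textbook example, not the paper's figure):
#   (w / want-01 :ARG0 (b / boy) :ARG1 (g / go-02 :ARG0 b))   "The boy wants to go."
concepts = ["want-01", "boy", "go-02"]
edges = [(0, 1), (0, 2), (2, 1)]      # (u, v): a relation goes from concept u to concept v

n = len(concepts)
A = np.zeros((n, n))
for u, v in edges:
    A[u, v] = 1.0                     # A_{uv} = 1 iff there is a relation from u to v

# One first-order layer: H' = relu(D^{-1}(A + I) H W).
# Kipf and Welling (2017) use symmetric normalization; row normalization keeps the sketch short.
A_hat = A + np.eye(n)
D_inv = np.diag(1.0 / A_hat.sum(axis=1))
rng = np.random.default_rng(0)
H = rng.normal(size=(n, 16))          # initial concept embeddings (random here)
W = rng.normal(size=(16, 16))         # layer weights
H_next = np.maximum(0.0, D_inv @ A_hat @ H @ W)   # each node mixes only its 1-hop neighbors
print(H_next.shape)                   # (3, 16)
```

Because each such layer only mixes information from 1-hop neighbors, k layers must be stacked to reach k-hop (non-local) information, which is the limitation that motivates LDGCNs.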
Highlights
  • Graph structures play a pivotal role in NLP because they are able to capture rich structural information
  • [Figure: an example AMR graph encoded by (a) vanilla Graph Convolutional Networks (GCNs), (b) Lightweight Dynamic Graph Convolutional Networks (LDGCNs), (c) Self-Attention Networks (SANs), and (d) structured SANs]
  • Within the realm of work on Abstract Meaning Representation (AMR), we focus in this paper on the problem of AMR-to-text generation, i.e. transducing AMR graphs into text that conveys the information in the AMR structure
  • We consider two kinds of baseline models: 1) models based on Recurrent Neural Networks (Konstas et al., 2017; Cao and Clark, 2019) and Graph Neural Networks (GNNs) (Song et al., 2018; Beck et al., 2018; Damonte and Cohen, 2019; Guo et al., 2019b; Ribeiro et al., 2019); and 2) models based on SANs (Zhu et al., 2019) and structured SANs (Cai and Lam, 2020; Zhu et al., 2019; Wang et al., 2020)
  • Our model has two variants based on different parameter-saving strategies, LDGCN_WT (weight-tied) and LDGCN_GC (group convolutions), and both of them use the dynamic fusion mechanism (DFM); a rough, hypothetical sketch of the fusion idea follows this list
  • We propose LDGCNs for AMR-to-text generation
  • Compared with existing GCNs and SANs, LDGCNs maintain a better balance between parameter efficiency and model capacity
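The dynamic fusion mechanism itself is not described on this page. Purely as a hypothetical sketch of the general idea its name suggests (gating together information from different k-hop neighborhoods inside a single layer, with one weight matrix shared across hops), one could write something like the following; the class name, the softmax gate, and the hop count are assumptions, not the authors' formulation.

```python
import torch
import torch.nn as nn

class GatedMultiHopLayer(nn.Module):
    """Hypothetical sketch: gated fusion of information from k-hop neighborhoods.

    This is NOT the authors' exact dynamic fusion mechanism; it only illustrates
    mixing A^1 ... A^K inside a single layer while sharing one projection matrix
    across all hops.
    """
    def __init__(self, dim: int, hops: int = 3):
        super().__init__()
        self.hops = hops
        self.proj = nn.Linear(dim, dim)           # shared across hops: no parameter growth with K
        self.gate = nn.Linear(dim, hops)          # per-node mixing weights over the K hops

    def forward(self, A: torch.Tensor, H: torch.Tensor) -> torch.Tensor:
        # A: (n, n) row-normalized adjacency; H: (n, dim) node states.
        hop_states = []
        Ak = torch.eye(A.size(0), dtype=A.dtype)
        for _ in range(self.hops):
            Ak = Ak @ A                           # A^1, A^2, ..., A^K
            hop_states.append(Ak @ H)             # summary of each k-hop neighborhood
        stacked = torch.stack(hop_states, dim=1)  # (n, K, dim)
        gates = torch.softmax(self.gate(H), -1)   # (n, K): learned contribution of each hop
        fused = (gates.unsqueeze(-1) * stacked).sum(dim=1)
        return torch.relu(self.proj(fused))
```

A single layer of this kind already propagates information beyond the first-order neighborhood that a vanilla GCN layer sees, without introducing a separate weight matrix per hop.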
Methods
  • Experiments on AMR-to-text generation show that LDGCNs outperform the best reported GCNs and SANs trained on LDC2015E86 and LDC2017T10 with significantly fewer parameters.
  • The authors evaluate the model on the LDC2015E86 (AMR1.0), LDC2017T10 (AMR2.0) and LDC2020T02 (AMR3.0) datasets, which have 16,833, 36,521 and 55,635 instances for training, respectively.
  • Both AMR1.0 and AMR2.0 have 1,368 instances for development, and 1,371 instances for testing.
  • Following Guo et al. (2019b), the authors stack 4 LDGCN blocks as the encoder of the model; a simplified skeleton is sketched below.
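The summary only states that 4 blocks are stacked as the encoder. The skeleton below shows what such a stack could look like; the plain first-order layer and the residual connection are placeholders standing in for the actual LDGCN block, not the paper's implementation.

```python
import torch
import torch.nn as nn

class PlaceholderGraphBlock(nn.Module):
    # Stands in for an LDGCN block: a plain first-order layer with a residual connection.
    def __init__(self, dim: int):
        super().__init__()
        self.proj = nn.Linear(dim, dim)

    def forward(self, A: torch.Tensor, H: torch.Tensor) -> torch.Tensor:
        return torch.relu(A @ self.proj(H)) + H

class GraphEncoder(nn.Module):
    """Skeleton only: 4 stacked graph blocks over a shared adjacency matrix."""
    def __init__(self, dim: int = 256, num_blocks: int = 4):
        super().__init__()
        self.blocks = nn.ModuleList([PlaceholderGraphBlock(dim) for _ in range(num_blocks)])

    def forward(self, A: torch.Tensor, H: torch.Tensor) -> torch.Tensor:
        for block in self.blocks:
            H = block(A, H)
        return H   # final node states, later attended over by the decoder
```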
Results
Conclusion
  • The authors propose LDGCNs for AMR-to-text generation.
  • Compared with existing GCNs and SANs, LDGCNs maintain a better balance between parameter efficiency and model capacity.
  • LDGCNs outperform state-of-the-art models on AMR-to-text generation.
Summary
  • Introduction: Graph structures play a pivotal role in NLP because they are able to capture rich structural information.
  • [Figure: an example AMR graph encoded by (a) vanilla GCNs, (b) LDGCNs, (c) SANs, and (d) structured SANs]
  • Within the realm of work on AMR, the authors focus in this paper on the problem of AMR-to-text generation, i.e. transducing AMR graphs into text that conveys the information in the AMR structure.
  • Graph Convolutional Networks: The authors' LDGCN model is closely related to GCNs (Kipf and Welling, 2017), which restrict filters to operate on a first-order neighborhood.
  • A is the adjacency matrix, where A_{uv} = 1 if there exists a relation that goes from concept u to concept v.
  • Methods: Experiments on AMR-to-text generation show that LDGCNs outperform the best reported GCNs and SANs trained on LDC2015E86 and LDC2017T10 with significantly fewer parameters.
  • The authors evaluate the model on the LDC2015E86 (AMR1.0), LDC2017T10 (AMR2.0) and LDC2020T02 (AMR3.0) datasets, which have 16,833, 36,521 and 55,635 instances for training, respectively.
  • Both AMR1.0 and AMR2.0 have 1,368 instances for development, and 1,371 instances for testing.
  • Following Guo et al. (2019b), the authors stack 4 LDGCN blocks as the encoder of the model.
  • Results: The authors consider two kinds of baseline models: 1) models based on Recurrent Neural Networks (Konstas et al., 2017; Cao and Clark, 2019) and Graph Neural Networks (GNNs) (Song et al., 2018; Beck et al., 2018; Damonte and Cohen, 2019; Guo et al., 2019b; Ribeiro et al., 2019);
  • and 2) models based on SANs (Zhu et al., 2019) and structured SANs (Cai and Lam, 2020; Zhu et al., 2019; Wang et al., 2020).
  • Zhu et al. (2019) leverage additional SANs to incorporate relational encodings, whereas Cai and Lam (2020) use GRUs. Additional results of ensemble models are also included.
  • The authors' model has two variants based on different parameter-saving strategies, LDGCN_WT (weight-tied) and LDGCN_GC (group convolutions), and both of them use the dynamic fusion mechanism (DFM); a generic illustration of the two parameter-saving ideas is sketched after this summary.
  • Conclusion: The authors propose LDGCNs for AMR-to-text generation.
  • Compared with existing GCNs and SANs, LDGCNs maintain a better balance between parameter efficiency and model capacity.
  • LDGCNs outperform state-of-the-art models on AMR-to-text generation.
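As noted in the Results summary above, LDGCN_WT and LDGCN_GC differ in how they save parameters. The snippet below is a generic, hedged illustration of the two underlying ideas, weight tying across layers and grouped (channel-split) transformations, using standard PyTorch modules; the dimensions and the use of nn.Conv1d are illustrative choices, not the paper's implementation.

```python
import torch.nn as nn

dim, groups, num_layers = 256, 4, 4

# Weight tying (the "WT" idea, in general): reuse one projection for every layer,
# so depth no longer multiplies the parameter count. Details here are assumptions.
shared = nn.Linear(dim, dim)
tied_layers = [shared] * num_layers              # all four layers point to the same weights

# Group convolution (the "GC" idea, in general): split channels into `groups` groups and
# transform each group separately, cutting parameters roughly by a factor of `groups`.
grouped = nn.Conv1d(dim, dim, kernel_size=1, groups=groups)
dense = nn.Linear(dim, dim)

print(sum(p.numel() for p in dense.parameters()))    # 65,792 = 256*256 + 256
print(sum(p.numel() for p in grouped.parameters()))  # 16,640 = 256*(256/4) + 256
```

Both tricks shrink the parameter count without changing which neighbors a layer can see, which is consistent with the summary's claim about balancing parameter efficiency and model capacity.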
Tables
  • Table1: Main results on AMR-to-text generation. B, C, M and #P denote BLEU, CHRF++, METEOR and the model size in terms of parameters, respectively. Results with ‡ are obtained from the authors. Statistical significance tests follow Zhu et al. (2019); all proposed systems are significant over the baseline at p < 0.01, tested by bootstrap resampling (Koehn, 2004), a rough sketch of which follows this table list
  • Table2: Results on AMR1.0 with external training data. ‡ denotes the ensemble model
  • Table3: Results on AMR3.0. B, C, M and #P denote BLEU, CHRF++, METEOR and the model size in terms of parameters, respectively. Results with † are based on open implementations, while results with ‡ are obtained from the authors
  • Table4: Comparisons between baselines. +DF denotes the dynamic fusion mechanism. +WT and +GC refer to weight tying and group convolutions, respectively
  • Table5: Speed comparisons between baselines. For inference speed, higher is better. Implementations are based on MXNet (Chen et al., 2015) and the Sockeye neural machine translation toolkit (Hieber et al., 2017). Speed results use beam size 10 and batch size 30 on an NVIDIA RTX 1080 GPU
  • Table6: Human evaluation. Significance tests follow Ribeiro et al. (2019). Results are statistically significant with p < 0.05
  • Table7: An example AMR graph and sentences generated by different models
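Table 1's significance note refers to paired bootstrap resampling (Koehn, 2004). The function below is a rough sketch of that procedure on generic per-sentence scores; real BLEU significance testing recomputes the corpus-level metric on each resample, so the sentence-level averaging here is a simplification.

```python
import numpy as np

def paired_bootstrap(scores_a, scores_b, n_samples=1000, seed=0):
    """Rough sketch of paired bootstrap resampling (Koehn, 2004).

    scores_a / scores_b: per-sentence scores of two systems on the same test set.
    Returns the fraction of resamples in which system A beats system B; values
    close to 1.0 suggest A's advantage is unlikely to be a sampling artifact.
    """
    rng = np.random.default_rng(seed)
    scores_a, scores_b = np.asarray(scores_a), np.asarray(scores_b)
    wins = 0
    for _ in range(n_samples):
        idx = rng.integers(0, len(scores_a), size=len(scores_a))  # resample sentences with replacement
        if scores_a[idx].mean() > scores_b[idx].mean():
            wins += 1
    return wins / n_samples

# e.g. paired_bootstrap(ldgcn_scores, baseline_scores) > 0.99  ~  significant at p < 0.01
```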
Related work
Funding
  • This research is partially supported by Ministry of Education, Singapore, under its Academic Research Fund (AcRF) Tier 2 Programme (MOE AcRF Tier 2 Award No: MOE2017-T2-1-156)
Study subjects and analysis
human subjects: 30
Following Ribeiro et al. (2019), two evaluation criteria are used: (i) meaning similarity: how close in meaning the generated text is to the gold sentence; and (ii) readability: how well the generated sentence reads. We randomly select 100 sentences generated by 4 models. Thirty human subjects rate the sentences on a 0-100 scale. The evaluation is conducted separately, and subjects are first given brief instructions explaining the assessment criteria.

subjects: 5
For each sentence, scores are collected from 5 subjects and averaged. Models are ranked according to the mean of sentence-level scores, as in the short sketch below.
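A small sketch of the aggregation described above (five ratings per sentence averaged, models then ranked by the mean of their sentence-level scores); the model names and the random ratings are placeholders, not the study's data.

```python
import numpy as np

# ratings[model] has shape (100 sentences, 5 ratings) on a 0-100 scale.
rng = np.random.default_rng(0)
models = ["model_a", "model_b", "model_c", "model_d"]   # placeholder names
ratings = {m: rng.uniform(0, 100, size=(100, 5)) for m in models}

sentence_means = {m: r.mean(axis=1) for m, r in ratings.items()}   # average the 5 subjects
model_scores = {m: s.mean() for m, s in sentence_means.items()}    # mean sentence-level score
ranking = sorted(model_scores, key=model_scores.get, reverse=True)
print(ranking)
```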

Reference
  • Sami Abu-El-Haija, Amol Kapoor, Bryan Perozzi, and Joonseok Lee. 2018. N-GCN: Multi-scale graph convolution for semi-supervised node classification. In Proc. of UAI.
  • Sami Abu-El-Haija, Bryan Perozzi, Amol Kapoor, Hrayr Harutyunyan, Nazanin Alipourfard, Kristina Lerman, Greg Ver Steeg, and Aram Galstyan. 2019. MixHop: Higher-order graph convolutional architectures via sparsified neighborhood mixing. In Proc. of ICML.
  • Shaojie Bai, J. Zico Kolter, and Vladlen Koltun. 2019a. Deep equilibrium models. In Proc. of NeurIPS.
  • Shaojie Bai, J. Zico Kolter, and Vladlen Koltun. 2019b. Trellis networks for sequence modeling. In Proc. of ICLR.
  • Laura Banarescu, Claire Bonial, Shu Cai, Madalina Georgescu, Kira Griffitt, Ulf Hermjakob, Kevin Knight, Philipp Koehn, Martha Palmer, and Nathan Schneider. 2013. Abstract Meaning Representation for sembanking. In Proc. of LAW@ACL.
  • Daniel Beck, Gholamreza Haffari, and Trevor Cohn. 2018. Graph-to-sequence learning using gated graph neural networks. In Proc. of ACL.
  • Ben Bogin, Matt Gardner, and Jonathan Berant. 2019a. Global reasoning over database structures for text-to-SQL parsing. In Proc. of EMNLP.
  • Ben Bogin, Matt Gardner, and Jonathan Berant. 2019b. Representing schema structure with graph neural networks for text-to-SQL parsing. In Proc. of ACL.
  • Deng Cai and Wai Lam. 2020. Graph transformer for graph-to-sequence learning. In Proc. of AAAI.
  • Kris Cao and Stephen Clark. 2019. Factorising AMR generation through syntax. In Proc. of NAACL-HLT.
  • Tianqi Chen, Mu Li, Yutian Li, Min Lin, Naiyan Wang, Minjie Wang, Tianjun Xiao, Bing Xu, Chiyuan Zhang, and Zheng Zhang. 2015. MXNet: A flexible and efficient machine learning library for heterogeneous distributed systems. arXiv preprint.
  • Marco Damonte and Shay B. Cohen. 2019. Structural neural encoders for AMR-to-text generation. In Proc. of NAACL-HLT.
  • Marco Damonte, Ida Szubert, Shay B. Cohen, and Mark Steedman. 2020. The role of reentrancies in Abstract Meaning Representation parsing. In Findings of EMNLP.
  • Yann Dauphin, Angela Fan, Michael Auli, and David Grangier. 2016. Language modeling with gated convolutional networks. In Proc. of ICML.
  • Nicola De Cao, Wilker Aziz, and Ivan Titov. 2019. Question answering by reasoning across documents with graph convolutional networks. In Proc. of NAACL-HLT.
  • Mostafa Dehghani, Stephan Gouws, Oriol Vinyals, Jakob Uszkoreit, and Lukasz Kaiser. 2019. Universal transformers. In Proc. of ICLR.
  • Michael J. Denkowski and Alon Lavie. 2014. Meteor Universal: Language specific translation evaluation for any target language. In Proc. of WMT@ACL.
  • Jeffrey Flanigan, Chris Dyer, Noah A. Smith, and Jaime G. Carbonell. 2016. Generation from Abstract Meaning Representation using tree transducers. In Proc. of NAACL-HLT.
  • Zhijiang Guo, Guoshun Nan, Wei Lu, and Shay B. Cohen. 2020. Learning latent forests for medical relation extraction. In Proc. of IJCAI.
  • Zhijiang Guo, Yan Zhang, and Wei Lu. 2019a. Attention guided graph convolutional networks for relation extraction. In Proc. of ACL.
  • Zhijiang Guo, Yan Zhang, Zhiyang Teng, and Wei Lu. 2019b. Densely connected graph convolutional networks for graph-to-sequence learning. Transactions of the Association for Computational Linguistics, 7:297–312.
  • Felix Hieber, Tobias Domhan, Michael Denkowski, David Vilar, Artem Sokolov, Ann Clifton, and Matt Post. 2017. Sockeye: A toolkit for neural machine translation. arXiv preprint.
  • Sepp Hochreiter and Jürgen Schmidhuber. 1997. Long short-term memory. Neural Computation, 9(8):1735–1780.
  • Andrew G. Howard, Menglong Zhu, Bo Chen, Dmitry Kalenichenko, Weijun Wang, Tobias Weyand, Marco Andreetto, and Hartwig Adam. 2017. MobileNets: Efficient convolutional neural networks for mobile vision applications. arXiv preprint, abs/1704.04861.
  • Gao Huang, Zhuang Liu, Laurens van der Maaten, and Kilian Q. Weinberger. 2017. Densely connected convolutional networks. In Proc. of CVPR.
  • Thomas N. Kipf and Max Welling. 2017. Semi-supervised classification with graph convolutional networks. In Proc. of ICLR.
  • Philipp Koehn. 2004. Statistical significance tests for machine translation evaluation. In Proc. of EMNLP.
  • Ioannis Konstas, Srinivasan Iyer, Mark Yatskar, Yejin Choi, and Luke S. Zettlemoyer. 2017. Neural AMR: Sequence-to-sequence models for parsing and generation. In Proc. of ACL.
  • Zhenzhong Lan, Mingda Chen, Sebastian Goodman, Kevin Gimpel, Piyush Sharma, and Radu Soricut. 2020. ALBERT: A lite BERT for self-supervised learning of language representations. In Proc. of ICLR.
  • Chen-Yu Lee, Saining Xie, Patrick W. Gallagher, Zhengyou Zhang, and Zhuowen Tu. 2015. Deeply-supervised nets. In Proc. of AISTATS.
  • Guohao Li, Matthias Müller, Ali Thabet, and Bernard Ghanem. 2019a. Can GCNs go as deep as CNNs? In Proc. of ICCV.
  • Xiang Li, Wenhai Wang, Xiaolin Hu, and Jian Yang. 2019b. Selective kernel networks. In Proc. of CVPR.
  • Yujia Li, Daniel Tarlow, Marc Brockschmidt, and Richard S. Zemel. 2016. Gated graph sequence neural networks. In Proc. of ICLR.
  • Bill Yuchen Lin, Xinyue Chen, Jamin Chen, and Xiang Ren. 2019. KagNet: Knowledge-aware graph networks for commonsense reasoning. In Proc. of EMNLP.
  • Sitao Luan, Mingde Zhao, Xiao-Wen Chang, and Doina Precup. 2019. Break the ceiling: Stronger multi-scale deep graph convolutional networks. In Proc. of NeurIPS.
  • Christopher Morris, Martin Ritzert, Matthias Fey, William L. Hamilton, Jan Eric Lenssen, Gaurav Rattan, and Martin Grohe. 2019. Weisfeiler and Leman go neural: Higher-order graph neural networks. In Proc. of AAAI.
  • Kishore Papineni, Salim Roukos, Todd Ward, and Wei-Jing Zhu. 2002. BLEU: A method for automatic evaluation of machine translation. In Proc. of ACL.
  • Maja Popović. 2017. chrF++: Words helping character n-grams. In Proc. of WMT@ACL.
  • Nima Pourdamghani, Kevin Knight, and Ulf Hermjakob. 2016. Generating English from Abstract Meaning Representations. In Proc. of INLG.
  • Leonardo Filipe Rodrigues Ribeiro, Claire Gardent, and Iryna Gurevych. 2019. Enhancing AMR-to-text generation with dual graph representations. In Proc. of EMNLP.
  • Rico Sennrich, Barry Haddow, and Alexandra Birch. 2016. Neural machine translation of rare words with subword units. In Proc. of ACL.
  • Linfeng Song, Xiaochang Peng, Yue Zhang, Zhiguo Wang, and Daniel Gildea. 2017. AMR-to-text generation with synchronous node replacement grammar. In Proc. of ACL.
  • Linfeng Song, Yue Zhang, Zhiguo Wang, and Daniel Gildea. 2018. A graph-to-sequence model for AMR-to-text generation. In Proc. of ACL.
  • Ashish Vaswani, Noam Shazeer, Niki Parmar, Jakob Uszkoreit, Llion Jones, Aidan N. Gomez, Lukasz Kaiser, and Illia Polosukhin. 2017. Attention is all you need. In Proc. of NeurIPS.
  • Tianming Wang, Xiaojun Wan, and Hanqi Jin. 2020. AMR-to-text generation with graph transformer. Transactions of the Association for Computational Linguistics, 8:19–33.
  • Felix Wu, Angela Fan, Alexei Baevski, Yann N. Dauphin, and Michael Auli. 2019. Pay less attention with lightweight and dynamic convolutions. In Proc. of ICLR.
  • Saining Xie, Ross B. Girshick, Piotr Dollár, Zhuowen Tu, and Kaiming He. 2017. Aggregated residual transformations for deep neural networks. In Proc. of CVPR.
  • Keyulu Xu, Chengtao Li, Yonglong Tian, Tomohiro Sonobe, Ken-ichi Kawarabayashi, and Stefanie Jegelka. 2018. Representation learning on graphs with jumping knowledge networks. In Proc. of ICML.
  • Xiangyu Zhang, Xinyu Zhou, Mengxiao Lin, and Jian Sun. 2017. ShuffleNet: An extremely efficient convolutional neural network for mobile devices. In Proc. of CVPR.
  • Jie Zhu, Junhui Li, Muhua Zhu, Longhua Qian, Min Zhang, and Guodong Zhou. 2019. Modeling graph structure in Transformer for better AMR-to-text generation. In Proc. of EMNLP.