Recurrent Graph Syntax Encoder for Neural Machine Translation

Abstract

Syntax-incorporated machine translation models have proven successful in improving the model's reasoning and meaning-preservation ability. In this paper, we propose a simple yet effective graph-structured encoder, the Recurrent Graph Syntax Encoder, dubbed RGSE, which enhances the ability to capture useful syntactic information...

Introduction
Highlights
  • Neural machine translation (NMT), proposed as a novel end-to-end paradigm (Kalchbrenner and Blunsom, 2013; Sutskever et al., 2014; Bahdanau et al., 2015; Gehring et al., 2017; Wu et al., 2016; Vaswani et al., 2017), has achieved performance competitive with statistical machine translation (SMT)
  • We present a novel Recurrent Graph Syntax Encoder (RGSE) that casts the nodes of a graph layer as RNN cells; its central idea is to capture syntactic dependencies and word-order information simultaneously
  • We propose a simple yet effective representation method, RGSE, for NMT, which operates over a standard encoder and informs the NMT model with comprehensive syntactic dependencies
  • We present a simple yet effective approach, Recurrent Graph Syntax Encoder (RGSE), to inform NMT models with explicit syntactic dependency information
  • The proposed RGSE is a portable component on the encoder side that regards RNN cells as graph nodes and injects syntactic dependencies as edges, thereby capturing syntactic information and word-order information simultaneously (a minimal sketch of this idea follows this list)
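A minimal, hypothetical sketch of the idea above, written against PyTorch: a graph layer sits on top of a standard encoder, each token is a node backed by a shared GRU cell, and dependency edges (plus sequential edges, so word order is also propagated) decide which neighbouring states a node aggregates before its recurrent update. The class name, the averaged aggregation, and the number of graph steps are illustrative assumptions, not the authors' released implementation.

    import torch
    import torch.nn as nn


    class RecurrentGraphSyntaxEncoder(nn.Module):
        """Hypothetical sketch: RNN cells as graph nodes, dependency arcs as edges."""

        def __init__(self, hidden_size: int, num_steps: int = 2):
            super().__init__()
            self.cell = nn.GRUCell(hidden_size, hidden_size)  # one cell shared by all nodes
            self.num_steps = num_steps  # rounds of message passing (assumption)

        def forward(self, enc_states: torch.Tensor, adj: torch.Tensor) -> torch.Tensor:
            # enc_states: (seq_len, hidden) states from a standard recurrent/Transformer encoder
            # adj:        (seq_len, seq_len) 0/1 matrix holding dependency edges plus
            #             sequential (i, i+1) edges so word-order information also flows
            h = enc_states
            degree = adj.sum(dim=-1, keepdim=True).clamp(min=1.0)
            for _ in range(self.num_steps):
                msg = (adj @ h) / degree   # average the neighbours' current states
                h = self.cell(msg, h)      # recurrent update of every node at once
            return h                       # syntax-enriched token representations


    # Shape check with a toy 7-token sentence and self-loop-only graph.
    rgse = RecurrentGraphSyntaxEncoder(hidden_size=512)
    out = rgse(torch.randn(7, 512), torch.eye(7))  # -> (7, 512)

In the paper's setting the enriched states would then be consumed by the decoder's attention; how they are combined with the original encoder states is not described in this summary, so the sketch leaves that choice open.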
Methods
  • The aims of the experiments are (1) to find the optimal structure of RGSE on the validation set, (2) to demonstrate the superiority of RGSE over existing tree- and graph-structured syntax-aware models, and (3) to assess the effectiveness of the RGSE-based Transformer against several SOTA models.

    To compare with the results reported by previous works (Bastings et al., 2017; Beck et al., 2018) under the recurrent NMT scenario, the authors conduct experiments on the News Commentary v11 corpora from WMT16, comprising approximately 226K En-De and 118K En-Cs sentence pairs respectively; the data and settings are consistent with those works.
  • The authors followed Vaswani et al. (2017) to set the Transformer configurations and report results on the WMT14 En-De test set (cf. Table 3).
Results
  • To achieve aim (2), the authors first report and analyze the BLEU scores on the NC-v11 En-De and En-Cs test sets.
  • Footnotes: https://nlp.stanford.edu/projects/nmt; if a word is split by byte-pair encoding (BPE), its dependency label remains on each resulting substring (a sketch of this label propagation follows this list).
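The footnote above concerns label propagation under subword segmentation. The snippet below is only an illustration of that bookkeeping, assuming a word-aligned list of dependency labels and BPE pieces; the function name and data layout are not from the paper. Every subword piece simply inherits the label of its source word.

    from typing import List, Tuple


    def propagate_labels(labels: List[str],
                         bpe_pieces: List[List[str]]) -> List[Tuple[str, str]]:
        """labels[i] is the dependency label of source word i, which BPE has
        segmented into bpe_pieces[i]; every piece inherits that label."""
        out: List[Tuple[str, str]] = []
        for label, pieces in zip(labels, bpe_pieces):
            out.extend((piece, label) for piece in pieces)
        return out


    # Example: "unbelievable" is split into three pieces, all keeping "amod".
    print(propagate_labels(
        ["det", "amod", "dobj"],
        [["an"], ["un@@", "believ@@", "able"], ["story"]],
    ))
    # -> [('an', 'det'), ('un@@', 'amod'), ('believ@@', 'amod'),
    #     ('able', 'amod'), ('story', 'dobj')]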
Conclusion
  • The authors present a simple yet effective approach, Recurrent Graph Syntax Encoder (RGSE), to inform NMT models with explicit syntactic dependency information.
  • The authors' experiments on En-De and En-Cs tasks show that RGSE consistently enhances recurrent NMT (Bahdanau et al., 2015) and the Transformer (Vaswani et al., 2017), achieving competitive results on par with the SOTA models.
  • It will be interesting to apply RGSE to other natural language generation tasks, such as text summarization and conversation.
Tables
  • Table 1: Different settings employing RGSE on different layer combinations in the Transformer. "Speed" denotes training speed measured in steps per second
  • Table 2: Experiments on the NC-v11 dataset. "↑ / ⇑": significantly outperforms the counterpart (p < 0.05 / 0.01)
  • Table 3: Comparison with several SOTA models on the WMT14 En-De test set. "↑ / ⇑": significantly outperforms the counterpart (p < 0.05 / 0.01; a sketch of one common significance test follows this list)
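Tables 2 and 3 mark gains at p < 0.05 / 0.01, but this summary does not say which significance test the authors used. Below is a hedged sketch of paired bootstrap resampling, one common choice for BLEU significance testing; the function name and the use of the sacrebleu package are assumptions for illustration, not the authors' procedure.

    import random
    import sacrebleu


    def paired_bootstrap(sys_a, sys_b, refs, n_samples=1000, seed=0):
        """Return the fraction of resampled test sets on which system A beats B
        in corpus BLEU; 1 minus this fraction approximates the p-value for
        'A is better than B'."""
        rng = random.Random(seed)
        idx = list(range(len(refs)))
        wins = 0
        for _ in range(n_samples):
            sample = [rng.choice(idx) for _ in idx]  # resample sentences with replacement
            a = [sys_a[i] for i in sample]
            b = [sys_b[i] for i in sample]
            r = [refs[i] for i in sample]
            if sacrebleu.corpus_bleu(a, [r]).score > sacrebleu.corpus_bleu(b, [r]).score:
                wins += 1
        return wins / n_samples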
Related work
  • RGSE is inspired by two research themes. Incorporating linguistic features: several approaches have incorporated linguistic features into NMT models since Tai et al. (2015) demonstrated that incorporating structured semantic information could enhance the representations. Sennrich and Haddow (2016) fed the encoder cell combined embeddings of linguistic features, including lemmas, subword tags, etc. Eriguchi et al. (2016) employed a tree-based encoder to model syntactic structure. Li et al. (2017) showed that stitching together words and the linearization of the parse tree is an effective way to incorporate syntax. Zaremoodi and Haffari (2018) and Ma et al. (2018) utilized forest-to-sequence models, which encode a collection of packed parse trees to compensate for parser errors and outperform tree-based models; however, these works do not use a graph network to model the structured data. Jointly learning semantic information and attentional translation is another prevalent approach to introducing linguistic knowledge. To the best of our knowledge, Luong et al. (2016) first proposed adding source syntax into NMT with a shared encoder. Niehues and Cho (2017) trained the machine translation system together with POS and named-entity (NE) tasks, gaining considerable improvements across tasks. Zhang et al. (2019) concatenated the original NMT word representation with a syntax-aware word representation derived from a well-trained dependency parser. However, these approaches rely on more implicit information, overlooking the importance of explicit prior knowledge, and have not been proven effective in the Transformer.
References
  • Roee Aharoni and Yoav Goldberg. 2017. Towards string-to-tree neural machine translation. In Proceedings of ACL 2017 (Volume 2: Short Papers), pages 132–140.
  • Karim Ahmed, Nitish Shirish Keskar, and Richard Socher. 2018. Weighted Transformer network for machine translation.
  • Antonios Anastasopoulos and David Chiang. 2018. Tied multitask learning for neural speech translation. In Proceedings of NAACL 2018 (Volume 1: Long Papers), pages 82–91.
  • Dzmitry Bahdanau, Kyunghyun Cho, and Yoshua Bengio. 2015. Neural machine translation by jointly learning to align and translate. In Proceedings of ICLR 2015.
  • Joost Bastings, Ivan Titov, Wilker Aziz, Diego Marcheggiani, and Khalil Simaan. 2017. Graph convolutional encoders for syntax-aware neural machine translation. In Proceedings of EMNLP 2017.
  • Peter Battaglia, Razvan Pascanu, Matthew Lai, Danilo Jimenez Rezende, et al. 2016. Interaction networks for learning about objects, relations and physics. In Proceedings of NIPS 2016, pages 4502–4510.
  • Daniel Beck, Gholamreza Haffari, and Trevor Cohn. 2018. Graph-to-sequence learning using gated graph neural networks. In Proceedings of ACL 2018.
  • Huadong Chen, Shujian Huang, David Chiang, and Jiajun Chen. 2017. Improved neural machine translation with a syntax-aware encoder and decoder. In Proceedings of ACL 2017, pages 1936–1945.
  • Tobias Domhan. 2018. How much attention do you need? A granular analysis of neural machine translation architectures. In Proceedings of ACL 2018.
  • Akiko Eriguchi, Kazuma Hashimoto, and Yoshimasa Tsuruoka. 2016. Tree-to-sequence attentional neural machine translation. In Proceedings of ACL 2016, pages 823–833.
  • Jonas Gehring, Michael Auli, David Grangier, Denis Yarats, and Yann N. Dauphin. 2017. Convolutional sequence to sequence learning. In Proceedings of ICML 2017.
  • William L. Hamilton, Rex Ying, and Jure Leskovec. 2017. Inductive representation learning on large graphs. In Proceedings of NIPS 2017.
  • Kaiming He, Xiangyu Zhang, Shaoqing Ren, and Jian Sun. 2016. Deep residual learning for image recognition. In Proceedings of CVPR 2016, pages 770–778.
  • Nal Kalchbrenner and Phil Blunsom. 2013. Recurrent continuous translation models. In Proceedings of EMNLP 2013, pages 1700–1709.
  • Thomas N. Kipf and Max Welling. 2016. Semi-supervised classification with graph convolutional networks. arXiv preprint arXiv:1609.02907.
  • Guillaume Klein, Yoon Kim, Yuntian Deng, Jean Senellart, and Alexander M. Rush. 2017. OpenNMT: Open-source toolkit for neural machine translation. In Proceedings of ACL 2017.
  • Adhiguna Kuncoro, Chris Dyer, John Hale, Dani Yogatama, Stephen Clark, and Phil Blunsom. 2018. LSTMs can learn syntax-sensitive dependencies well, but modeling structure makes them better. In Proceedings of ACL 2018 (Volume 1: Long Papers), pages 1426–1436.
  • Junhui Li, Deyi Xiong, Zhaopeng Tu, Muhua Zhu, Min Zhang, and Guodong Zhou. 2017. Modeling source syntax for neural machine translation. In Proceedings of ACL 2017, pages 688–697.
  • Tal Linzen, Emmanuel Dupoux, and Yoav Goldberg. 2016. Assessing the ability of LSTMs to learn syntax-sensitive dependencies. Transactions of the Association for Computational Linguistics, 4:521–535.
  • Minh-Thang Luong, Quoc V. Le, Ilya Sutskever, Oriol Vinyals, and Lukasz Kaiser. 2016. Multi-task sequence to sequence learning. In Proceedings of ICLR 2016.
  • Chunpeng Ma, Akihiro Tamura, Masao Utiyama, Tiejun Zhao, and Eiichiro Sumita. 2018. Forest-based neural machine translation. In Proceedings of ACL 2018.
  • Diego Marcheggiani, Joost Bastings, and Ivan Titov. 2018. Exploiting semantics in neural machine translation with graph convolutional networks. In Proceedings of NAACL 2018.
  • Diego Marcheggiani and Ivan Titov. 2017. Encoding sentences with graph convolutional networks for semantic role labeling. In Proceedings of EMNLP 2017, pages 1506–1515.
  • Jan Niehues and Eunah Cho. 2017. Exploiting linguistic resources for neural machine translation using multi-task learning. In Proceedings of WMT 2017.
  • Kishore Papineni, Salim Roukos, Todd Ward, and Wei-Jing Zhu. 2002. BLEU: A method for automatic evaluation of machine translation. In Proceedings of ACL 2002, pages 311–318.
  • Alessandro Raganato and Jörg Tiedemann. 2018. An analysis of encoder representations in transformer-based machine translation. In Proceedings of the 2018 EMNLP Workshop BlackboxNLP: Analyzing and Interpreting Neural Networks for NLP, pages 287–297.
  • Franco Scarselli, Marco Gori, Ah Chung Tsoi, Markus Hagenbuchner, and Gabriele Monfardini. 2009. The graph neural network model. IEEE Transactions on Neural Networks, 20(1):61–80.
  • Rico Sennrich and Barry Haddow. 2016. Linguistic input features improve neural machine translation. In Proceedings of WMT 2016, pages 83–91.
  • Rico Sennrich, Barry Haddow, and Alexandra Birch. 2016. Neural machine translation of rare words with subword units. In Proceedings of ACL 2016 (Volume 1: Long Papers), pages 1715–1725.
  • Peter Shaw, Jakob Uszkoreit, and Ashish Vaswani. 2018. Self-attention with relative position representations. In Proceedings of NAACL 2018 (Volume 2: Short Papers), pages 464–468.
  • Xing Shi, Inkit Padhi, and Kevin Knight. 2016. Does string-based neural MT learn source syntax? In Proceedings of EMNLP 2016, pages 1526–1534.
  • Linfeng Song, Daniel Gildea, Yue Zhang, Zhiguo Wang, and Jinsong Su. 2019. Semantic neural machine translation using AMR. arXiv preprint arXiv:1902.07282.
  • Linfeng Song, Yue Zhang, Zhiguo Wang, and Daniel Gildea. 2018a. A graph-to-sequence model for AMR-to-text generation. In Proceedings of ACL 2018.
  • Linfeng Song, Yue Zhang, Zhiguo Wang, and Daniel Gildea. 2018b. N-ary relation extraction using graph state LSTM. In Proceedings of EMNLP 2018.
  • Felix Stahlberg, Eva Hasler, Aurelien Waite, and Bill Byrne. 2016. Syntactically guided neural machine translation. In Proceedings of ACL 2016 (Volume 2: Short Papers), pages 299–305.
  • Emma Strubell, Patrick Verga, Daniel Andor, David Weiss, and Andrew McCallum. 2018. Linguistically-informed self-attention for semantic role labeling. arXiv preprint arXiv:1804.08199.
  • Ilya Sutskever, Oriol Vinyals, and Quoc V. Le. 2014. Sequence to sequence learning with neural networks. In Proceedings of NIPS 2014, pages 3104–3112.
  • Kai Sheng Tai, Richard Socher, and Christopher D. Manning. 2015. Improved semantic representations from tree-structured long short-term memory networks. In Proceedings of ACL 2015.
  • Ashish Vaswani, Noam Shazeer, Niki Parmar, Jakob Uszkoreit, Llion Jones, Aidan N. Gomez, Lukasz Kaiser, and Illia Polosukhin. 2017. Attention is all you need. In Proceedings of NIPS 2017.
  • Felix Wu, Angela Fan, Alexei Baevski, Yann N. Dauphin, and Michael Auli. 2019. Pay less attention with lightweight and dynamic convolutions. arXiv preprint arXiv:1901.10430.
  • Shuangzhi Wu, Dongdong Zhang, Zhirui Zhang, Nan Yang, Mu Li, and Ming Zhou. 2018. Dependency-to-dependency neural machine translation. IEEE/ACM Transactions on Audio, Speech, and Language Processing, 26(11):2132–2141.
  • Shuangzhi Wu, Ming Zhou, and Dongdong Zhang. 2017. Improved neural machine translation with source syntax. In Proceedings of IJCAI 2017, pages 4179–4185.
  • Yonghui Wu, Mike Schuster, Zhifeng Chen, Quoc V. Le, Mohammad Norouzi, Wolfgang Macherey, Maxim Krikun, Yuan Cao, Qin Gao, Klaus Macherey, Jeff Klingner, Apurva Shah, Melvin Johnson, et al. 2016. Google's neural machine translation system: Bridging the gap between human and machine translation. arXiv preprint arXiv:1609.08144.
  • Baosong Yang, Zhaopeng Tu, Derek F. Wong, Fandong Meng, Lidia S. Chao, and Tong Zhang. 2018. Modeling localness for self-attention networks. In Proceedings of EMNLP 2018, pages 4449–4458.
  • Poorya Zaremoodi and Gholamreza Haffari. 2018. Incorporating syntactic uncertainty in neural machine translation with a forest-to-sequence model. In Proceedings of COLING 2018, pages 1421–1429.
  • Meishan Zhang, Zhenghua Li, Guohong Fu, and Min Zhang. 2019. Syntax-enhanced neural machine translation with syntax-aware word representations. arXiv preprint arXiv:1905.02878.
  • Yue Zhang, Qi Liu, and Linfeng Song. 2018. Sentence-state LSTM for text representation. In Proceedings of ACL 2018 (Volume 1: Long Papers), pages 317–327.