
Improving Text Generation with Student Forcing Optimal Transport

EMNLP 2020, pp. 9144-9156


Abstract

Neural language models are often trained with maximum likelihood estimation (MLE), where the next word is generated conditioned on the ground-truth word tokens. During testing, however, the model is instead conditioned on previously generated tokens, resulting in what is termed exposure bias. To reduce this gap between training and testing, ...

Introduction
  • Natural language generation is an essential component of many NLP applications, such as machine translation (Bahdanau et al, 2015), image captioning (You et al, 2016), text summarization (See et al, 2017), dialogue systems (Vinyals and Le, 2015), and machine comprehension (Nguyen et al, 2016).
  • Generating human-like natural language is typically cast as predicting a sequence of consecutive words in a recurrent manner.
  • In Recurrent Neural Network (RNN) models, this is known as Teacher-Forcing (TF) (Williams and Zipser, 1989), due to the use of ground-truth tokens for next-token prediction.
  • At test time, however, the model must condition on its own outputs from the previous step rather than the unseen ground truth, a regime often referred to as Student-Forcing (SF).
  • This discrepancy between training and inference accumulates errors along the sequence-generation trajectory (Ranzato et al, 2016a); the sketch after this list contrasts the two decoding modes.
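To make the distinction concrete, here is a minimal PyTorch sketch (not the paper's code) contrasting the two unrolling modes: teacher forcing feeds the ground-truth token at every step, while student forcing feeds the decoder's own greedy prediction, matching the test-time condition. The `ToyDecoder` module, shapes, and sizes are illustrative assumptions.

```python
import torch
import torch.nn as nn

class ToyDecoder(nn.Module):
    """A deliberately tiny decoder: embedding -> GRU cell -> vocabulary logits."""
    def __init__(self, vocab_size=32, hidden=16):
        super().__init__()
        self.emb = nn.Embedding(vocab_size, hidden)
        self.rnn = nn.GRUCell(hidden, hidden)
        self.out = nn.Linear(hidden, vocab_size)

    def step(self, token, state):
        state = self.rnn(self.emb(token), state)
        return self.out(state), state

def unroll(decoder, target, state, student_forcing=False):
    """Teacher forcing conditions each step on the ground-truth token;
    student forcing conditions on the decoder's own greedy prediction."""
    inp, logits_per_step = target[:, 0], []
    for t in range(1, target.size(1)):
        logits, state = decoder.step(inp, state)
        logits_per_step.append(logits)
        inp = logits.argmax(dim=-1) if student_forcing else target[:, t]
    return torch.stack(logits_per_step, dim=1)   # (batch, len-1, vocab)

decoder = ToyDecoder()
target = torch.randint(0, 32, (4, 10))   # ground-truth token ids (batch, length)
state0 = torch.zeros(4, 16)
tf_logits = unroll(decoder, target, state0, student_forcing=False)  # training-style (TF)
sf_logits = unroll(decoder, target, state0, student_forcing=True)   # inference-style (SF)
```

Training only in the first mode while decoding in the second is exactly the train/test mismatch (exposure bias) that the paper targets.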
Highlights
  • Natural language generation is an essential component of many NLP applications, such as machine translation (Bahdanau et al, 2015), image captioning (You et al, 2016), text summarization (See et al, 2017), dialogue systems (Vinyals and Le, 2015), and machine comprehension (Nguyen et al, 2016)
  • Our work provides the following contributions: i) We introduce a novel method for text generation called Student-Forcing Optimal Transport (SFOT), leveraging an OT loss to improve long-term sequence sampling (a generic OT-loss sketch follows this list). ii) A new context-preserving OT approach is proposed to effectively match text sequences while preserving order information. iii) We examine the necessity of integrating OT with Student-Forcing through the lens of imitation learning. iv) The robustness of the proposed models is demonstrated by extensive empirical evaluations on Neural Machine Translation (NMT), Text Summarization, and Neural Text Generation (NLG)
  • Beyond the difference between SF decoding and TF decoding in the two methods, we propose a “Contextualized OT with Order-Preserving Regularizer” technique, which improves both student-forcing optimal transport (SFOT) and TFOT, as shown in Table 4
  • In Section 2.3, we provide theoretical justification for why SFOT can reduce exposure bias while TFOT still suffers from it: TFOT is based on partial expert trajectories and induces a biased occupancy measure, whereas our proposed SFOT uses previously self-generated words and can obtain an optimal policy
  • We have introduced SFOT to mitigate exposure bias in text generation
  • Experiments on neural machine translation, text summarization, and text generation have demonstrated the effectiveness of our SFOT algorithm, yielding improved performance over strong baselines on these tasks
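As a rough illustration of the OT ingredient, the sketch below computes an entropy-regularized (Sinkhorn) optimal-transport loss between embeddings of generated and reference tokens. It is a generic approximation under assumed cosine costs and uniform marginals, not the paper's contextualized, order-preserving formulation.

```python
import torch
import torch.nn.functional as F

def sinkhorn_ot_loss(gen_emb, ref_emb, eps=0.1, iters=50):
    """Entropy-regularized OT (Sinkhorn) loss between two token-embedding sets.
    gen_emb: (n, d) embeddings of generated tokens; ref_emb: (m, d) reference."""
    cost = 1.0 - F.cosine_similarity(gen_emb.unsqueeze(1),
                                     ref_emb.unsqueeze(0), dim=-1)   # (n, m) cosine cost
    K = torch.exp(-cost / eps)                                       # Gibbs kernel
    a = torch.full((gen_emb.size(0),), 1.0 / gen_emb.size(0))        # uniform marginals
    b = torch.full((ref_emb.size(0),), 1.0 / ref_emb.size(0))
    u = torch.ones_like(a)
    for _ in range(iters):                                           # Sinkhorn iterations
        v = b / (K.t() @ u)
        u = a / (K @ v)
    plan = torch.diag(u) @ K @ torch.diag(v)                         # approximate transport plan
    return (plan * cost).sum()                                       # OT loss = <plan, cost>

# e.g. compare 7 generated vs 9 reference token embeddings of dimension 64
loss = sinkhorn_ot_loss(torch.randn(7, 64), torch.randn(9, 64))
```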
Methods
  • To reasonably select the best model along the temperature sweep, the authors are motivated by (Gu et al, 2019) and propose the BLEU-F1 score to evaluate models (a sketch of temperature-scaled sampling follows this list).
  • Figure 5 shows the BLEU-F1 score versus reverse temperature for MLE and SFOT.
  • The authors observed that the best temperature for the MLE model is 1/1.5 and for SFOT is 1/1.4.
  • Figure 5 indicates that the SFOT model consistently improves over the MLE model on the BLEU-F1 score.
  • Under a similar Self-BLEU score, SFOT significantly improves generation quality over LeakGAN (Guo et al, 2018), the strongest GAN baseline under the BLEU metric
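The temperature sweep simply rescales the decoder's logits before sampling; the helper below is a minimal sketch (the function name and NumPy usage are illustrative, not from the paper).

```python
import numpy as np

def sample_with_temperature(logits, temperature=1.0, rng=None):
    """Sample a token id from softmax(logits / temperature).
    Lower temperature -> sharper distribution (higher quality, lower diversity)."""
    rng = rng or np.random.default_rng()
    z = np.asarray(logits, dtype=np.float64) / temperature
    z -= z.max()                         # subtract max for numerical stability
    p = np.exp(z) / np.exp(z).sum()
    return rng.choice(len(p), p=p)
```

Sweeping the temperature trades BLEU (quality) against Self-BLEU (diversity), and BLEU-F1 is used to pick a single operating point along that sweep.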
Conclusion
  • The authors have introduced SFOT to mitigate exposure bias in text generation. The proposed model captures positional and contextual information of word tokens in OT matching.
  • Experiments on neural machine translation, text summarization, and text generation have demonstrated the effectiveness of the SFOT algorithm, yielding improved performance over strong baselines on these tasks
Tables
  • Table1: VI-EN and EN-VI translation BLEU scores
  • Table2: DE-EN and EN-DE translation BLEU scores
  • Table3: Comparison of German-to-English translation examples. For each example, we show the human translation (reference) and the translation from MLE, TFOT, and SFOT. We highlight the key phrase differences between reference and translation outputs in blue and red, and annotate translation errors in bold. In the first example, SFOT correctly maintains all the information in “since winning in May election” by translating to “since his election victory in May”, whereas MLE only generates “in May” and TFOT also misses “winning” in the reference. In the second example, SFOT successfully keeps the information “Beijing”, whereas MLE generates the wrong words “expiration of” and TFOT changes “Beijing” to “government”
  • Table4: BLEU scores for VI-EN and EN-VI ablation study
  • Table5: Results of text summarization on English Gigawords dataset
  • Table6: Human evaluation of NLG on EMNLP news 2017 dataset. 100 generated sentences from each model are rated 1-5, with means and standard deviations reported. Real sentences were rated 4.21 ± 0.44
  • Table7: Examples generated by SFOT in NLG experiments
Study subjects and analysis
native speakers: 10
To reasonably select the best model along the temperature sweep, we are motivated by (Gu et al, 2019) and propose the BLEU-F1 score, which captures the trade-off between quality and diversity simultaneously. Separately, ten native speakers are asked to rate each generated sentence on a scale of 1 to 5 in terms of readability and meaningfulness. BLEU-F1 is defined as

$\text{BLEU-F1} = \dfrac{2 \times \text{BLEU} \times (1 - \text{Self-BLEU})}{\text{BLEU} + (1 - \text{Self-BLEU})}$  (12)

Figure 5 shows the BLEU-F1 score versus reverse temperature for MLE and SFOT. We observed that the best temperature for the MLE model is 1/1.5 and for SFOT is 1/1.4 (a small helper for Eq. (12) is sketched below).

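Eq. (12) is straightforward to compute once corpus-level BLEU and Self-BLEU are available; the helper below is a minimal sketch (the function name is ours, not from the paper).

```python
def bleu_f1(bleu: float, self_bleu: float) -> float:
    """Harmonic mean of quality (BLEU) and diversity (1 - Self-BLEU), Eq. (12)."""
    diversity = 1.0 - self_bleu
    return 2.0 * bleu * diversity / (bleu + diversity)

# Example: bleu_f1(0.30, 0.45) ≈ 0.39
```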

Reference
  • Fritz Albregtsen et al. 2008. Statistical texture measures computed from gray level coocurrence matrices. Image processing laboratory, department of informatics, university of oslo, 5.
  • David Alvarez-Melis and Tommi S Jaakkola. 2018. Gromov-wasserstein alignment of word embedding spaces. In EMNLP.
  • Martin Arjovsky, Soumith Chintala, and Leon Bottou. 2017. Wasserstein generative adversarial networks. In ICML.
  • Dzmitry Bahdanau, Philemon Brakel, Kelvin Xu, Anirudh Goyal, Ryan Lowe, Joelle Pineau, Aaron Courville, and Yoshua Bengio. 2017. An actor-critic algorithm for sequence prediction. In ICLR.
  • Dzmitry Bahdanau, Kyunghyun Cho, and Yoshua Bengio. 2015. Neural machine translation by jointly learning to align and translate. In ICLR.
  • Samy Bengio, Oriol Vinyals, Navdeep Jaitly, and Noam Shazeer. 2015. Scheduled sampling for sequence prediction with recurrent neural networks. In NIPS.
  • Samuel R Bowman, Luke Vilnis, Oriol Vinyals, Andrew M Dai, Rafal Jozefowicz, and Samy Bengio. 2016. Generating sentences from a continuous space. CONLL.
  • Tom B Brown, Benjamin Mann, Nick Ryder, Melanie Subbiah, Jared Kaplan, Prafulla Dhariwal, Arvind Neelakantan, Pranav Shyam, Girish Sastry, Amanda Askell, et al. 2020. Language models are few-shot learners. arXiv preprint arXiv:2005.14165.
  • Massimo Caccia, Lucas Caccia, William Fedus, Hugo Larochelle, Joelle Pineau, and Laurent Charlin. 2018. Language gans falling short. arXiv preprint arXiv:1811.02549.
  • Mauro Cettolo, Jan Niehues, Sebastian Stuker, Luisa Bentivogli, Roldano Cattoni, and Marcello Federico. 2015. The iwslt 2015 evaluation campaign. In IWSLT.
  • Tong Che, Yanran Li, Ruixiang Zhang, R Devon Hjelm, Wenjie Li, Yangqiu Song, and Yoshua Bengio. 2017. Maximum-likelihood augmented discrete generative adversarial networks. In CoRR.
  • Liqun Chen, Shuyang Dai, Chenyang Tao, Haichao Zhang, Zhe Gan, Dinghan Shen, Yizhe Zhang, Guoyin Wang, Ruiyi Zhang, and Lawrence Carin. 2018. Adversarial text generation via featuremover’s distance. In NIPS.
  • Liqun Chen, Yizhe Zhang, Ruiyi Zhang, Chenyang Tao, Zhe Gan, Haichao Zhang, Bai Li, Dinghan Shen, Changyou Chen, and Lawrence Carin. 2019. Improving sequence-to-sequence learning via optimal transport. In ICLR.
  • Jacob Devlin, Ming-Wei Chang, Kenton Lee, and Kristina Toutanova. 2018. Bert: Pre-training of deep bidirectional transformers for language understanding. arXiv preprint arXiv:1810.04805.
  • Nan Ding and Radu Soricut. 2017. Cold-start reinforcement learning with softmax policy gradient. In Advances in Neural Information Processing Systems, pages 2817–2826.
  • Le Fang, Chunyuan Li, Jianfeng Gao, Wen Dong, and Changyou Chen. 2019. Implicit deep latent variable models for text generation. EMNLP.
  • Hao Fu, Chunyuan Li, Xiaodong Liu, Jianfeng Gao, Asli Celikyilmaz, Lawrence Carin, et al. 2019. Cyclical annealing schedule: A simple approach to mitigating KL vanishing. NAACL.
  • Aude Genevay, Gabriel Peyre, and Marco Cuturi. 2018. Learning generative models with sinkhorn divergences. AISTATS.
  • David Graff, Junbo Kong, Ke Chen, and Kazuaki Maeda. 2003. English gigaword. Linguistic Data Consortium, Philadelphia, 4(1):34.
  • Alex Graves and Navdeep Jaitly. 2014. Towards endto-end speech recognition with recurrent neural networks. In ICML.
  • Xiaodong Gu, Kyunghyun Cho, Jungwoo Ha, and Sunghun Kim. 2019. Dialogwae: Multimodal response generation with conditional wasserstein auto-encoder. ICLR.
  • Jiaxian Guo, Sidi Lu, Han Cai, Weinan Zhang, Yong Yu, and Jun Wang. 2018. Long text generation via adversarial training with leaked information. In AAAI.
  • Kelvin Guu, Tatsunori B Hashimoto, Yonatan Oren, and Percy Liang. 2018. Generating sentences by editing prototypes. TACL.
  • Sepp Hochreiter and Jurgen Schmidhuber. 1997. Long short-term memory. Neural computation.
  • Ferenc Huszar. 2015. How (not) to train your generative model: Scheduled sampling, likelihood, adversary? In CoRR.
  • Guillaume Klein, Yoon Kim, Yuntian Deng, Jean Senellart, and Alexander M. Rush. 2017. OpenNMT: Open-source toolkit for neural machine translation. In ACL.
  • Matt Kusner, Yu Sun, Nicholas Kolkin, and Kilian Weinberger. 2015. From word embeddings to document distances. In ICML.
  • Alex M Lamb, Anirudh Goyal Alias Parth Goyal, Ying Zhang, Saizheng Zhang, Aaron C Courville, and Yoshua Bengio. 2016. Professor forcing: A new algorithm for training recurrent networks. In NIPS.
  • Chunyuan Li, Xiang Gao, Yuan Li, Xiujun Li, Baolin Peng, Yizhe Zhang, and Jianfeng Gao. 2020a. Optimus: Organizing sentences via pre-trained modeling of a latent space. arXiv preprint arXiv:2004.04092.
  • Dianqi Li, Yizhe Zhang, Hao Peng, Liqun Chen, Chris Brockett, Ming-Ting Sun, and Bill Dolan. 2020b. Contextualized perturbation for textual adversarial attack. arXiv preprint arXiv:2009.07502.
  • Jiwei Li, Will Monroe, Tianlin Shi, Sebastien Jean, Alan Ritter, and Dan Jurafsky. 2017. Adversarial learning for neural dialogue generation. In EMNLP.
  • Kevin J Liang, Chunyuan Li, Guoyin Wang, and Lawrence Carin. 2018. Generative adversarial network training is a continual learning problem. arXiv preprint arXiv:1811.11083.
  • Chin-Yew Lin. 2004. Rouge: A package for automatic evaluation of summaries. Text Summarization Branches Out.
  • Kevin Lin, Dianqi Li, Xiaodong He, Zhengyou Zhang, and Ming-Ting Sun. 2017. Adversarial ranking for language generation. In NIPS.
  • Hao Liu, Yihao Feng, Yi Mao, Dengyong Zhou, Jian Peng, and Qiang Liu. 2018. Action-depedent control variates for policy optimization via stein’s identity. In ICLR.
  • Giulia Luise, Alessandro Rudi, Massimiliano Pontil, and Carlo Ciliberto. 2018. Differential properties of sinkhorn approximation for learning with wasserstein distance. arXiv:1805.11897.
  • Minh-Thang Luong, Eugene Brevdo, and Rui Zhao. 2017. Neural machine translation (seq2seq) tutorial. https://github.com/tensorflow/nmt.
  • Minh-Thang Luong and Christopher D Manning. 2015. Stanford neural machine translation systems for spoken language domains. In IWSLT.
  • Minh-Thang Luong, Hieu Pham, and Christopher D Manning. 2015a. Effective approaches to attentionbased neural machine translation. In EMNLP.
  • Minh-Thang Luong, Ilya Sutskever, Quoc V Le, Oriol Vinyals, and Wojciech Zaremba. 2015b. Addressing the rare word problem in neural machine translation. In ACL.
  • Tomas Mikolov, Martin Karafiat, Lukas Burget, Jan Cernocky, and Sanjeev Khudanpur. 2010. Recurrent neural network based language model. In ISCA.
  • Tri Nguyen, Mir Rosenberg, Xia Song, Jianfeng Gao, Saurabh Tiwary, Rangan Majumder, and Li Deng. 2016. Ms marco: A human generated machine reading comprehension dataset. In NIPS.
  • Mohammad Norouzi, Samy Bengio, Navdeep Jaitly, Mike Schuster, Yonghui Wu, Dale Schuurmans, et al. 2016. Reward augmented maximum likelihood for neural structured prediction. In Advances In Neural Information Processing Systems, pages 1723–1731.
  • Kishore Papineni, Salim Roukos, Todd Ward, and WeiJing Zhu. 2002. Bleu: a method for automatic evaluation of machine translation. In ACL.
  • Matthew E. Peters, Mark Neumann, Mohit Iyyer, Matt Gardner, Christopher Clark, Kenton Lee, and Luke Zettlemoyer. 2018. Deep contextualized word representations. In NAACL.
  • Alec Radford, Karthik Narasimhan, Tim Salimans, and Ilya Sutskever. 2018. Improving language understanding by generative pre-training. URL https://s3us-west-2.amazonaws.com/openai-assets/researchcovers/languageunsupervised/language understanding paper.pdf.
  • Alec Radford, Jeffrey Wu, Rewon Child, David Luan, Dario Amodei, and Ilya Sutskever. 2019. Language models are unsupervised multitask learners. OpenAI Blog.
  • Prajit Ramachandran, Peter J Liu, and Quoc V Le. 2017. Unsupervised pretraining for sequence to sequence learning. In EMNLP.
  • Marc’Aurelio Ranzato, Sumit Chopra, Michael Auli, and Wojciech Zaremba. 2016a. Sequence level training with recurrent neural networks. CoRR.
  • Marc’Aurelio Ranzato, Sumit Chopra, Michael Auli, and Wojciech Zaremba. 2016b. Sequence level training with recurrent neural networks. In ICLR.
  • Steven J Rennie, Etienne Marcheret, Youssef Mroueh, Jerret Ross, and Vaibhava Goel. 2017. Self-critical sequence training for image captioning. In CVPR.
  • Alexander M Rush, Sumit Chopra, and Jason Weston. 2015. A neural attention model for abstractive sentence summarization. arXiv preprint arXiv:1509.00685.
  • Hiroaki Sakoe, Seibi Chiba, A Waibel, and KF Lee. 1990. Dynamic programming algorithm optimization for spoken word recognition. Readings in speech recognition.
  • Ruslan Salakhutdinov. 2015. Learning deep generative models. Annual Review of Statistics and Its Application, 2:361–385.
  • Abigail See, Peter J Liu, and Christopher D Manning. 2017. Get to the point: Summarization with pointergenerator networks. In ACL.
  • Rico Sennrich, Barry Haddow, and Alexandra Birch. 2015. Neural machine translation of rare words with subword units. In ACL.
  • Chenze Shao, Yang Feng, and Xilin Chen. 2018. Greedy search with probabilistic n-gram matching for neural machine translation. arXiv preprint arXiv:1809.03132.
  • Bing Su, Xiaoqing Ding, Changsong Liu, and Ying Wu. 2015. Heteroscedastic max-min distance analysis. In CVPR.
  • Bing Su and Gang Hua. 2018. Order-preserving optimal transport for distances between sequences. IEEE transactions on pattern analysis and machine intelligence.
  • Ilya Sutskever, Oriol Vinyals, and Quoc V Le. 2014. Sequence to sequence learning with neural networks. In NIPS.
  • Ashish Vaswani, Noam Shazeer, Niki Parmar, Jakob Uszkoreit, Llion Jones, Aidan N Gomez, Łukasz Kaiser, and Illia Polosukhin. 2017. Attention is all you need. In NIPS.
  • Oriol Vinyals and Quoc Le. 2015. A neural conversational model. In ICML workshop.
  • Xin Wang, Wenhu Chen, Yuan-Fang Wang, and William Yang Wang. 2018. No metrics are perfect: Adversarial reward learning for visual storytelling. In ACL.
  • Ronald J Williams and David Zipser. 1989. A learning algorithm for continually running fully recurrent neural networks. Neural computation, 1(2):270– 280.
  • Yonghui Wu, Mike Schuster, Zhifeng Chen, Quoc V Le, Mohammad Norouzi, Wolfgang Macherey, Maxim Krikun, Yuan Cao, Qin Gao, Klaus Macherey, et al. 2016. Google’s neural machine translation system: Bridging the gap between human and machine translation. arXiv preprint arXiv:1609.08144.
  • Yujia Xie, Xiangfeng Wang, Ruijia Wang, and Hongyuan Zha. 2018. A fast proximal point method for Wasserstein distance. In arXiv:1802.04307.
  • Qian Yang, Dinghan Shen, Yong Cheng, Wenlin Wang, Guoyin Wang, Lawrence Carin, et al. 2019. An endto-end generative architecture for paraphrase generation. In Proceedings of the 2019 Conference on Empirical Methods in Natural Language Processing and the 9th International Joint Conference on Natural Language Processing (EMNLP-IJCNLP), pages 3123–3133.
  • Quanzeng You, Hailin Jin, Zhaowen Wang, Chen Fang, and Jiebo Luo. 2016. Image captioning with semantic attention. In CVPR.
  • Lantao Yu, Weinan Zhang, Jun Wang, and Yong Yu. 2017. Seqgan: Sequence generative adversarial nets with policy gradient. In AAAI.
  • Wojciech Zaremba, Ilya Sutskever, and Oriol Vinyals. 2014. Recurrent neural network regularization. arXiv preprint arXiv:1409.2329.
  • Ruiyi Zhang, Changyou Chen, Zhe Gan, Wenlin Wang, Liqun Chen, Dinghan Shen, Guoyin Wang, and Lawrence Carin. 2018. Sequence generation with guider network. arXiv preprint arXiv:1811.00696.
  • Wen Zhang, Yang Feng, Fandong Meng, Di You, and Qun Liu. 2019. Bridging the gap between training and inference for neural machine translation. arXiv preprint arXiv:1906.02448.
  • Yizhe Zhang, Zhe Gan, Kai Fan, Zhi Chen, Ricardo Henao, Dinghan Shen, and Lawrence Carin. 2017. Adversarial feature matching for text generation. In ICML.
  • Yaoming Zhu, Sidi Lu, Lei Zheng, Jiaxian Guo, Weinan Zhang, Jun Wang, and Yong Yu. 2018. Texygen: a benchmarking platform for text generation models. In SIGIR.
Experimental setup
  • Dataset: Two standard datasets are tested for NMT tasks. The first is a small-scale English-Vietnamese corpus from the IWSLT 2015 Evaluation Campaign (Cettolo et al., 2015), a parallel corpus of TED talks containing 133K sentence pairs. We follow the pre-processing procedure in (Luong and Manning, 2015) by replacing words with frequencies less than 5 with unk (a minimal sketch of this thresholding follows below); as a result, our vocabulary reduces to 17K for English and 7.7K for Vietnamese. We use TED tst2012 as the development set and TED tst2013 as the test set. For a large-scale dataset, we select an English-German corpus from the WMT16 Evaluation Campaign (http://statmt.org/wmt16), which contains 4.5M sentence pairs. Newstest 2013 is used as the development set and Newstest 2015 as the test set. We conduct sub-word tokenization on the corpus using the Byte Pair Encoding (BPE) method (Sennrich et al., 2015). Following Klein et al. (2017), we set the vocabulary size of both English and German to 32K.
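The rare-word replacement described above can be sketched in a few lines; the function name and the `<unk>` symbol are illustrative assumptions, not the paper's code.

```python
from collections import Counter

def replace_rare_words(sentences, min_freq=5, unk="<unk>"):
    """Replace tokens occurring fewer than `min_freq` times with an unk symbol,
    mirroring the IWSLT pre-processing described above."""
    counts = Counter(tok for sent in sentences for tok in sent.split())
    return [" ".join(tok if counts[tok] >= min_freq else unk for tok in sent.split())
            for sent in sentences]
```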
  • Setup: We use Google's Neural Machine Translation (GNMT) system (Wu et al., 2016) as our baseline MLE model, which follows the standard architecture and hyper-parameters (https://github.com/tensorflow/nmt) for a fair comparison. All other models are built on top of it with the same network structure. We evaluate model performance using BLEU scores (Papineni et al., 2002). We set the OT weighting parameter λ = 0.1 and the order-preserving penalty weighting parameter β = 0.1.
  • For English-Vietnamese translation tasks (i.e., EN-VI or VI-EN), we follow the setup in (Sutskever et al., 2014; Luong et al., 2015b,a). We use a one-layer bidirectional LSTM with 512 hidden units as the encoder and a two-layer LSTM with 512 hidden units per layer as the decoder. The embedding dimension is set to 512. We follow the attention method described in (Luong et al., 2015a) and use dropout with probability 0.2 as suggested by (Zaremba et al., 2014). All parameters are initialized uniformly in [−0.1, 0.1]. We train the model for 12 epochs using Stochastic Gradient Descent (SGD). For the first 8 epochs, we set the learning rate to 1.0; after that, we halve the learning rate at every epoch (see the schedule sketch below).
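A minimal sketch of the stated EN-VI learning-rate schedule (12 epochs of SGD, a constant rate of 1.0 for the first 8 epochs, then halving at every subsequent epoch; zero-based epoch indexing is our assumption):

```python
def en_vi_learning_rate(epoch: int, base_lr: float = 1.0, constant_epochs: int = 8) -> float:
    """Constant for the first `constant_epochs` epochs, then halved each epoch after."""
    if epoch < constant_epochs:
        return base_lr
    return base_lr * 0.5 ** (epoch - constant_epochs + 1)

# Epochs 0..11 -> 1.0 (x8), then 0.5, 0.25, 0.125, 0.0625
```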
  • For English-German translation tasks (i.e., EN-DE or DE-EN), we adopt a stacked 2-layer bidirectional LSTM with 1024 units as the encoder and a 4-layer LSTM with 1024 units as the decoder. The embedding dimension is set to 1024. We adopt the attention mechanism used in (Wu et al., 2016). We train the model for 10 epochs. For the first 5 epochs, we set the learning rate to 1; after that, we halve the learning rate every half epoch.
  • We use the widely adopted English Gigawords corpus (Graff et al., 2003) for the text summarization task. We follow the pre-processing in (Rush et al., 2015). The dataset is sampled and split into train/dev/test sets of size 200K/8K/2K.