Adversarial Ranking for Language Generation

Advances in Neural Information Processing Systems 30 (NIPS 2017), 2017.


Abstract:

Generative adversarial networks (GANs) have achieved great success in synthesizing data. However, existing GANs restrict the discriminator to be a binary classifier, which limits their learning capacity for tasks that need to synthesize output with rich structure, such as natural language descriptions. In this paper, we propose a novel generative adversarial network, RankGAN, for generating high-quality language descriptions.

Introduction
  • Language generation plays an important role in natural language processing, and is essential to many applications such as machine translation [1], image captioning [6], and dialogue systems [26].
  • Recent studies [10, 11, 29, 33] show that recurrent neural networks (RNNs) and long short-term memory (LSTM) networks can achieve impressive performance on the task of language generation.
  • GANs achieve great performance in computer vision tasks such as image synthesis [5, 14, 17, 24, 27].
  • Their successes are mainly attributed to training the discriminator to estimate the statistical properties of continuous real-valued data.
Highlights
  • Language generation plays an important role in natural language processing, and is essential to many applications such as machine translation [1], image captioning [6], and dialogue systems [26]
  • It is noteworthy that while the policy gradient with BLEU (PG-BLEU) captures token-level similarities between sentences based on n-gram matching, RankGAN explores ranking relationships in the embedded feature space of sentences
  • Since the proposed RankGAN focuses on unconditional generative adversarial networks that do not take any prior knowledge as input, we train our model on the captions of the training set without conditioning on specific images
  • We presented a new generative adversarial network, RankGAN, for generating high-quality natural language descriptions
  • By relaxing the binary-classification restriction and conceiving a relative ranking space with rich information for the discriminator in the adversarial learning framework, the proposed learning objective is favourable for synthesizing natural language sentences of high quality
Methods
  • 3.1 Overall architecture

    In conventional GANs [8], a discriminator built on multilayer perceptrons outputs a binary probability indicating whether an unknown sequence comes from the real data or from the data synthesized by the generator.
  • It is noteworthy that while PG-BLEU captures token-level similarities between sentences based on n-gram matching, RankGAN explores ranking relationships in the embedded feature space of sentences (see the score sketch after this list)
  • These two methods are fundamentally different.
  • While MLE, PG-BLEU and SeqGAN tend to converge after 200 training epochs, the proposed RankGAN consistently improves the language generator and achieves a lower NLL score.
  • Learning with a large reference set and comparison set could increase the computational cost
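
To make the ranking mechanism concrete, here is a minimal sketch of the ranker's scoring as we understand the paper's formulation: sentences are embedded into feature vectors, the relevance of an input sentence to a reference sentence is measured with cosine similarity, and a softmax over a comparison set turns relevance into a ranking probability that is averaged over the reference set. The function names, the temperature `gamma`, and the plain NumPy stand-ins for sentence embeddings are illustrative assumptions, not the authors' released code.

```python
import numpy as np

def cosine(a, b):
    # Relevance of sentence embedding `a` against reference embedding `b`.
    return float(np.dot(a, b) / (np.linalg.norm(a) * np.linalg.norm(b)))

def rank_score(s, reference_set, comparison_set, gamma=1.0):
    """Expected ranking probability of sentence embedding `s`.

    For each reference u in U, a softmax over the comparison set
    C' = C ∪ {s} converts relevance scores into a ranking probability;
    the final score averages over the reference set.  `gamma` is an
    assumed softmax-sharpness hyperparameter.
    """
    scores = []
    for u in reference_set:
        candidates = comparison_set + [s]
        alphas = np.array([cosine(c, u) for c in candidates])
        probs = np.exp(gamma * alphas) / np.exp(gamma * alphas).sum()
        scores.append(probs[-1])  # probability mass assigned to s
    return float(np.mean(scores))

# Toy usage with random stand-ins for sentence embeddings.
rng = np.random.default_rng(0)
s = rng.normal(size=64)
U = [rng.normal(size=64) for _ in range(4)]   # reference sentences
C = [rng.normal(size=64) for _ in range(8)]   # comparison sentences
print(rank_score(s, U, C))
```

A human-written sentence should receive a higher expected ranking score than machine-written ones in its comparison set, and that relative score is the signal the ranker feeds back to the generator.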
Results
  • Following the evaluation protocol of [35], the authors first carry out experiments on the synthetic data and oracle simulator proposed therein.
  • The proposed RankGAN performs favourably against the state-of-the-art methods in terms of BLEU-2 score (a minimal BLEU-2 sketch follows this list)
  • This indicates that the proposed objective is able to learn an effective language generator from real-world data.
  • The authors investigate the possibility of learning Shakespeare's lexical dependencies and making use of rare phrases
  • In this experiment, the authors train the model on the Romeo and Juliet play [28] to further validate the proposed method.
  • The results indicate that the proposed RankGAN is able to capture transition patterns among words, even when the training sentences are novel, delicate, and complicated
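
BLEU [22] measures n-gram precision between generated and reference sentences; BLEU-2 uses unigrams and bigrams. The page does not include the paper's evaluation script, so the following is a minimal sketch using NLTK's implementation; the smoothing choice is our assumption to avoid zero scores on short sentences with no bigram overlap.

```python
from nltk.translate.bleu_score import sentence_bleu, SmoothingFunction

def bleu2(references, hypothesis):
    """BLEU-2: geometric mean of unigram and bigram precision.

    `references` is a list of tokenized reference sentences and
    `hypothesis` a tokenized generated sentence.  Smoothing is an
    assumed choice, not necessarily the paper's exact setting.
    """
    smooth = SmoothingFunction().method1
    return sentence_bleu(references, hypothesis,
                         weights=(0.5, 0.5),
                         smoothing_function=smooth)

refs = [["a", "man", "is", "riding", "a", "horse"]]
hyp = ["a", "man", "rides", "a", "horse"]
print(bleu2(refs, hyp))
```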
Conclusion
  • Note that the proposed RankGAN has a Nash equilibrium when the generator Gθ matches the distribution of human-written sentences Ph, and the ranker Rφ can no longer correctly rank the synthetic sentences against the human-written ones (the objective is written out after this list).
  • In the experiment section, the authors observe that training converges on four different datasets and leads to better performance than previous state-of-the-art methods. The authors presented a new generative adversarial network, RankGAN, for generating high-quality natural language descriptions.
  • Instead of training the discriminator to assign an absolute binary predicate to real or synthesized data samples, the authors propose using a ranker to rank human-written sentences higher than machine-written ones.
  • The authors plan to explore and extend the model in many other tasks, such as image synthesis and conditional GAN for image captioning
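
For reference, the adversarial ranking objective can be written as a minimax problem. The notation below (reference set U of human-written sentences, comparison set C⁻ of machine-written and C⁺ of human-written sentences) follows our reading of the paper and should be checked against the original formulation:

```latex
\min_{\theta}\max_{\phi}\;
\mathcal{L}(G_\theta, R_\phi)
= \mathbb{E}_{s \sim P_h}\big[\log R_\phi(s \mid U, \mathcal{C}^-)\big]
+ \mathbb{E}_{s \sim G_\theta}\big[\log\big(1 - R_\phi(s \mid U, \mathcal{C}^+)\big)\big]
```

At the equilibrium described above, the ranking probabilities degenerate and the ranker can no longer separate the two sets, which is the stated Nash equilibrium condition.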
Tables
  • Table1: The performance comparison of different methods on the synthetic data [35] in terms of the negative log-likelihood (NLL) scores
  • Table2: The performance comparison of different methods on the Chinese poem generation in terms of the BLEU scores and human evaluation scores
  • Table3: The performance comparison of different methods on the COCO captions in terms of the BLEU scores and human evaluation scores
  • Table4: Example of the generated descriptions with different methods. Note that the language models are trained on COCO caption dataset without the images
  • Table5: The performance comparison of different methods on Shakespeare’s play Romeo and Juliet in terms of the BLEU scores
Related work
  • GANs: Recently, GANs [8] have been widely explored owing to their unsupervised learning nature. Although GANs have achieved great success in computer vision applications [5, 14, 17, 24, 27], progress in natural language processing has been limited because discrete sequences are not differentiable. SeqGAN [35] addresses the non-differentiability problem with the policy gradient method inspired by reinforcement learning [31]: it treats each word selection in the sentence as an action, and computes the reward of the sequence with Monte Carlo (MC) search. The method back-propagates the reward from the discriminator, and encourages the generator to create human-like language sentences. Li et al. [18] apply GANs with the policy gradient method to dialogue generation; they train a Seq2Seq model as the generator, and build the discriminator using a hierarchical encoder followed by a 2-way softmax function. Dai et al. [4] show that it is possible to enhance the diversity of generated image captions with conditional GANs. Yang et al. [34] further show that training a convolutional neural network (CNN) as the discriminator yields better performance than a recurrent neural network (RNN) for the task of machine translation (MT). Among the works mentioned above, SeqGAN [35] is the most relevant to our proposed method. The major difference between SeqGAN [35] and our proposed model is that we replace the regression-based discriminator with a novel ranker, and we formulate a new learning objective function in the adversarial learning framework. Consequently, the rewards for training our model are not limited to binary regression, but encode relative ranking information.
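
To illustrate how a ranking reward can replace the binary discriminator reward in policy-gradient training, here is a schematic REINFORCE step. The `generator.sample`/`generator.log_prob` interface and the `ranker_reward` callable are assumed placeholders for illustration; the actual method additionally estimates rewards for partial sequences via Monte Carlo search, which is omitted here for brevity.

```python
import torch

def policy_gradient_step(generator, optimizer, ranker_reward, batch_size=16):
    """One REINFORCE update: sample sentences, score each completed
    sentence with the ranker's reward, and ascend the reward-weighted
    log-likelihood.  A mean-reward baseline reduces variance."""
    sentences = generator.sample(batch_size)             # list of token sequences
    rewards = torch.tensor([ranker_reward(s) for s in sentences])
    log_probs = torch.stack([generator.log_prob(s) for s in sentences])
    advantage = rewards - rewards.mean()                 # simple baseline
    loss = -(advantage * log_probs).mean()               # maximize expected reward
    optimizer.zero_grad()
    loss.backward()
    optimizer.step()
    return loss.item()
```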
Funding
  • Experimental results clearly demonstrate that our proposed method outperforms the state-of-the-art methods
Study subjects and analysis
participants: 57
We further conduct a human study to evaluate the quality of the generated poems from a human perspective. Specifically, we invite 57 participants who are native Mandarin Chinese speakers to score the poems. During the evaluation, we randomly sample and show 15 poems written by different methods.

participants: 28
We also conduct a human study to evaluate the quality of the generated sentences. We invite 28 participants who are native or proficient English speakers to grade the sentences. Similar to the setting in the previous section, we randomly sample and show 15 sentences written by different methods, and ask the subjects to grade from 1 to 10 points.

Example generated descriptions (cf. Table 4): "Two men happily working on a plastic computer." "The toilet in the bathroom is filled with a bunch of ice." "A bottle of wine near stacks of dishes and food." "Child jumped next to each other." "Three people standing in front of some kind of boats." "A bedroom has silver photograph desk."

References
  • Dzmitry Bahdanau, Kyunghyun Cho, and Yoshua Bengio. Neural machine translation by jointly learning to align and translate. arXiv preprint arXiv:1409.0473, 2014.
  • Satanjeev Banerjee and Alon Lavie. METEOR: An automatic metric for MT evaluation with improved correlation with human judgments. In Proc. ACL Workshops, volume 29, pages 65–72, 2005.
  • Samuel R. Bowman, Luke Vilnis, Oriol Vinyals, Andrew M. Dai, Rafal Jozefowicz, and Samy Bengio. Generating sentences from a continuous space. In Proc. CoNLL, 2016.
  • Bo Dai, Dahua Lin, Raquel Urtasun, and Sanja Fidler. Towards diverse and natural image descriptions via a conditional GAN. arXiv preprint arXiv:1703.06029, 2017.
  • Emily L. Denton, Soumith Chintala, Rob Fergus, et al. Deep generative image models using a Laplacian pyramid of adversarial networks. In Proc. NIPS, pages 1486–1494, 2015.
  • Hao Fang, Saurabh Gupta, Forrest Iandola, Rupesh K. Srivastava, Li Deng, Piotr Dollár, Jianfeng Gao, Xiaodong He, Margaret Mitchell, John C. Platt, et al. From captions to visual concepts and back. In Proc. CVPR, pages 1473–1482, 2015.
  • Jonas Gehring, Michael Auli, David Grangier, Denis Yarats, and Yann N. Dauphin. Convolutional sequence to sequence learning. arXiv preprint arXiv:1705.03122, 2017.
  • Ian Goodfellow, Jean Pouget-Abadie, Mehdi Mirza, Bing Xu, David Warde-Farley, Sherjil Ozair, Aaron Courville, and Yoshua Bengio. Generative adversarial nets. In Proc. NIPS, pages 2672–2680, 2014.
  • Ian J. Goodfellow. On distinguishability criteria for estimating generative models. arXiv preprint arXiv:1412.6515, 2014.
  • Alex Graves. Generating sequences with recurrent neural networks. arXiv preprint arXiv:1308.0850, 2013.
  • Sepp Hochreiter and Jürgen Schmidhuber. Long short-term memory. Neural Computation, 9(8):1735–1780, 1997.
  • Po-Sen Huang, Xiaodong He, Jianfeng Gao, Li Deng, Alex Acero, and Larry Heck. Learning deep structured semantic models for web search using clickthrough data. In Proc. CIKM, pages 2333–2338, 2013.
  • Ferenc Huszár. How (not) to train your generative model: Scheduled sampling, likelihood, adversary? arXiv preprint arXiv:1511.05101, 2015.
  • Phillip Isola, Jun-Yan Zhu, Tinghui Zhou, and Alexei A. Efros. Image-to-image translation with conditional adversarial networks. In Proc. CVPR, 2017.
  • Thorsten Joachims. Optimizing search engines using clickthrough data. In Proc. SIGKDD, pages 133–142, 2002.
  • Matt J. Kusner and José Miguel Hernández-Lobato. GANs for sequences of discrete elements with the Gumbel-softmax distribution. arXiv preprint arXiv:1611.04051, 2016.
  • Christian Ledig, Lucas Theis, Ferenc Huszár, Jose Caballero, Andrew Cunningham, Alejandro Acosta, Andrew Aitken, Alykhan Tejani, Johannes Totz, Zehan Wang, et al. Photo-realistic single image super-resolution using a generative adversarial network. arXiv preprint arXiv:1609.04802, 2016.
  • Jiwei Li, Will Monroe, Tianlin Shi, Alan Ritter, and Dan Jurafsky. Adversarial learning for neural dialogue generation. arXiv preprint arXiv:1701.06547, 2017.
  • Tsung-Yi Lin, Michael Maire, Serge Belongie, James Hays, Pietro Perona, Deva Ramanan, Piotr Dollár, and C. Lawrence Zitnick. Microsoft COCO: Common objects in context. In Proc. ECCV, pages 740–755, 2014.
  • Siqi Liu, Zhenhai Zhu, Ning Ye, Sergio Guadarrama, and Kevin Murphy. Improved image captioning via policy gradient optimization of SPIDEr. In Proc. ICCV, 2017.
  • Tie-Yan Liu et al. Learning to rank for information retrieval. Foundations and Trends in Information Retrieval, 3(3):225–331, 2009.
  • Kishore Papineni, Salim Roukos, Todd Ward, and Wei-Jing Zhu. BLEU: A method for automatic evaluation of machine translation. In Proc. ACL, pages 311–318, 2002.
  • Devi Parikh and Kristen Grauman. Relative attributes. In Proc. ICCV, pages 503–510, 2011.
  • Alec Radford, Luke Metz, and Soumith Chintala. Unsupervised representation learning with deep convolutional generative adversarial networks. arXiv preprint arXiv:1511.06434, 2015.
  • Scott Reed, Zeynep Akata, Xinchen Yan, Lajanugen Logeswaran, Bernt Schiele, and Honglak Lee. Generative adversarial text to image synthesis. In Proc. ICML, 2016.
  • Kevin Reschke, Adam Vogel, and Dan Jurafsky. Generating recommendation dialogs by extracting information from user reviews. In Proc. ACL, 2013.
  • Tim Salimans, Ian Goodfellow, Wojciech Zaremba, Vicki Cheung, Alec Radford, and Xi Chen. Improved techniques for training GANs. arXiv preprint arXiv:1606.03498, 2016.
  • William Shakespeare. The Complete Works of William Shakespeare. Race Point Publishing, 2014.
  • Ilya Sutskever, Oriol Vinyals, and Quoc V. Le. Sequence to sequence learning with neural networks. In Proc. NIPS, pages 3104–3112, 2014.
  • Richard S. Sutton and Andrew G. Barto. Reinforcement Learning: An Introduction, volume 1. MIT Press, 1998.
  • Richard S. Sutton, David A. McAllester, Satinder P. Singh, Yishay Mansour, et al. Policy gradient methods for reinforcement learning with function approximation. In Proc. NIPS, pages 1057–1063, 1999.
  • Ramakrishna Vedantam, C. Lawrence Zitnick, and Devi Parikh. CIDEr: Consensus-based image description evaluation. In Proc. CVPR, pages 4566–4575, 2015.
  • Yonghui Wu, Mike Schuster, Zhifeng Chen, Quoc V. Le, Mohammad Norouzi, Wolfgang Macherey, Maxim Krikun, Yuan Cao, Qin Gao, Klaus Macherey, et al. Google's neural machine translation system: Bridging the gap between human and machine translation. arXiv preprint arXiv:1609.08144, 2016.
  • Zhen Yang, Wei Chen, Feng Wang, and Bo Xu. Improving neural machine translation with conditional sequence generative adversarial nets. arXiv preprint arXiv:1703.04887, 2017.
  • Lantao Yu, Weinan Zhang, Jun Wang, and Yong Yu. SeqGAN: Sequence generative adversarial nets with policy gradient. In Proc. AAAI, 2017.
  • Xiang Zhang and Yann LeCun. Text understanding from scratch. arXiv preprint arXiv:1502.01710, 2015.
  • Xingxing Zhang and Mirella Lapata. Chinese poetry generation with recurrent neural networks. In Proc. EMNLP, 2014.