Title: What Makes Good In-Context Examples for GPT-3?
Authors: Jiachang Liu, Dinghan Shen, Yizhe Zhang, Bill Dolan, Lawrence Carin, Weizhu Chen
Venue: arXiv preprint arXiv:2101.06804, 2021
URL: https://arxiv.org/abs/2101.06804
Abstract (beginning truncated in the source): ... GPT-3's few-shot capabilities. Inspired by the recent success of leveraging a retrieval module to augment large-scale neural network models, we propose to retrieve examples that are semantically similar to a test sample to formulate its corresponding prompt. Intuitively, the in-context examples selected with such a strategy may serve as more informative inputs to unleash GPT-3's extensive knowledge. We evaluate the proposed approach on several natural language understanding and generation benchmarks, where the retrieval-based prompt selection approach consistently outperforms the random baseline. Moreover, sentence encoders fine-tuned on task-related datasets yield even more helpful retrieval results. Notably, significant gains are observed on tasks such as table-to-text generation (41.9% on the ToTTo dataset) and open-domain question answering (45.5% on the NQ dataset). We hope our investigation helps explain the behaviors of GPT-3 and of large-scale pre-trained LMs in general, and enhances their few-shot capabilities.

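The retrieval-based selection described in this abstract can be pictured in a few lines: embed the candidate pool and the test input with a sentence encoder and take the nearest neighbours as demonstrations. The snippet below is an illustrative sketch, not the paper's exact pipeline; it assumes the sentence-transformers package is available, and the encoder name and prompt format are placeholder choices.

```python
# Minimal sketch: pick in-context examples for a test query by semantic
# similarity instead of random sampling (illustrative, not the paper's exact setup).
import numpy as np
from sentence_transformers import SentenceTransformer  # assumed available

def build_prompt(train_pairs, test_input, k=4,
                 encoder_name="all-MiniLM-L6-v2"):  # encoder choice is illustrative
    encoder = SentenceTransformer(encoder_name)
    train_inputs = [x for x, _ in train_pairs]
    # Encode the candidate pool and the test sample into the same space.
    pool = encoder.encode(train_inputs, normalize_embeddings=True)
    query = encoder.encode([test_input], normalize_embeddings=True)[0]
    # Cosine similarity reduces to a dot product on normalized vectors.
    scores = pool @ query
    top = np.argsort(-scores)[:k]
    # Concatenate the k nearest neighbours as demonstrations, then the test input.
    demos = "".join(f"{train_pairs[i][0]} => {train_pairs[i][1]}\n" for i in top)
    return demos + f"{test_input} =>"

if __name__ == "__main__":
    pairs = [("the movie was great", "positive"),
             ("the plot was dull", "negative"),
             ("service was friendly", "positive"),
             ("food arrived cold", "negative")]
    print(build_prompt(pairs, "the acting felt lifeless", k=2))
```
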
Title: A Controllable Model of Grounded Response Generation
Authors: Zeqiu Wu, Michel Galley, Chris Brockett, Yizhe Zhang, Xiang Gao, Chris Quirk, Rik Koncel-Kedziorski, Jianfeng Gao, Hannaneh Hajishirzi, Mari Ostendorf, Bill Dolan
Venue: AAAI 2021, pages 14085-14093
URL: https://arxiv.org/abs/2005.00613
Abstract: Current end-to-end neural conversation models inherently lack the flexibility to impose semantic control in the response generation process, often resulting in uninteresting responses. Attempts to boost informativeness alone come at the expense of factual accuracy, as attested by pretrained language models' propensity to "hallucinate" facts. While this may be mitigated by access to background knowledge, there is scant guarantee of relevance and informativeness in generated responses. We propose a framework that we call controllable grounded response generation (CGRG), in which lexical control phrases are either provided by a user or automatically extracted by a control phrase predictor from dialogue context and grounding knowledge. Quantitative and qualitative results show that, using this framework, a transformer-based model with a novel inductive attention mechanism, trained on a conversation-like Reddit dataset, outperforms strong generation baselines.

Title: SDA: Improving Text Generation with Self Data Augmentation
Authors: Ping Yu, Ruiyi Zhang, Yang Zhao, Yizhe Zhang, Chunyuan Li, Changyou Chen
Venue: arXiv preprint arXiv:2101.03236, 2021
URL: https://arxiv.org/abs/2101.03236
Abstract: Data augmentation has been widely used to improve deep neural networks in many research fields, such as computer vision. However, less work has been done in the context of text, partially due to its discrete nature and the complexity of natural languages. In this paper, we propose to improve the standard maximum likelihood estimation (MLE) paradigm by incorporating a self-imitation-learning phase for automatic data augmentation. Unlike most existing sentence-level augmentation strategies, which are only applied to specific models, our method is more general and could be easily adapted to any MLE-based training procedure. In addition, our framework allows task-specific evaluation metrics to be designed to flexibly control the generated sentences, for example, in terms of controlling vocabulary usage and avoiding nontrivial repetitions. Extensive experimental results demonstrate the superiority of our method on two synthetic and several standard real datasets, significantly improving related baselines.

Title: Contextualized Perturbation for Textual Adversarial Attack
Authors: Dianqi Li, Yizhe Zhang, Hao Peng, Liqun Chen, Chris Brockett, Ming-Ting Sun, Bill Dolan
Venue: NAACL-HLT 2021, pages 5053-5069
URL: https://arxiv.org/abs/2009.07502
Abstract: Adversarial examples expose the vulnerabilities of natural language processing (NLP) models, and can be used to evaluate and improve their robustness. Existing techniques of generating such examples are typically driven by local heuristic rules that are agnostic to the context, often resulting in unnatural and ungrammatical outputs. This paper presents CLARE, a ContextuaLized AdversaRial Example generation model that produces fluent and grammatical outputs through a mask-then-infill procedure. CLARE builds on a pre-trained masked language model and modifies the inputs in a context-aware manner. We propose three contextualized perturbations, Replace, Insert and Merge, allowing for generating outputs of varied lengths. With a richer range of available strategies, CLARE is able to attack a victim model more efficiently with fewer edits. Extensive experiments and human evaluation demonstrate that CLARE outperforms the baselines in terms of attack success rate, textual similarity, fluency and grammaticality.

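The mask-then-infill procedure can be illustrated with an off-the-shelf masked language model: mask an existing position (Replace) or open a new slot (Insert) and let the model propose context-aware candidates. This is a minimal sketch of that idea, assuming the transformers fill-mask pipeline with roberta-base as a stand-in model; CLARE's candidate scoring against a victim model and the Merge operation are omitted.

```python
# Minimal sketch of the mask-then-infill idea behind contextualized perturbations:
# mask a position, let a pretrained masked LM propose in-context candidates.
# Illustrative only; CLARE additionally scores candidates against a victim model.
from transformers import pipeline

fill = pipeline("fill-mask", model="roberta-base")  # model choice is illustrative
MASK = fill.tokenizer.mask_token

def replace_op(words, i, top_k=5):
    """'Replace': mask the i-th word and infill it from context."""
    masked = " ".join(words[:i] + [MASK] + words[i + 1:])
    return [c["sequence"] for c in fill(masked, top_k=top_k)]

def insert_op(words, i, top_k=5):
    """'Insert': add a new masked slot after the i-th word and infill it."""
    masked = " ".join(words[:i + 1] + [MASK] + words[i + 1:])
    return [c["sequence"] for c in fill(masked, top_k=top_k)]

if __name__ == "__main__":
    sent = "the film was surprisingly good".split()
    print(replace_op(sent, 3))  # candidates for the word "surprisingly"
    print(insert_op(sent, 2))   # candidates inserted after "was"
```
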
Title: DialoGPT: Large-Scale Generative Pre-training for Conversational Response Generation
Authors: Yizhe Zhang, Siqi Sun, Michel Galley, Yen-Chun Chen, Chris Brockett, Xiang Gao, Jianfeng Gao, Jingjing Liu, Bill Dolan
Venue: ACL 2020 (System Demonstrations), pages 270-278
URL: https://arxiv.org/abs/1911.00536
Abstract: We present a large, tunable neural conversational response generation model, DialoGPT (dialogue generative pre-trained transformer). Trained on 147M conversation-like exchanges extracted from Reddit comment chains over a period spanning from 2005 through 2017, DialoGPT extends the Hugging Face PyTorch transformer to attain a performance close to human both in terms of automatic and human evaluation in single-turn dialogue settings. We show that conversational systems that leverage DialoGPT generate more relevant, contentful and context-consistent responses than strong baseline systems. The pre-trained model and training pipeline are publicly released to facilitate research into neural response generation and the development of more intelligent open-domain dialogue systems.

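Since the pre-trained model is publicly released, single-turn generation can be tried with the standard Hugging Face causal-LM interface. The sketch below follows that common usage pattern; the decoding settings are illustrative choices, not the paper's evaluation setup.

```python
# Minimal sketch of single-turn generation with the released DialoGPT checkpoint
# (usage pattern only; decoding settings here are illustrative).
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

tok = AutoTokenizer.from_pretrained("microsoft/DialoGPT-medium")
model = AutoModelForCausalLM.from_pretrained("microsoft/DialoGPT-medium")

def respond(utterance, max_new_tokens=40):
    # DialoGPT uses the EOS token as the turn separator.
    ids = tok.encode(utterance + tok.eos_token, return_tensors="pt")
    out = model.generate(ids, max_length=ids.shape[-1] + max_new_tokens,
                         pad_token_id=tok.eos_token_id, do_sample=True, top_p=0.9)
    # Strip the input turn, keep only the generated response.
    return tok.decode(out[0, ids.shape[-1]:], skip_special_tokens=True)

if __name__ == "__main__":
    print(respond("Does money buy happiness?"))
```
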
Title: Datasets and Benchmarks for Task-Oriented Log Dialogue Ranking Task
Authors: Xinnuo Xu, Yizhe Zhang, Lars Liden, Sungjin Lee
Venue: INTERSPEECH 2020, pages 3920-3924
URL: https://doi.org/10.21437/Interspeech.2020-1341

Title: POINTER: Constrained Progressive Text Generation via Insertion-based Generative Pre-training
Authors: Yizhe Zhang, Guoyin Wang, Chunyuan Li, Zhe Gan, Chris Brockett, Bill Dolan
Venue: EMNLP 2020, pages 8649-8670
URL: https://www.aclweb.org/anthology/2020.emnlp-main.698/
Abstract: Large-scale pre-trained language models, such as BERT and GPT-2, have achieved excellent performance in language representation learning and free-form text generation. However, these models cannot be directly employed to generate text under specified lexical constraints. To address this challenge, we present POINTER (PrOgressive INsertion-based TransformER), a simple yet novel insertion-based approach for hard-constrained text generation. The proposed method operates by progressively inserting new tokens between existing tokens in a parallel manner. This procedure is recursively applied until a sequence is completed. The resulting coarse-to-fine hierarchy makes the generation process intuitive and interpretable. We pre-train our model with the proposed progressive insertion-based objective on a 12GB Wikipedia dataset, and fine-tune it on downstream hard-constrained generation tasks. Non-autoregressive decoding yields a logarithmic time complexity during inference time. Experimental results on both News and Yelp datasets demonstrate that POINTER achieves state-of-the-art performance on constrained text generation. We released the pre-trained models and the source code to facilitate future research.

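The progressive insertion loop itself is easy to picture: at every stage the model proposes, in parallel, a token for each slot between adjacent tokens, and the sequence grows coarse-to-fine until no slot fires. The toy below sketches only this control flow; the hand-written stage model is a purely hypothetical stand-in for the trained insertion network.

```python
# Toy sketch of the progressive, insertion-based decoding loop: at every stage a
# model proposes tokens to insert between adjacent tokens (in parallel), and the
# loop recurses until no slot proposes anything. The stage model here is a
# hand-written stand-in, not the trained POINTER network.
from typing import List, Optional

def toy_stage_model(left: str, right: str) -> Optional[str]:
    """Stand-in for the learned model: propose a token for the slot (left, right)."""
    rules = {("parking", "beach"): "near", ("prefer", "parking"): "free",
             ("i", "prefer"): "would", ("near", "beach"): "the"}
    return rules.get((left, right))

def progressive_insertion(tokens: List[str], max_stages: int = 10) -> List[str]:
    for _ in range(max_stages):
        proposals = [toy_stage_model(a, b) for a, b in zip(tokens, tokens[1:])]
        if not any(proposals):           # no slot wants a new token: done
            break
        out = []
        for tok, ins in zip(tokens, proposals + [None]):
            out.append(tok)
            if ins is not None:          # all accepted slots are filled in parallel
                out.append(ins)
        tokens = out
    return tokens

if __name__ == "__main__":
    # Hard lexical constraints form the coarsest level of the hierarchy.
    print(" ".join(progressive_insertion(["i", "prefer", "parking", "beach"])))
    # -> "i would prefer free parking near the beach"
```
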
Title: Dialogue Response Ranking Training with Large-Scale Human Feedback Data
Authors: Xiang Gao, Yizhe Zhang, Michel Galley, Chris Brockett, Bill Dolan
Venue: EMNLP 2020, pages 386-395
URL: https://arxiv.org/abs/2009.06978
Abstract: Existing open-domain dialog models are generally trained to minimize the perplexity of target human responses. However, some human replies are more engaging than others, spawning more followup interactions. Current conversational models are increasingly capable of producing turns that are context-relevant, but in order to produce compelling agents, these models need to be able to predict and optimize for turns that are genuinely engaging. We leverage social media feedback data (number of replies and upvotes) to build a large-scale training dataset for feedback prediction. To alleviate possible distortion between the feedback and engagingness, we convert the ranking problem to a comparison of response pairs which involve few confounding factors. We trained DialogRPT, a set of GPT-2 based models, on 133M pairs of human feedback data, and the resulting ranker outperformed several baselines. In particular, our ranker outperforms the conventional dialog perplexity baseline by a large margin on predicting Reddit feedback. We finally combine the feedback prediction models and a human-like scoring model to rank machine-generated dialog responses. Crowd-sourced human evaluation shows that our ranking method correlates better with real human preferences than baseline models.

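The pairwise formulation in this abstract boils down to training a scorer so that the higher-feedback response of each pair outranks the other. A minimal sketch under that reading, with a toy bag-of-words scorer standing in for the GPT-2 based ranker:

```python
# Minimal sketch of the pairwise formulation: instead of regressing raw feedback,
# train a scorer so that the response with more feedback outranks its pair.
# The tiny bag-of-words scorer stands in for the GPT-2 based ranker.
import torch
import torch.nn as nn

class ToyScorer(nn.Module):
    def __init__(self, vocab_size=1000, dim=32):
        super().__init__()
        self.emb = nn.EmbeddingBag(vocab_size, dim)
        self.head = nn.Linear(dim, 1)

    def forward(self, token_ids):           # token_ids: (batch, seq_len)
        return self.head(self.emb(token_ids)).squeeze(-1)

def pairwise_loss(scorer, pos_ids, neg_ids):
    """Binary cross-entropy on the score gap: P(pos > neg) = sigmoid(s_pos - s_neg)."""
    gap = scorer(pos_ids) - scorer(neg_ids)
    return nn.functional.binary_cross_entropy_with_logits(gap, torch.ones_like(gap))

if __name__ == "__main__":
    scorer = ToyScorer()
    opt = torch.optim.Adam(scorer.parameters(), lr=1e-3)
    pos = torch.randint(0, 1000, (8, 12))    # responses that got more feedback
    neg = torch.randint(0, 1000, (8, 12))    # their lower-feedback counterparts
    for _ in range(3):
        loss = pairwise_loss(scorer, pos, neg)
        opt.zero_grad(); loss.backward(); opt.step()
    print(float(loss))
```
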
Title: Weakly Supervised Cross-Domain Alignment with Optimal Transport
Authors: Siyang Yuan, Ke Bai, Liqun Chen, Yizhe Zhang, Chenyang Tao, Chunyuan Li, Guoyin Wang, Ricardo Henao, Lawrence Carin
Venue: arXiv preprint arXiv:2008.06597, 2020
URL: https://arxiv.org/abs/2008.06597
Abstract: Cross-domain alignment between image objects and text sequences is key to many visual-language tasks, and it poses a fundamental challenge to both computer vision and natural language processing. This paper investigates a novel approach for the identification and optimization of fine-grained semantic similarities between image and text entities, under a weakly-supervised setup, improving performance over state-of-the-art solutions. Our method builds upon recent advances in optimal transport (OT) to resolve the cross-domain matching problem in a principled manner. Formulated as a drop-in regularizer, the proposed OT solution can be efficiently computed and used in combination with other existing approaches. We present empirical evidence to demonstrate the effectiveness of our approach, showing how it enables simpler model architectures to outperform or be comparable with more sophisticated designs on a range of vision-language tasks.

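As a drop-in regularizer, an OT term of this kind can be approximated with entropic regularization and a few Sinkhorn iterations over a cost matrix between the two sets of embeddings. The sketch below illustrates that computation only, with a cosine cost and uniform marginals; it is not the paper's exact solver or hyperparameters.

```python
# Minimal sketch of optimal transport as a drop-in alignment regularizer:
# compute an entropic-regularized OT cost between two sets of embeddings
# (e.g., image regions vs. word tokens) via Sinkhorn iterations. Illustrative only.
import torch

def sinkhorn_ot(x, y, eps=0.1, n_iter=50):
    """x: (n, d), y: (m, d); returns a differentiable transport cost."""
    cost = 1.0 - torch.nn.functional.normalize(x, dim=-1) @ \
                 torch.nn.functional.normalize(y, dim=-1).T      # cosine cost matrix
    a = torch.full((x.size(0),), 1.0 / x.size(0))                # uniform marginals
    b = torch.full((y.size(0),), 1.0 / y.size(0))
    K = torch.exp(-cost / eps)
    u = torch.ones_like(a)
    for _ in range(n_iter):                                      # Sinkhorn updates
        v = b / (K.T @ u)
        u = a / (K @ v)
    plan = torch.diag(u) @ K @ torch.diag(v)
    return (plan * cost).sum()

if __name__ == "__main__":
    regions = torch.randn(6, 64, requires_grad=True)   # e.g., image-region features
    words = torch.randn(9, 64, requires_grad=True)     # e.g., word embeddings
    loss = sinkhorn_ot(regions, words)                 # added to the task loss as a regularizer
    loss.backward()
    print(float(loss))
```
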
Title: Adaptive Correlated Monte Carlo for Contextual Categorical Sequence Generation
Authors: Xinjie Fan, Yizhe Zhang, Zhendong Wang, Mingyuan Zhou
Venue: ICLR 2020
URL: https://arxiv.org/abs/1912.13151
Abstract: Sequence generation models are commonly refined with reinforcement learning over user-defined metrics. However, high gradient variance hinders the practical use of this method. To stabilize this method, we adapt to contextual generation of categorical sequences a policy gradient estimator, which evaluates a set of correlated Monte Carlo (MC) rollouts for variance control. Due to the correlation, the number of unique rollouts is random and adaptive to model uncertainty; those rollouts naturally become baselines for each other, and hence are combined to effectively reduce gradient variance. We also demonstrate the use of correlated MC rollouts for binary-tree softmax models, which reduce the high generation cost in large vocabulary scenarios by decomposing each categorical action into a sequence of binary actions. We evaluate our methods on both neural program synthesis and image captioning. The proposed methods yield lower gradient variance and consistent improvement over related baselines.

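The variance-reduction idea, several rollouts per context acting as baselines for one another, can be sketched with a simple leave-one-out baseline. The correlated, adaptive sampling that the paper introduces is omitted here, so treat this as an illustration of the surrounding estimator family rather than the proposed method itself.

```python
# Minimal sketch of the "rollouts baseline each other" idea: draw several rollouts
# per context and use the leave-one-out mean reward of the others as each rollout's
# baseline. The adaptive, correlated sampling of the paper is omitted here.
import torch

def policy_gradient_step(logits, reward_fn, n_rollouts=4):
    """logits: (seq_len, vocab); returns a variance-reduced surrogate loss."""
    dist = torch.distributions.Categorical(logits=logits)
    samples = dist.sample((n_rollouts,))                  # (n_rollouts, seq_len)
    log_probs = dist.log_prob(samples).sum(-1)            # (n_rollouts,)
    rewards = torch.tensor([reward_fn(s) for s in samples])
    # Leave-one-out baseline: mean reward of the *other* rollouts.
    baseline = (rewards.sum() - rewards) / (n_rollouts - 1)
    advantage = rewards - baseline
    return -(advantage.detach() * log_probs).mean()

if __name__ == "__main__":
    logits = torch.zeros(5, 10, requires_grad=True)       # toy 5-step, 10-way policy
    target = torch.tensor([1, 2, 3, 4, 5])
    reward = lambda seq: float((seq == target).float().mean())  # toy reward
    loss = policy_gradient_step(logits, reward)
    loss.backward()
    print(logits.grad.abs().sum() > 0)
```
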
Title: Contextual Re-Ranking with Behavior Aware Transformers
Authors: Chen Qu, Chenyan Xiong, Yizhe Zhang, Corby Rosset, W. Bruce Croft, Paul Bennett
Venue: SIGIR 2020, pages 1589-1592
URL: https://doi.org/10.1145/3397271.3401276
Abstract: In this work, we focus on the contextual document ranking task, which deals with the challenge of user interaction modeling for conversational search. Given a history of user feedback behaviors, such as issuing a query, clicking a document, and skipping a document, we propose to introduce behavior awareness to a neural ranker, resulting in a Hierarchical Behavior Aware Transformers (HBA-Transformers) model. The hierarchy is composed of an intra-behavior attention layer and an inter-behavior attention layer to let the system effectively distinguish and model different user behaviors. Our extensive experiments on the AOL session dataset demonstrate that the hierarchical behavior aware architecture is more powerful than a simple combination of history behaviors. Besides, we analyze the conversational property of queries. We show that coherent sessions tend to be more conversational and thus are more demanding in terms of considering history user behaviors.

Title: Improving Text Generation with Student-Forcing Optimal Transport
Authors: Jianqiao Li, Chunyuan Li, Guoyin Wang, Hao Fu, Yuhchen Lin, Liqun Chen, Yizhe Zhang, Chenyang Tao, Ruiyi Zhang, Wenlin Wang, Dinghan Shen, Qian Yang, Lawrence Carin
Venue: EMNLP 2020, pages 9144-9156
URL: https://arxiv.org/abs/2010.05994
Abstract: Neural language models are often trained with maximum likelihood estimation (MLE), where the next word is generated conditioned on the ground-truth word tokens. During testing, however, the model is instead conditioned on previously generated tokens, resulting in what is termed exposure bias. To reduce this gap between training and testing, we propose using optimal transport (OT) to match the sequences generated in these two modes. We examine the necessity of adding a Student-Forcing scheme during training with an imitation learning interpretation. An extension is further proposed to improve the OT learning for long sequences, based on the structural and contextual information of the text sequences. The effectiveness of the proposed method is validated on machine translation, text summarization, and text generation tasks.

Title: INSET: Sentence Infilling with INter-SEntential Transformer
Authors: Yichen Huang, Yizhe Zhang, Oussama Elachqar, Yu Cheng
Venue: ACL 2020, pages 2502-2515
URL: https://www.aclweb.org/anthology/2020.acl-main.226/
Abstract: Missing sentence generation (or sentence infilling) fosters a wide range of applications in natural language generation, such as document auto-completion and meeting note expansion. This task asks the model to generate intermediate missing sentences that can syntactically and semantically bridge the surrounding context. Solving the sentence infilling task requires techniques in natural language processing ranging from understanding to discourse-level planning to generation. In this paper, we propose a framework to decouple the challenge and address these three aspects respectively, leveraging the power of existing large-scale pre-trained models such as BERT and GPT-2. We empirically demonstrate the effectiveness of our model in learning a sentence representation for generation and further generating a missing sentence that fits the context.

Title: Improving Disentangled Text Representation Learning with Information-Theoretic Guidance
Authors: Pengyu Cheng, Martin Renqiang Min, Dinghan Shen, Christopher Malon, Yizhe Zhang, Yitong Li, Lawrence Carin
Venue: ACL 2020, pages 7530-7541
URL: https://arxiv.org/abs/2006.00693
Abstract: Learning disentangled representations of natural language is essential for many NLP tasks, e.g., conditional text generation, style transfer, personalized dialogue systems, etc. Similar problems have been studied extensively for other forms of data, such as images and videos. However, the discrete nature of natural language makes the disentangling of textual representations more challenging (e.g., the manipulation over the data space cannot be easily achieved). Inspired by information theory, we propose a novel method that effectively manifests disentangled representations of text, without any supervision on semantics. A new mutual information upper bound is derived and leveraged to measure dependence between style and content. By minimizing this upper bound, the proposed method induces style and content embeddings into two independent low-dimensional spaces. Experiments on both conditional text generation and text-style transfer demonstrate the high quality of our disentangled representation in terms of content and style preservation.

Title: Advancing Weakly Supervised Cross-Domain Alignment with Optimal Transport
Authors: Siyang Yuan, Ke Bai, Liqun Chen, Yizhe Zhang, Chenyang Tao, Chunyuan Li, Guoyin Wang, Ricardo Henao, Lawrence Carin
Venue: BMVC 2020
URL: https://www.bmvc2020-conference.com/assets/papers/0566.pdf

Title: POINTER: Constrained Progressive Text Generation via Insertion-based Generative Pre-training
Authors: Yizhe Zhang, Guoyin Wang, Chunyuan Li, Zhe Gan, Chris Brockett, Bill Dolan
Venue: arXiv preprint arXiv:2005.00558, 2020 (arXiv version of the EMNLP 2020 paper listed above)
URL: https://arxiv.org/abs/2005.00558
Abstract: Large-scale pre-trained language models, such as BERT and GPT-2, have achieved excellent performance in language representation learning and free-form text generation. However, these models cannot be directly employed to generate text under specified lexical constraints. To address this challenge, we present POINTER, a simple yet novel insertion-based approach for hard-constrained text generation. The proposed method operates by progressively inserting new tokens between existing tokens in a parallel manner. This procedure is recursively applied until a sequence is completed. The resulting coarse-to-fine hierarchy makes the generation process intuitive and interpretable. Since our training objective resembles the objective of masked language modeling, BERT can be naturally utilized for initialization. We pre-train our model with the proposed progressive insertion-based objective on a 12GB Wikipedia dataset, and fine-tune it on downstream hard-constrained generation tasks. Non-autoregressive decoding yields a logarithmic time complexity during inference time. Experimental results on both News and Yelp datasets demonstrate that POINTER achieves state-of-the-art performance on constrained text generation. We intend to release the pre-trained model to facilitate future research.

Title: Contextual Text Style Transfer
Authors: Yu Cheng, Zhe Gan, Yizhe Zhang, Oussama Elachqar, Dianqi Li, Jingjing Liu
Venue: Findings of EMNLP 2020, pages 2915-2924
URL: https://arxiv.org/abs/2005.00136
Abstract: We introduce a new task, Contextual Text Style Transfer - translating a sentence into a desired style with its surrounding context taken into account. This brings two key challenges to existing style transfer approaches: (i) how to preserve the semantic meaning of the target sentence and its consistency with the surrounding context during transfer; (ii) how to train a robust model with limited labeled data accompanied with context. To realize high-quality style transfer with natural context preservation, we propose a Context-Aware Style Transfer (CAST) model, which uses two separate encoders for each input sentence and its surrounding context. A classifier is further trained to ensure contextual consistency of the generated sentence. To compensate for the lack of parallel data, additional self-reconstruction and back-translation losses are introduced to leverage non-parallel data in a semi-supervised fashion. Two new benchmarks, Enron-Context and Reddit-Context, are introduced for formality and offensiveness style transfer. Experimental results on these datasets demonstrate the effectiveness of the proposed CAST model over state-of-the-art methods across style accuracy, content preservation and contextual consistency metrics.

Title: Optimus: Organizing Sentences via Pre-trained Modeling of a Latent Space
Authors: Chunyuan Li, Xiang Gao, Yuan Li, Xiujun Li, Baolin Peng, Yizhe Zhang, Jianfeng Gao
Venue: EMNLP 2020, pages 4678-4699
URL: https://arxiv.org/abs/2004.04092
Abstract: When trained effectively, the Variational Autoencoder (VAE) can be both a powerful generative model and an effective representation learning framework for natural language. In this paper, we propose the first large-scale language VAE model, Optimus (Organizing sentences via Pre-Trained Modeling of a Universal Space). A universal latent embedding space for sentences is first pre-trained on a large text corpus, and then fine-tuned for various language generation and understanding tasks. Compared with GPT-2, Optimus enables guided language generation from an abstract level using the latent vectors. Compared with BERT, Optimus can generalize better on low-resource language understanding tasks due to the smooth latent space structure. Extensive experimental results on a wide range of language tasks demonstrate the effectiveness of Optimus. It achieves new state-of-the-art results on VAE language modeling benchmarks.

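Guided generation from a latent space amounts to encoding sentences into latent vectors, manipulating those vectors (for example, interpolating between two sentences), and decoding the results. The sketch below shows only that manipulation step; the encode and decode functions are explicit placeholders for a pretrained latent-variable model, not Optimus itself.

```python
# Minimal sketch of latent-vector-guided generation: encode two sentences into a
# shared latent space, interpolate, and decode along the path. The encode/decode
# stand-ins below are placeholders for a pretrained latent-variable model.
import numpy as np

def encode(sentence: str) -> np.ndarray:
    """Placeholder for the pretrained encoder (sentence -> latent vector)."""
    rng = np.random.default_rng(abs(hash(sentence)) % (2**32))
    return rng.standard_normal(32)

def decode(z: np.ndarray) -> str:
    """Placeholder for the pretrained decoder (latent vector -> sentence)."""
    return f"<decoded sentence for latent with norm {np.linalg.norm(z):.2f}>"

def interpolate(src: str, tgt: str, steps: int = 5):
    z0, z1 = encode(src), encode(tgt)
    # Walk linearly through the latent space; a smooth space yields smooth text.
    return [decode((1 - t) * z0 + t * z1) for t in np.linspace(0.0, 1.0, steps)]

if __name__ == "__main__":
    for s in interpolate("a girl makes a silly face", "two soccer players are playing"):
        print(s)
```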