Research Interests
Author Statistics
数据免责声明
页面数据均来自互联网公开来源、合作出版商和通过AI技术自动分析结果,我们不对页面数据的有效性、准确性、正确性、可靠性、完整性和及时性做出任何承诺和保证。若有疑问,可以通过电子邮件方式联系我们:report@aminer.cn
小科
s few-shot capabilities. Inspired by the recent success of leveraging a retrieval module to augment large-scale neural network models, we propose to retrieve examples that are semantically-similar to a test sample to formulate its corresponding prompt. Intuitively, the in-context examples selected with such a strategy may serve as more informative inputs to unleash GPT-$3 s extensive knowledge. We evaluate the proposed approach on several natural language understanding and generation benchmarks, where the retrieval-based prompt selection approach consistently outperforms the random baseline. Moreover, it is observed that the sentence encoders fine-tuned on task-related datasets yield even more helpful retrieval results. Notably, significant gains are observed on tasks such as table-to-text generation (41.9% on the ToTTo dataset) and open-domain question answering (45.5% on the NQ dataset). We hope our investigation could help understand the behaviors of GPT-$3$ and large-scale pre-trained LMs in general and enhance their few-shot capabilities. ","authors":[{"name":"Jiachang Liu"},{"id":"561d7d0145cedb33980841c8","name":"Dinghan Shen"},{"id":"562f456b45cedb33995dbe96","name":"Yizhe Zhang"},{"id":"53f43f9cdabfaedd74ddb705","name":"Bill Dolan"},{"id":"53f58452dabfaeaca9f8045b","name":"Lawrence Carin"},{"id":"53f44b16dabfaedf435df98d","name":"Weizhu Chen"}],"id":"6006bb4891e0111a1b6a2346","num_citation":0,"order":4,"pdf":"https:\u002F\u002Fstatic.aminer.cn\u002Fstorage\u002Fpdf\u002Farxiv\u002F21\u002F2101\u002F2101.06804.pdf","title":"What Makes Good In-Context Examples for GPT-$3$?","urls":["https:\u002F\u002Farxiv.org\u002Fabs\u002F2101.06804"],"versions":[{"id":"6006bb4891e0111a1b6a2346","sid":"2101.06804","src":"arxiv","year":2021}],"year":2021},{"abstract":" An unbiased low-variance gradient estimator, termed GO gradient, was proposed recently for expectation-based objectives $\\mathbb{E}_{q_{\\boldsymbol{\\gamma}}(\\boldsymbol{y})} [f(\\boldsymbol{y})]$, where the random variable (RV) $\\boldsymbol{y}$ may be drawn from a stochastic computation graph with continuous (non-reparameterizable) internal nodes and continuous\u002Fdiscrete leaves. Upgrading the GO gradient, we present for $\\mathbb{E}_{q_{\\boldsymbol{\\boldsymbol{\\gamma}}}(\\boldsymbol{y})} [f(\\boldsymbol{y})]$ an unbiased low-variance Hessian estimator, named GO Hessian. Considering practical implementation, we reveal that GO Hessian is easy-to-use with auto-differentiation and Hessian-vector products, enabling efficient cheap exploitation of curvature information over stochastic computation graphs. As representative examples, we present the GO Hessian for non-reparameterizable gamma and negative binomial RVs\u002Fnodes. Based on the GO Hessian, we design a new second-order method for $\\mathbb{E}_{q_{\\boldsymbol{\\boldsymbol{\\gamma}}}(\\boldsymbol{y})} [f(\\boldsymbol{y})]$, with rigorous experiments conducted to verify its effectiveness and efficiency. ","authors":[{"id":"562b8e7e45cedb3398aac500","name":"Cong Yulai"},{"id":"562c4ced45cedb3398be4ad1","name":"Zhao Miaoyun"},{"id":"542cca92dabfae498ae10f67","name":"Li Jianqiao"},{"name":"Chen Junya"},{"id":"53f58452dabfaeaca9f8045b","name":"Carin Lawrence"}],"flags":[{"flag":"affirm_author","person_id":"53f58452dabfaeaca9f8045b"}],"id":"5ee9f15b91e01152af022d3e","num_citation":0,"order":4,"pages":{"end":"12068","start":"12060"},"pdf":"https:\u002F\u002Fstatic.aminer.cn\u002Fupload\u002Fpdf\u002F531\u002F926\u002F1316\u002F5ee9f15b91e01152af022d3e_0.pdf","title":"GO Hessian for Expectation-Based Objectives","urls":["https:\u002F\u002Farxiv.org\u002Fabs\u002F2006.08873","https:\u002F\u002Fdblp.org\u002Frec\u002Fconf\u002Faaai\u002FCongZLCC21","https:\u002F\u002Fojs.aaai.org\u002Findex.php\u002FAAAI\u002Farticle\u002Fview\u002F17432"],"venue":{"info":{"name":"AAAI"}},"versions":[{"id":"5ee9f15b91e01152af022d3e","sid":"2006.08873","src":"arxiv","year":2020},{"id":"60bc9f1891e0112ac5fc20b2","sid":"conf\u002Faaai\u002FCongZLCC21","src":"dblp","vsid":"conf\u002Faaai","year":2021}],"year":2021},{"abstract":" We propose a novel and principled method to learn a nonparametric graph model called graphon, which is defined in an infinite-dimensional space and represents arbitrary-size graphs. Based on the weak regularity lemma from the theory of graphons, we leverage a step function to approximate a graphon. We show that the cut distance of graphons can be relaxed to the Gromov-Wasserstein distance of their step functions. Accordingly, given a set of graphs generated by an underlying graphon, we learn the corresponding step function as the Gromov-Wasserstein barycenter of the given graphs. Furthermore, we develop several enhancements and extensions of the basic algorithm, $e.g.$, the smoothed Gromov-Wasserstein barycenter for guaranteeing the continuity of the learned graphons and the mixed Gromov-Wasserstein barycenters for learning multiple structured graphons. The proposed approach overcomes drawbacks of prior state-of-the-art methods, and outperforms them on both synthetic and real-world data. The code is available at https:\u002F\u002Fgithub.com\u002FHongtengXu\u002FSGWB-Graphon. ","authors":[{"id":"53f44ca8dabfaefedbb2c6ca","name":"Hongteng Xu"},{"id":"56205f0d45cedb3398267b46","name":"Dixin Luo"},{"id":"53f58452dabfaeaca9f8045b","name":"Lawrence Carin"},{"id":"53f488a2dabfaea3a923e72d","name":"Hongyuan Zha"}],"flags":[{"flag":"affirm_author","person_id":"53f58452dabfaeaca9f8045b"}],"id":"5fd34ba591e01161cf7395f6","num_citation":0,"order":2,"pages":{"end":"10513","start":"10505"},"pdf":"https:\u002F\u002Fstatic.aminer.cn\u002Fupload\u002Fpdf\u002F1887\u002F359\u002F1423\u002F5fd34ba591e01161cf7395f6_0.pdf","title":"Learning Graphons via Structured Gromov-Wasserstein Barycenters","urls":["https:\u002F\u002Farxiv.org\u002Fabs\u002F2012.05644","https:\u002F\u002Fdblp.org\u002Frec\u002Fconf\u002Faaai\u002FXuLCZ21","https:\u002F\u002Fojs.aaai.org\u002Findex.php\u002FAAAI\u002Farticle\u002Fview\u002F17257"],"venue":{"info":{"name":"AAAI"}},"versions":[{"id":"5fd34ba591e01161cf7395f6","sid":"2012.05644","src":"arxiv","year":2020},{"id":"60bc9f1891e0112ac5fc2476","sid":"conf\u002Faaai\u002FXuLCZ21","src":"dblp","vsid":"conf\u002Faaai","year":2021}],"year":2021},{"abstract":"Neural language models are often trained with maximum likelihood estimation (MLE), where the next word is generated conditioned on the ground-truth word tokens. During testing, however, the model is instead conditioned on previously generated tokens, resulting in what is termed exposure bias. To reduce this gap between training and testing, we propose using optimal transport (OT) to match the sequences generated in these two modes. We examine the necessity of adding Student-Forcing scheme during training with an imitation learning interpretation. An extension is further proposed to improve the OT learning for long sequences, based on the structural and contextual information of the text sequences. The effectiveness of the proposed method is validated on machine translation, text summarization, and text generation tasks.","authors":[{"name":"Jianqiao Li"},{"id":"53f42d9adabfaee1c0a36753","name":"Chunyuan Li"},{"id":"562d0a9545cedb3398d38079","name":"Guoyin Wang"},{"name":"Hao Fu"},{"name":"Yuhchen Lin"},{"id":"54055501dabfae8faa5c3a71","name":"Liqun Chen"},{"id":"562f456b45cedb33995dbe96","name":"Yizhe Zhang"},{"id":"542bfaafdabfae1ad8958360","name":"Chenyang Tao"},{"id":"5448b943dabfae1e04137212","name":"Ruiyi Zhang"},{"id":"562999c745cedb3398889494","name":"Wenlin Wang"},{"id":"561d7d0145cedb33980841c8","name":"Dinghan Shen"},{"name":"Qian Yang"},{"id":"53f58452dabfaeaca9f8045b","name":"Lawrence Carin"}],"doi":"10.18653\u002FV1\u002F2020.EMNLP-MAIN.735","id":"5f7fe6d80205f07f68973220","num_citation":0,"order":12,"pages":{"end":"9156","start":"9144"},"pdf":"https:\u002F\u002Fstatic.aminer.cn\u002Fupload\u002Fpdf\u002F352\u002F437\u002F209\u002F5f7fe6d80205f07f68973220_0.pdf","title":"Improving Text Generation with Student Forcing Optimal Transport","urls":["https:\u002F\u002F2020.emnlp.org\u002Fpapers\u002Fmain","https:\u002F\u002Farxiv.org\u002Fabs\u002F2010.05994","https:\u002F\u002Fdblp.uni-trier.de\u002Fdb\u002Fjournals\u002Fcorr\u002Fcorr2010.html#abs-2010-05994","https:\u002F\u002Farxiv.org\u002Fpdf\u002F2010.05994"],"venue":{"info":{"name":"EMNLP 2020"}},"versions":[{"id":"5f7fe6d80205f07f68973220","sid":"emnlp2020#264","src":"conf_emnlp","year":2020},{"id":"5f86c1ca91e011dbc7eba1df","sid":"2010.05994","src":"arxiv","year":2020},{"id":"5ff68d7cd4150a363cd59230","sid":"3105763789","src":"mag","vsid":"1192655580","year":2020}],"year":2020},{"authors":[{"id":"5448b943dabfae1e04137212","name":"Ruiyi Zhang"},{"id":"542dbb37dabfae489b98a547","name":"Changyou Chen"},{"id":"560774c645ce1e595f0f7d6f","name":"Xinyuan Zhang"},{"name":"Ke Bai"},{"id":"53f58452dabfaeaca9f8045b","name":"Lawrence Carin"}],"id":"5ff8839291e011c832673ad7","num_citation":0,"order":4,"pages":{"end":"222","start":"212"},"title":"Semantic Matching via Optimal Partial Transport.","urls":["https:\u002F\u002Fdblp.org\u002Frec\u002Fconf\u002Femnlp\u002FZhangC0BC20","https:\u002F\u002Fwww.aclweb.org\u002Fanthology\u002F2020.findings-emnlp.21\u002F","https:\u002F\u002Fdblp.uni-trier.de\u002Fdb\u002Fconf\u002Femnlp\u002Femnlp2020f.html#ZhangC0BC20"],"venue":{"info":{"name":"EMNLP"}},"versions":[{"id":"5ff8839291e011c832673ad7","sid":"conf\u002Femnlp\u002FZhangC0BC20","src":"dblp","vsid":"conf\u002Femnlp","year":2020},{"id":"5ff68ec0d4150a363cd899bb","sid":"3109285381","src":"mag","vsid":"1192655580","year":2020}],"year":2020},{"abstract":"Our CNN used multimodal retinal images to successfully predict diagnosis of symptomatic AD in an independent test set. GC-IPL maps were the most useful single inputs for prediction. Models including only images performed similarly to models also including quantitative data and patient data.","authors":[{"name":"C Ellis Wisely"},{"name":"Dong Wang"},{"name":"Ricardo Henao"},{"id":"53f4543bdabfaedd74e257d9","name":"Dilraj S Grewal"},{"id":"562b805745cedb3398a7dc5f","name":"Atalie C Thompson"},{"name":"Cason B Robbins"},{"name":"Stephen P Yoon"},{"name":"Srinath Soundararajan"},{"name":"Bryce W Polascik"},{"id":"53f47bcadabfaeb22f56e2d0","name":"James R Burke"},{"name":"Andy Liu"},{"id":"53f58452dabfaeaca9f8045b","name":"Lawrence Carin"},{"id":"53f7e281dabfae90ec128bce","name":"Sharon Fekrat"}],"doi":"10.1136\u002Fbjophthalmol-2020-317659","id":"5fc22315d4150a363cd54c13","num_citation":0,"order":11,"title":"Convolutional neural network to identify symptomatic Alzheimer's disease using multimodal retinal imaging.","urls":["https:\u002F\u002Fwww.ncbi.nlm.nih.gov\u002Fpubmed\u002F33243829","https:\u002F\u002Fbjo.bmj.com\u002Fcontent\u002Fearly\u002F2020\u002F11\u002F25\u002Fbjophthalmol-2020-317659","https:\u002F\u002Fpubmed.ncbi.nlm.nih.gov\u002F33243829\u002F"],"venue":{"info":{"name":"The British journal of ophthalmology"}},"versions":[{"id":"5fc22315d4150a363cd54c13","sid":"33243829","src":"pubmed","vsid":"0421041","year":2020},{"id":"5ff68efed4150a363cd92de2","sid":"3109971553","src":"mag","vsid":"79587146","year":2020}],"year":2020},{"abstract":" Cross-domain alignment between two sets of entities (e.g., objects in an image, words in a sentence) is fundamental to both computer vision and natural language processing. Existing methods mainly focus on designing advanced attention mechanisms to simulate soft alignment, with no training signals to explicitly encourage alignment. The learned attention matrices are also dense and lacks interpretability. We propose Graph Optimal Transport (GOT), a principled framework that germinates from recent advances in Optimal Transport (OT). In GOT, cross-domain alignment is formulated as a graph matching problem, by representing entities into a dynamically-constructed graph. Two types of OT distances are considered: (i) Wasserstein distance (WD) for node (entity) matching; and (ii) Gromov-Wasserstein distance (GWD) for edge (structure) matching. Both WD and GWD can be incorporated into existing neural network models, effectively acting as a drop-in regularizer. The inferred transport plan also yields sparse and self-normalized alignment, enhancing the interpretability of the learned model. Experiments show consistent outperformance of GOT over baselines across a wide range of tasks, including image-text retrieval, visual question answering, image captioning, machine translation, and text summarization. ","authors":[{"id":"54055501dabfae8faa5c3a71","name":"Liqun Chen"},{"id":"5622795d45cedb33983cc300","name":"Zhe Gan"},{"id":"560387a545cedb33961c2462","name":"Yu Cheng"},{"id":"542f5b05dabfaee4e604dbfd","name":"Linjie Li"},{"id":"53f58452dabfaeaca9f8045b","name":"Lawrence Carin"},{"name":"Jingjing Liu"}],"id":"5ede0553e06a4c1b26a83e9a","num_citation":0,"order":4,"pages":{"end":"1553","start":"1542"},"pdf":"https:\u002F\u002Fstatic.aminer.cn\u002Fstorage\u002Fpdf\u002Farxiv\u002F20\u002F2006\u002F2006.14744.pdf","title":"Graph Optimal Transport for Cross-Domain Alignment","urls":["https:\u002F\u002Ficml.cc\u002FConferences\u002F2020\u002FAcceptedPapersInitial","https:\u002F\u002Farxiv.org\u002Fabs\u002F2006.14744","https:\u002F\u002Ficml.cc\u002FConferences\u002F2020\u002FAcceptedPapersInitial#146","https:\u002F\u002Fdblp.uni-trier.de\u002Fdb\u002Fjournals\u002Fcorr\u002Fcorr2006.html#abs-2006-14744","https:\u002F\u002Fwww.arxiv-vanity.com\u002Fpapers\u002F2006.14744\u002F","https:\u002F\u002Fdblp.org\u002Frec\u002Fconf\u002Ficml\u002FChenG0LC020","http:\u002F\u002Fproceedings.mlr.press\u002Fv119\u002Fchen20e.html"],"venue":{"info":{"name":"ICML"}},"versions":[{"id":"5ede0553e06a4c1b26a83e9a","sid":"icml2020#146","src":"conf_icml","year":2020},{"id":"5ef9c12e91e011b84e1f8bc6","sid":"2006.14744","src":"arxiv","year":2020},{"id":"5fae6d39d4150a363ceb3684","sid":"3034220197","src":"mag","vsid":"1180662882","year":2020},{"id":"5ff881dd91e011c83266e408","sid":"conf\u002Ficml\u002FChenG0LC020","src":"dblp","vsid":"conf\u002Ficml","year":2020}],"year":2020},{"abstract":" Mutual information (MI) minimization has gained considerable interests in various machine learning tasks. However, estimating and minimizing MI in high-dimensional spaces remains a challenging problem, especially when only samples, rather than distribution forms, are accessible. Previous works mainly focus on MI lower bound approximation, which is not applicable to MI minimization problems. In this paper, we propose a novel Contrastive Log-ratio Upper Bound (CLUB) of mutual information. We provide a theoretical analysis of the properties of CLUB and its variational approximation. Based on this upper bound, we introduce an accelerated MI minimization training scheme, which bridges MI minimization with negative sampling. Simulation studies on Gaussian distributions show the reliable estimation ability of CLUB. Real-world MI minimization experiments, including domain adaptation and information bottleneck, further demonstrate the effectiveness of the proposed method. ","authors":[{"id":"5605126445cedb3396545ccf","name":"Pengyu Cheng"},{"id":"560e201545cedb339764a969","name":"Weituo Hao"},{"id":"561448aa45cedb3397a407d3","name":"Shuyang Dai"},{"name":"Jiachang Liu"},{"id":"5622795d45cedb33983cc300","name":"Zhe Gan"},{"id":"53f58452dabfaeaca9f8045b","name":"Lawrence Carin"}],"id":"5ede0553e06a4c1b26a83fc1","num_citation":0,"order":5,"pages":{"end":"1788","start":"1779"},"pdf":"https:\u002F\u002Fstatic.aminer.cn\u002Fstorage\u002Fpdf\u002Farxiv\u002F20\u002F2006\u002F2006.12013.pdf","title":"CLUB: A Contrastive Log-ratio Upper Bound of Mutual Information","urls":["https:\u002F\u002Ficml.cc\u002FConferences\u002F2020\u002FAcceptedPapersInitial","https:\u002F\u002Farxiv.org\u002Fabs\u002F2006.12013","https:\u002F\u002Ficml.cc\u002FConferences\u002F2020\u002FAcceptedPapersInitial#441","https:\u002F\u002Fdblp.uni-trier.de\u002Fdb\u002Fjournals\u002Fcorr\u002Fcorr2006.html#abs-2006-12013","https:\u002F\u002Farxiv.org\u002Fpdf\u002F2006.12013","https:\u002F\u002Fwww.arxiv-vanity.com\u002Fpapers\u002F2006.12013\u002F","https:\u002F\u002Fdblp.org\u002Frec\u002Fconf\u002Ficml\u002FChengHDLGC20","http:\u002F\u002Fproceedings.mlr.press\u002Fv119\u002Fcheng20b.html"],"venue":{"info":{"name":"ICML"}},"versions":[{"id":"5ede0553e06a4c1b26a83fc1","sid":"icml2020#441","src":"conf_icml","year":2020},{"id":"5ef3247a91e0110c353da77c","sid":"2006.12013","src":"arxiv","year":2020},{"id":"5fae6dabd4150a363cec05b4","sid":"3035060230","src":"mag","vsid":"1180662882","year":2020},{"id":"5ff881dd91e011c83266e41a","sid":"conf\u002Ficml\u002FChengHDLGC20","src":"dblp","vsid":"conf\u002Ficml","year":2020}],"year":2020},{"abstract":"Pretrained Language Models (PLMs) have improved the performance of natural language understanding in recent years. Such models are pretrained on large corpora, which encode the general prior knowledge of natural languages but are agnostic to information characteristic of downstream tasks. This often results in overfitting when fine-tuned with low resource datasets where task-specific information is limited. In this paper, we integrate label information as a task-specific prior into the self-attention component of pretrained BERT models. Experiments on several benchmarks and real-word datasets suggest that the proposed approach can largely improve the performance of pretrained models when fine-tuning with small datasets.","authors":[{"id":"53f42bf8dabfaec22ba01f28","name":"Rui Wang"},{"name":"Shijing Si"},{"id":"542a5574dabfae646d551a73","name":"Guoyin Wang"},{"id":"53f4986fdabfaee0d9c74c66","name":"Lei Zhang"},{"id":"53f58452dabfaeaca9f8045b","name":"Lawrence Carin"},{"name":"Ricardo Henao"}],"doi":"10.18653\u002FV1\u002F2020.FINDINGS-EMNLP.285","id":"5ff8839291e011c832673aa9","num_citation":0,"order":4,"pages":{"end":"3186","start":"3181"},"title":"Integrating Task Specific Information into Pretrained Language Models for Low Resource Fine Tuning.","urls":["https:\u002F\u002Fdblp.org\u002Frec\u002Fconf\u002Femnlp\u002FWangS0ZCH20","https:\u002F\u002Fwww.aclweb.org\u002Fanthology\u002F2020.findings-emnlp.285\u002F","https:\u002F\u002Fdblp.uni-trier.de\u002Fdb\u002Fconf\u002Femnlp\u002Femnlp2020f.html#WangS0ZCH20"],"venue":{"info":{"name":"EMNLP"}},"versions":[{"id":"5ff8839291e011c832673aa9","sid":"conf\u002Femnlp\u002FWangS0ZCH20","src":"dblp","vsid":"conf\u002Femnlp","year":2020},{"id":"5ff68ce8d4150a363cd3dcb8","sid":"3103920202","src":"mag","vsid":"1192655580","year":2020}],"year":2020},{"authors":[{"name":"Shijing Si"},{"id":"53f42bf8dabfaec22ba01f28","name":"Rui Wang"},{"name":"Jedrek Wosik"},{"id":"53f4714ddabfaeb22f5592bd","name":"Hao Zhang"},{"name":"David Dov"},{"id":"542a5574dabfae646d551a73","name":"Guoyin Wang"},{"id":"53f58452dabfaeaca9f8045b","name":"Lawrence Carin"}],"id":"5ff8835791e011c832672cfc","num_citation":0,"order":6,"pages":{"end":"456","start":"436"},"title":"Students Need More Attention - BERT-based Attention Model for Small Data with Application to Automatic Patient Message Triage.","urls":["https:\u002F\u002Fdblp.org\u002Frec\u002Fconf\u002Fmlhc\u002FSiWWZD0C20","http:\u002F\u002Fproceedings.mlr.press\u002Fv126\u002Fsi20a.html"],"venue":{"info":{"name":"MLHC"}},"versions":[{"id":"5ff8835791e011c832672cfc","sid":"conf\u002Fmlhc\u002FSiWWZD0C20","src":"dblp","vsid":"conf\u002Fmlhc","year":2020}],"year":2020}],"profilePubsTotal":884,"profilePatentsPage":1,"profilePatents":[],"profilePatentsTotal":6,"profilePatentsEnd":true,"profileProjectsPage":0,"profileProjects":null,"profileProjectsTotal":null,"newInfo":null,"checkDelPubs":[]}};