DeCLUTR: Deep Contrastive Learning for Unsupervised Textual Representations

Giorgi John M.
Nitski Osvald
Bader Gary D.
Wang Bo

Abstract:

We present DeCLUTR: Deep Contrastive Learning for Unsupervised Textual Representations, a self-supervised method for learning universal sentence embeddings that transfer to a wide variety of natural language processing (NLP) tasks. Our objective leverages recent advances in deep metric learning (DML) and has the advantage of being conceptually simple, easy to implement, and applicable to any text encoder.

Introduction
  • Due to the limited amount of labelled training data available for many natural language processing (NLP) tasks, transfer learning has become ubiquitous [1].
  • Until recently, transfer learning in NLP was limited to pretrained word-level embeddings, such as word2vec [2] or GloVe [3].
  • Recent work has demonstrated strong transfer task performance using pretrained sentence-level embeddings.
  • These fixed-length vectors, often referred to as “universal” sentence embeddings, are typically learned on a large text corpus and transferred for use in a variety of downstream tasks.
Highlights
  • Due to the limited amount of labelled training data available for many natural language processing (NLP) tasks, transfer learning has become ubiquitous [1]
  • Although our method sometimes underperforms existing supervised solutions on average downstream performance, we found that this is partially explained by the fact that these methods are trained on the Stanford NLI (SNLI) corpus, which is included as a downstream evaluation task in SentEval
  • If the average downstream performance is computed without considering SNLI, the difference in downstream performance between our method and the supervised methods shrinks considerably
  • We ablate several components of the sampling procedure (Section 5.2), including the number of anchors sampled per document A, the number of positives sampled per anchor P, and the sampling strategy for those positives (Figure 2); a sketch of the contrastive objective that these sampling choices plug into follows this list
  • We proposed a self-supervised objective for learning universal sentence representations
  • We demonstrated the effectiveness of our objective by evaluating the learned sentence representations on the SentEval benchmark, which contains a total of 28 tasks designed to evaluate the transferability and linguistic properties of sentence representations
  • When used to pretrain a transformer-based language model, our objective achieves new state-of-the-art performance, outperforming existing supervised, semi-supervised and unsupervised methods
  • The supervised methods we compare to (InferSent, Universal Sentence Encoder (USE) and Sentence Transformers) are all trained on the Stanford natural language inference (NLI) dataset (SNLI) [13], which is included as a downstream evaluation task in SentEval
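
The highlights above describe a contrastive objective in which anchor spans and their positives, drawn from a minibatch of documents, are contrasted against all other pairs. The snippet below is a minimal sketch of one way such a temperature-scaled, InfoNCE-style loss can be written with in-batch negatives; the function and tensor names are illustrative, and this is not the authors' implementation.

```python
import torch
import torch.nn.functional as F

def contrastive_loss(anchors: torch.Tensor,
                     positives: torch.Tensor,
                     temperature: float = 5e-2) -> torch.Tensor:
    """InfoNCE-style loss over anchor-positive span pairs.

    anchors, positives: (num_pairs, dim) pooled span embeddings, where row i of
    `positives` is the positive for row i of `anchors`. Every other positive in the
    minibatch serves as a negative for a given anchor.
    """
    anchors = F.normalize(anchors, dim=-1)
    positives = F.normalize(positives, dim=-1)
    # Cosine similarity of each anchor against every positive, scaled by temperature.
    logits = anchors @ positives.T / temperature  # (num_pairs, num_pairs)
    targets = torch.arange(anchors.size(0), device=anchors.device)
    # The matching positive sits on the diagonal; cross-entropy pushes it above the rest.
    return F.cross_entropy(logits, targets)
```

When more than one anchor is sampled per document (A > 1), some of these in-batch negatives come from the same document as the anchor, which matches the hard-negative behaviour described in the Table 3 caption below.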
Results
  • In subsection 5.1, the authors compare the performance of the model against the relevant baselines.
  • Sentence Transformers, which begins with a pretrained transformer model and fine-tunes it on NLI datasets, scores approximately 10% lower on the probing tasks than the model it fine-tunes.
  • The authors note that sampling multiple anchors per document has a large positive impact on the quality of the learned embeddings.
  • The authors hypothesize this is because the difficulty of the contrastive objective increases when A > 1.
  • The authors find that a positive sampling strategy that allows positives to be adjacent to and subsumed by the anchor outperforms a strategy which only allows adjacent or subsuming views, suggesting that the information captured by these views is complementary (a sketch of such a span-sampling procedure follows this list).
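
As noted in the last result above, positives are drawn so that they can be adjacent to, overlap with, or be subsumed by their anchor. The function below is a minimal sketch of such a span-sampling step under simplifying assumptions (uniform span lengths and offsets, one positive per call); the name, the 32-token lower bound, and the exact windowing are illustrative rather than the authors' procedure, while the 512-token cap follows the Table 3 caption.

```python
import random
from typing import List, Tuple

def sample_anchor_positive(tokens: List[str],
                           min_span_len: int = 32,
                           max_span_len: int = 512) -> Tuple[List[str], List[str]]:
    """Sample an anchor span and one nearby positive span from a tokenized document.

    The positive's start offset is restricted to a window around the anchor, so the
    positive may end up adjacent to, overlapping with, or subsumed by the anchor.
    """
    anchor_len = random.randint(min_span_len, min(max_span_len, len(tokens) // 2))
    anchor_start = random.randint(0, len(tokens) - anchor_len)
    anchor = tokens[anchor_start:anchor_start + anchor_len]

    positive_len = random.randint(min_span_len, anchor_len)
    lo = max(0, anchor_start - positive_len)                         # may end where the anchor starts
    hi = min(len(tokens) - positive_len, anchor_start + anchor_len)  # may start where the anchor ends
    positive_start = random.randint(lo, hi)
    positive = tokens[positive_start:positive_start + positive_len]
    return anchor, positive

# Toy usage: draw four anchor-positive pairs from a single synthetic document.
document = ("universal sentence embeddings are learned from unlabelled text " * 100).split()
pairs = [sample_anchor_positive(document) for _ in range(4)]
```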
Conclusion
  • The authors proposed a self-supervised objective for learning universal sentence representations.
  • The authors' objective is conceptually simple, easy to implement, and applicable to any text encoder.
  • The authors demonstrated the effectiveness of the objective by evaluating the learned sentence representations on the SentEval benchmark, which contains a total of 28 tasks designed to evaluate the transferability and linguistic properties of sentence representations (a minimal SentEval evaluation sketch follows this list).
  • When used to pretrain a transformer-based language model, the objective achieves new state-of-the-art performance, outperforming existing supervised, semi-supervised and unsupervised methods.
  • The authors will release the model publicly in the hopes that it will be extended to new domains and non-English languages.
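
For readers unfamiliar with the SentEval benchmark referenced above, the sketch below shows how a fixed-length sentence encoder is typically plugged into the SentEval toolkit, assuming the prepare/batcher interface documented in the SentEval repository. The `encode` function, the data path, and the classifier hyperparameters are placeholders, not the authors' evaluation code.

```python
import numpy as np
import senteval  # https://github.com/facebookresearch/SentEval

def encode(sentence: str) -> np.ndarray:
    # Placeholder encoder so the example is self-contained: a repeatable pseudo-random
    # vector per sentence. Replace with, e.g., mean-pooled transformer token embeddings.
    rng = np.random.default_rng(sum(map(ord, sentence)) % (2 ** 32))
    return rng.standard_normal(768)

def prepare(params, samples):
    # Nothing to fit for a fixed, pretrained encoder.
    return

def batcher(params, batch):
    # `batch` is a list of tokenized sentences; return one embedding per sentence.
    sentences = [" ".join(tokens) if tokens else "." for tokens in batch]
    return np.vstack([encode(s) for s in sentences])

params = {"task_path": "path/to/SentEval/data", "usepytorch": True, "kfold": 10}
params["classifier"] = {"nhid": 0, "optim": "adam", "batch_size": 64,
                        "tenacity": 5, "epoch_size": 4}

se = senteval.engine.SE(params, batcher, prepare)
results = se.eval(["MR", "CR", "SUBJ", "MPQA", "TREC", "STSBenchmark"])
```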
Tables
  • Table1: Results on the downstream and probing tasks from the test set of the SentEval benchmark. USE: Google’s Universal Sentence Encoder. Transformer-small and Transformer-base are pretrained DistilRoBERTa and RoBERTa-base models respectively, using mean pooling. DeCLUTR-small and DeCLUTR-base are pretrained DistilRoBERTa and RoBERTa-base models respectively after continued pretraining with our method. Bold: best scores. ∆: difference to our method's (DeCLUTR-base) average score
  • Table2: Results on the downstream and probing tasks from the development set of the SentEval benchmark. We compare models trained with the Next Sentence Prediction (NSP) and Sentence-Order Prediction (SOP) losses to a model trained with neither, using two different pooling strategies: "*-CLS", where the special classification token prepended to every input is used as its sentence-level representation, and "*-mean", where each sentence is represented by the mean of its token embeddings (a pooling sketch follows this table list)
  • Table3: Examples of text spans generated by our sampling procedure. During training, we randomly sample one or more anchors from every document in a minibatch. For each anchor, we randomly sample one or more positives adjacent to, overlapping with, or subsumed by the anchor. All anchor-positive pairs are contrasted with every other anchor-positive pair in the minibatch. This leads to easy negatives (anchors and positives sampled from other documents in a minibatch) and hard negatives (anchors and positives sampled from the same document). Here, examples are capped at a maximum length of 64 word tokens. During training, we sample spans up to a length of 512 word tokens
  • Table4: Results on the downstream tasks from the test set of the SentEval benchmark. USE: Google’s Universal Sentence Encoder. Bold: best scores
  • Table5: Results on the probing tasks from the test set of the SentEval benchmark. USE: Google’s Universal Sentence Encoder. Bold: best scores
  • Table6: Results on the downstream and probing tasks from the development set of the SentEval benchmark for a selection of pretrained transformer models, before and after they have been fine-tuned by the Sentence Transformers method. For all models, we use mean pooling on their token-level embeddings to produce fixed-length sentence embeddings. We use the "*-nli-mean-tokens" pretrained models obtained from the Sentence Transformers GitHub: https://github.com/UKPLab/sentence-transformers. ∆: difference to the probing task performance of the model before fine-tuning with the Sentence Transformers objective
  • Table7: Results on the downstream tasks from the development set of the SentEval benchmark. Averaged scores computed without considering the performance on the Stanford Natural Language Inference (SNLI) dataset are shown in parentheses. USE: Google’s Universal Sentence Encoder. ∆: difference to our method's (DeCLUTR-base) average score
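
Tables 1, 2, and 6 above refer to producing fixed-length sentence embeddings either from the special classification token ("*-CLS") or by mean pooling the token embeddings ("*-mean"). The helper below is a minimal sketch of both strategies using the Hugging Face transformers library; the DistilRoBERTa checkpoint matches the small baseline in Table 1, but the function itself is illustrative rather than the authors' code.

```python
import torch
from transformers import AutoModel, AutoTokenizer

tokenizer = AutoTokenizer.from_pretrained("distilroberta-base")
model = AutoModel.from_pretrained("distilroberta-base")

def embed(sentences, pooling: str = "mean") -> torch.Tensor:
    inputs = tokenizer(sentences, padding=True, truncation=True, return_tensors="pt")
    with torch.no_grad():
        token_embeddings = model(**inputs).last_hidden_state  # (batch, seq_len, hidden)
    if pooling == "cls":
        # "*-CLS": the special token prepended to every input represents the sentence.
        return token_embeddings[:, 0]
    # "*-mean": average the token embeddings, ignoring padding positions.
    mask = inputs["attention_mask"].unsqueeze(-1).float()
    return (token_embeddings * mask).sum(dim=1) / mask.sum(dim=1).clamp(min=1e-9)

embeddings = embed(["A universal sentence embedding.", "Another sentence."])
```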
Related work
  • Previous works on universal sentence embeddings can broadly be grouped into supervised, semi-supervised or unsupervised approaches. Supervised or semi-supervised. The most successful universal sentence embedding methods are pretrained on the (human-labelled) natural language inference (NLI) datasets Stanford NLI (SNLI) [40] and MultiNLI [41]. NLI is the task of classifying a pair of sentences (denoted the “hypothesis” and the “premise”) into one of three relationships: entailment, contradiction or neutral. The effectiveness of NLI for training universal sentence encoders was demonstrated by the supervised method InferSent [4]. Google’s Universal Sentence Encoder (USE) is semi-supervised, augmenting supervised learning on SNLI with a mix of unsupervised objectives trained on unlabelled text.
Funding
  • Sentence Transformers, which begins with a pretrained transformer model and fine-tunes it on NLI datasets, scores approximately 10% lower on the probing tasks than the model it fine-tunes (see Supplementary Material)
  • When used to pretrain a transformer-based language model, our objective achieves new state-of-the-art performance, outperforming existing supervised, semi-supervised and unsupervised methods
  • After controlling for SNLI, our method performs only ∼1% worse than Google’s USE
Study subjects and analysis
documents: 495,243
4 Experimental setup. 4.1 Dataset, training, and implementation. Dataset: We collected all documents with a minimum token length of 2048 from an open-access subset of the OpenWebText corpus [50], yielding 495,243 documents in total.

human-labelled sentence pairs: 570,000
For reference, Google’s USE was trained on 570,000 human-labelled sentence pairs from the SNLI dataset. InferSent and Sentence Transformer models were trained on both SNLI and MultiNLI, for a total of 1 million human-labelled sentence pairs.

minibatch size: 16 (1 epoch over the 495,243 documents)
All models were trained on up to four NVIDIA Tesla V100 16 or 32 GB GPUs. Training: Unless specified otherwise, we train for 1 epoch over the 495,243 documents with a minibatch size of 16 and a temperature τ = 5 × 10−2, using the AdamW optimizer [55] with a learning rate (LR) of 5 × 10−5 and a weight decay of 0.1. For every document in a minibatch, we sample two anchor spans (A = 2) and two positive spans per anchor (P = 2). A short sketch wiring these hyperparameters into a single optimization step follows.
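
The sketch below combines the hyperparameters listed above (a minibatch of 16 documents, A = 2, P = 2, τ = 5 × 10−2, AdamW with a learning rate of 5 × 10−5 and weight decay of 0.1) into a single optimization step. The linear "encoder" and the random span features are stand-ins for the pretrained transformer and the sampled spans, so this illustrates only how the pieces fit together, not the authors' training code.

```python
import torch
import torch.nn.functional as F

# Hyperparameters from the training description above.
BATCH_SIZE, NUM_ANCHORS, NUM_POSITIVES = 16, 2, 2   # documents per step, A, P
TEMPERATURE, LR, WEIGHT_DECAY = 5e-2, 5e-5, 0.1
EMBED_DIM = 768

# Stand-in encoder; in practice this is the pretrained transformer plus pooling.
encoder = torch.nn.Linear(EMBED_DIM, EMBED_DIM)
optimizer = torch.optim.AdamW(encoder.parameters(), lr=LR, weight_decay=WEIGHT_DECAY)

# One step: each document contributes A * P anchor-positive pairs, and every pair is
# contrasted against all other pairs in the minibatch.
num_pairs = BATCH_SIZE * NUM_ANCHORS * NUM_POSITIVES
anchor_feats = torch.randn(num_pairs, EMBED_DIM)    # stand-in for encoded anchor spans
positive_feats = torch.randn(num_pairs, EMBED_DIM)  # stand-in for encoded positive spans

anchors = F.normalize(encoder(anchor_feats), dim=-1)
positives = F.normalize(encoder(positive_feats), dim=-1)
logits = anchors @ positives.T / TEMPERATURE
loss = F.cross_entropy(logits, torch.arange(num_pairs))

optimizer.zero_grad()
loss.backward()
optimizer.step()
```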

Reference
  • Sebastian Ruder, Matthew E. Peters, Swabha Swayamdipta, and Thomas Wolf. Transfer learning in natural language processing. In Proceedings of the 2019 Conference of the North American Chapter of the Association for Computational Linguistics: Tutorials, pages 15–18, Minneapolis, Minnesota, June 2019. Association for Computational Linguistics.
  • Tomas Mikolov, Ilya Sutskever, Kai Chen, Greg S Corrado, and Jeff Dean. Distributed representations of words and phrases and their compositionality. In Advances in neural information processing systems, pages 3111–3119, 2013.
  • Jeffrey Pennington, Richard Socher, and Christopher D Manning. Glove: Global vectors for word representation. In Proceedings of the 2014 conference on empirical methods in natural language processing (EMNLP), pages 1532–1543, 2014.
  • Alexis Conneau, Douwe Kiela, Holger Schwenk, Loic Barrault, and Antoine Bordes. Supervised learning of universal sentence representations from natural language inference data. arXiv preprint arXiv:1705.02364, 2017.
  • Sandeep Subramanian, Adam Trischler, Yoshua Bengio, and Christopher J Pal. Learning general purpose distributed sentence representations via large scale multi-task learning. arXiv preprint arXiv:1804.00079, 2018.
  • Daniel Cer, Yinfei Yang, Sheng-yi Kong, Nan Hua, Nicole Limtiaco, Rhomni St. John, Noah Constant, Mario Guajardo-Cespedes, Steve Yuan, Chris Tar, Brian Strope, and Ray Kurzweil. Universal sentence encoder for English. In Proceedings of the 2018 Conference on Empirical Methods in Natural Language Processing: System Demonstrations, pages 169–174, Brussels, Belgium, November 2018. Association for Computational Linguistics.
  • Nils Reimers and Iryna Gurevych. Sentence-BERT: Sentence embeddings using Siamese BERT-networks. In Proceedings of the 2019 Conference on Empirical Methods in Natural Language Processing. Association for Computational Linguistics, November 2019.
  • Quoc Le and Tomas Mikolov. Distributed representations of sentences and documents. In International conference on machine learning, pages 1188–1196, 2014.
  • Yacine Jernite, Samuel R Bowman, and David Sontag. Discourse-based objectives for fast unsupervised sentence representation learning. arXiv preprint arXiv:1705.00557, 2017.
  • Ryan Kiros, Yukun Zhu, Russ R Salakhutdinov, Richard Zemel, Raquel Urtasun, Antonio Torralba, and Sanja Fidler. Skip-thought vectors. In Advances in neural information processing systems, pages 3294–3302, 2015.
  • Felix Hill, Kyunghyun Cho, and Anna Korhonen. Learning distributed representations of sentences from unlabelled data. arXiv preprint arXiv:1602.03483, 2016.
  • Lajanugen Logeswaran and Honglak Lee. An efficient framework for learning sentence representations. ArXiv, abs/1803.02893, 2018.
  • Alec Radford, Karthik Narasimhan, Tim Salimans, and Ilya Sutskever. Improving language understanding by generative pre-training. URL https://s3-us-west-2.amazonaws.com/openai-assets/research-covers/language-unsupervised/language_understanding_paper.pdf, 2018.
  • Jacob Devlin, Ming-Wei Chang, Kenton Lee, and Kristina Toutanova. BERT: Pre-training of deep bidirectional transformers for language understanding. In Proceedings of the 2019 Conference of the North American Chapter of the Association for Computational Linguistics: Human Language Technologies, Volume 1 (Long and Short Papers), pages 4171–4186, Minneapolis, Minnesota, June 2019. Association for Computational Linguistics.
  • Zhilin Yang, Zihang Dai, Yiming Yang, Jaime Carbonell, Russ R Salakhutdinov, and Quoc V Le. Xlnet: Generalized autoregressive pretraining for language understanding. In Advances in neural information processing systems, pages 5754–5764, 2019.
  • Yinhan Liu, Myle Ott, Naman Goyal, Jingfei Du, Mandar Joshi, Danqi Chen, Omer Levy, Mike Lewis, Luke Zettlemoyer, and Veselin Stoyanov. Roberta: A robustly optimized bert pretraining approach. arXiv preprint arXiv:1907.11692, 2019.
  • Guillaume Lample and Alexis Conneau. Cross-lingual language model pretraining. arXiv preprint arXiv:1901.07291, 2019.
  • Yang You, Jing Li, Jonathan Hseu, Xiaodan Song, James Demmel, and Cho-Jui Hsieh. Reducing bert pre-training time from 3 days to 76 minutes. arXiv preprint arXiv:1904.00962, 2019.
  • Mandar Joshi, Danqi Chen, Yinhan Liu, Daniel S Weld, Luke Zettlemoyer, and Omer Levy. Spanbert: Improving pre-training by representing and predicting spans. Transactions of the Association for Computational Linguistics, 8:64–77, 2020.
  • Zhenzhong Lan, Mingda Chen, Sebastian Goodman, Kevin Gimpel, Piyush Sharma, and Radu Soricut. Albert: A lite bert for self-supervised learning of language representations. arXiv preprint arXiv:1909.11942, 2019.
  • David G Lowe. Similarity metric learning for a variable-kernel classifier. Neural computation, 7(1):72–85, 1995.
  • Sebastian Mika, Gunnar Ratsch, Jason Weston, Bernhard Scholkopf, and Klaus-Robert Mullers. Fisher discriminant analysis with kernels. In Neural networks for signal processing IX: Proceedings of the 1999 IEEE signal processing society workshop (cat. no. 98th8468), pages 41–48.
  • Eric P Xing, Michael I Jordan, Stuart J Russell, and Andrew Y Ng. Distance metric learning with application to clustering with side-information. In Advances in neural information processing systems, pages 521–528, 2003.
  • Paul Wohlhart and Vincent Lepetit. Learning descriptors for object recognition and 3d pose estimation. In Proceedings of the IEEE conference on computer vision and pattern recognition, pages 3109–3118, 2015.
  • Yandong Wen, Kaipeng Zhang, Zhifeng Li, and Yu Qiao. A discriminative feature learning approach for deep face recognition. In European conference on computer vision, pages 499–515.
  • Ziming Zhang and Venkatesh Saligrama. Zero-shot learning via joint latent similarity embedding. In proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, pages 6034–6042, 2016.
  • Maxime Bucher, Stéphane Herbin, and Frédéric Jurie. Improving semantic embedding consistency by metric learning for zero-shot classification. In European Conference on Computer Vision, pages 730–746.
  • Laura Leal-Taixé, Cristian Canton-Ferrer, and Konrad Schindler. Learning by tracking: Siamese cnn for robust target association. In Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition Workshops, pages 33–40, 2016.
  • Ran Tao, Efstratios Gavves, and Arnold WM Smeulders. Siamese instance search for tracking. In Proceedings of the IEEE conference on computer vision and pattern recognition, pages 1420–1429, 2016.
  • Alexander Hermans, Lucas Beyer, and Bastian Leibe. In defense of the triplet loss for person re-identification. arXiv preprint arXiv:1703.07737, 2017.
  • Xinwei He, Yang Zhou, Zhichao Zhou, Song Bai, and Xiang Bai. Triplet-center loss for multi-view 3d object retrieval. In Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, pages 1945–1954, 2018.
  • Alexander Grabner, Peter M Roth, and Vincent Lepetit. 3d pose estimation and 3d model retrieval for objects in the wild. In Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, pages 3022–3031, 2018.
  • Sasi Kiran Yelamarthi, Shiva Krishna Reddy, Ashish Mishra, and Anurag Mittal. A zero-shot framework for sketch based image retrieval. In European Conference on Computer Vision, pages 316–333.
  • Rui Yu, Zhiyong Dou, Song Bai, Zhaoxiang Zhang, Yongchao Xu, and Xiang Bai. Hard-aware point-to-set deep metric for person re-identification. In Proceedings of the European Conference on Computer Vision (ECCV), pages 188–204, 2018.
  • Raia Hadsell, Sumit Chopra, and Yann LeCun. Dimensionality reduction by learning an invariant mapping. In 2006 IEEE Computer Society Conference on Computer Vision and Pattern Recognition (CVPR’06), volume 2, pages 1735–1742. IEEE, 2006.
  • Philip Bachman, R Devon Hjelm, and William Buchwalter. Learning representations by maximizing mutual information across views. In Advances in Neural Information Processing Systems, pages 15509–15519, 2019.
  • Kaiming He, Haoqi Fan, Yuxin Wu, Saining Xie, and Ross Girshick. Momentum contrast for unsupervised visual representation learning. arXiv preprint arXiv:1911.05722, 2019.
  • Ting Chen, Simon Kornblith, Mohammad Norouzi, and Geoffrey Hinton. A simple framework for contrastive learning of visual representations. arXiv preprint arXiv:2002.05709, 2020.
  • Alexis Conneau and Douwe Kiela. Senteval: An evaluation toolkit for universal sentence representations. arXiv preprint arXiv:1803.05449, 2018.
  • Samuel R Bowman, Gabor Angeli, Christopher Potts, and Christopher D Manning. A large annotated corpus for learning natural language inference. arXiv preprint arXiv:1508.05326, 2015.
  • Adina Williams, Nikita Nangia, and Samuel Bowman. A broad-coverage challenge corpus for sentence understanding through inference. In Proceedings of the 2018 Conference of the North American Chapter of the Association for Computational Linguistics: Human Language Technologies, Volume 1 (Long Papers), pages 1112–1122. Association for Computational Linguistics, 2018.
  • Sanjeev Arora, Hrishikesh Khandeparkar, Mikhail Khodak, Orestis Plevrakis, and Nikunj Saunshi. A theoretical analysis of contrastive unsupervised representation learning. arXiv preprint arXiv:1902.09229, 2019.
  • Kihyuk Sohn. Improved deep metric learning with multi-class n-pair loss objective. In Advances in neural information processing systems, pages 1857–1865, 2016.
  • Zhirong Wu, Yuanjun Xiong, Stella X Yu, and Dahua Lin. Unsupervised feature learning via non-parametric instance discrimination. In Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, pages 3733–3742, 2018.
  • Aaron van den Oord, Yazhe Li, and Oriol Vinyals. Representation learning with contrastive predictive coding. arXiv preprint arXiv:1807.03748, 2018.
  • R Devon Hjelm, Alex Fedorov, Samuel Lavoie-Marchildon, Karan Grewal, Phil Bachman, Adam Trischler, and Yoshua Bengio. Learning deep representations by mutual information estimation and maximization. In Proceeding of the International Conference on Learning Representations, 2019.
  • Ashish Vaswani, Noam Shazeer, Niki Parmar, Jakob Uszkoreit, Llion Jones, Aidan N Gomez, Łukasz Kaiser, and Illia Polosukhin. Attention is all you need. In Advances in neural information processing systems, pages 5998–6008, 2017.
  • Victor Sanh, Lysandre Debut, Julien Chaumond, and Thomas Wolf. Distilbert, a distilled version of bert: smaller, faster, cheaper and lighter. arXiv preprint arXiv:1910.01108, 2019.
  • Aaron Gokaslan and Vanya Cohen. Openwebtext corpus. http://Skylion007.github.io/OpenWebTextCorpus, 2019.
  • Adam Paszke, Sam Gross, Soumith Chintala, Gregory Chanan, Edward Yang, Zachary DeVito, Zeming Lin, Alban Desmaison, Luca Antiga, and Adam Lerer. Automatic differentiation in PyTorch. In NIPS Autodiff Workshop, 2017.
  • Matt Gardner, Joel Grus, Mark Neumann, Oyvind Tafjord, Pradeep Dasigi, Nelson F. Liu, Matthew Peters, Michael Schmitz, and Luke Zettlemoyer. AllenNLP: A deep semantic natural language processing platform. In Proceedings of Workshop for NLP Open Source Software (NLPOSS), pages 1–6, Melbourne, Australia, July 2018. Association for Computational Linguistics.
  • Kevin Musgrave, Ser-Nam Lim, and Serge Belongie. Pytorch metric learning. https://github.com/KevinMusgrave/pytorch-metric-learning, 2019.
  • Thomas Wolf, Lysandre Debut, Victor Sanh, Julien Chaumond, Clement Delangue, Anthony Moi, Pierric Cistac, Tim Rault, Rémi Louf, Morgan Funtowicz, and Jamie Brew. Huggingface’s transformers: State-of-the-art natural language processing. ArXiv, abs/1910.03771, 2019.
  • Ilya Loshchilov and Frank Hutter. Decoupled weight decay regularization. arXiv preprint arXiv:1711.05101, 2017.
  • Jeremy Howard and Sebastian Ruder. Universal language model fine-tuning for text classification. In Proceedings of the 56th Annual Meeting of the Association for Computational Linguistics (Volume 1: Long Papers), pages 328–339, Melbourne, Australia, July 2018. Association for Computational Linguistics.
  • [1] Jacob Devlin, Ming-Wei Chang, Kenton Lee, and Kristina Toutanova. BERT: Pre-training of deep bidirectional transformers for language understanding. In Proceedings of the 2019 Conference of the North American Chapter of the Association for Computational Linguistics: Human Language Technologies, Volume 1 (Long and Short Papers), pages 4171–4186, Minneapolis, Minnesota, June 2019. Association for Computational Linguistics.
  • [2] Zhenzhong Lan, Mingda Chen, Sebastian Goodman, Kevin Gimpel, Piyush Sharma, and Radu Soricut. Albert: A lite bert for self-supervised learning of language representations. arXiv preprint arXiv:1909.11942, 2019.
  • [3] Alexis Conneau and Douwe Kiela. Senteval: An evaluation toolkit for universal sentence representations. arXiv preprint arXiv:1803.05449, 2018.
  • [4] Yinhan Liu, Myle Ott, Naman Goyal, Jingfei Du, Mandar Joshi, Danqi Chen, Omer Levy, Mike Lewis, Luke Zettlemoyer, and Veselin Stoyanov. Roberta: A robustly optimized bert pretraining approach. arXiv preprint arXiv:1907.11692, 2019.
  • [5] Nils Reimers and Iryna Gurevych. Sentence-BERT: Sentence embeddings using Siamese BERT-networks. In Proceedings of the 2019 Conference on Empirical Methods in Natural Language Processing. Association for Computational Linguistics, November 2019.
  • [6] Bo Pang and Lillian Lee. Seeing stars: Exploiting class relationships for sentiment categorization with respect to rating scales. In Proceedings of the 43rd annual meeting on association for computational linguistics, pages 115–124. Association for Computational Linguistics, 2005.
  • [7] Richard Socher, Alex Perelygin, Jean Wu, Jason Chuang, Christopher D Manning, Andrew Y Ng, and Christopher Potts. Recursive deep models for semantic compositionality over a sentiment treebank. In Proceedings of the 2013 conference on empirical methods in natural language processing, pages 1631–1642, 2013.
  • [8] Ellen M Voorhees and Dawn M Tice. Building a question answering test collection. In Proceedings of the 23rd annual international ACM SIGIR conference on Research and development in information retrieval, pages 200–207, 2000.
  • [9] Minqing Hu and Bing Liu. Mining and summarizing customer reviews. In Proceedings of the tenth ACM SIGKDD international conference on Knowledge discovery and data mining, pages 168–177, 2004.
  • [10] Bo Pang and Lillian Lee. A sentimental education: Sentiment analysis using subjectivity summarization based on minimum cuts. In Proceedings of the 42nd annual meeting on Association for Computational Linguistics, page 271. Association for Computational Linguistics, 2004.
  • [11] Janyce Wiebe, Theresa Wilson, and Claire Cardie. Annotating expressions of opinions and emotions in language. Language resources and evaluation, 39(2-3):165–210, 2005.
  • [12] Marco Marelli, Stefano Menini, Marco Baroni, Luisa Bentivogli, Raffaella Bernardi, Roberto Zamparelli, et al. A sick cure for the evaluation of compositional distributional semantic models. In LREC, pages 216–223, 2014.
  • [13] Samuel R Bowman, Gabor Angeli, Christopher Potts, and Christopher D Manning. A large annotated corpus for learning natural language inference. arXiv preprint arXiv:1508.05326, 2015.
  • [14] Daniel Cer, Mona Diab, Eneko Agirre, Inigo Lopez-Gazpio, and Lucia Specia. Semeval-2017 task 1: Semantic textual similarity-multilingual and cross-lingual focused evaluation. arXiv preprint arXiv:1708.00055, 2017.
  • [15] Eneko Agirre, Daniel Cer, Mona Diab, and Aitor Gonzalez-Agirre. Semeval-2012 task 6: A pilot on semantic textual similarity. In * SEM 2012: The First Joint Conference on Lexical and Computational Semantics–Volume 1: Proceedings of the main conference and the shared task, and Volume 2: Proceedings of the Sixth International Workshop on Semantic Evaluation (SemEval 2012), pages 385–393, 2012.
  • [16] Eneko Agirre, Daniel Cer, Mona Diab, Aitor Gonzalez-Agirre, and Weiwei Guo. * sem 2013 shared task: Semantic textual similarity. In Second Joint Conference on Lexical and Computational Semantics (* SEM), Volume 1: Proceedings of the Main Conference and the Shared Task: Semantic Textual Similarity, pages 32–43, 2013.
  • [17] Eneko Agirre, Carmen Banea, Claire Cardie, Daniel Cer, Mona Diab, Aitor Gonzalez-Agirre, Weiwei Guo, Rada Mihalcea, German Rigau, and Janyce Wiebe. Semeval-2014 task 10: Multilingual semantic textual similarity. In Proceedings of the 8th international workshop on semantic evaluation (SemEval 2014), pages 81–91, 2014.
  • [18] Eneko Agirre, Carmen Banea, Claire Cardie, Daniel Cer, Mona Diab, Aitor Gonzalez-Agirre, Weiwei Guo, Inigo Lopez-Gazpio, Montse Maritxalar, Rada Mihalcea, et al. Semeval-2015 task 2: Semantic textual similarity, english, spanish and pilot on interpretability. In Proceedings of the 9th international workshop on semantic evaluation (SemEval 2015), pages 252–263, 2015.
  • [19] Eneko Agirre, Carmen Banea, Daniel Cer, Mona Diab, Aitor Gonzalez Agirre, Rada Mihalcea, German Rigau Claramunt, and Janyce Wiebe. Semeval-2016 task 1: Semantic textual similarity, monolingual and cross-lingual evaluation. In SemEval-2016. 10th International Workshop on Semantic Evaluation; 2016 Jun 16-17; San Diego, CA. Stroudsburg (PA): ACL; 2016. p. 497-511. ACL (Association for Computational Linguistics), 2016.
  • [20] Bill Dolan, Chris Quirk, and Chris Brockett. Unsupervised construction of large paraphrase corpora: Exploiting massively parallel news sources. In Proceedings of the 20th international conference on Computational Linguistics, page 350. Association for Computational Linguistics, 2004.
  • [21] Tsung-Yi Lin, Michael Maire, Serge Belongie, James Hays, Pietro Perona, Deva Ramanan, Piotr Dollár, and C Lawrence Zitnick. Microsoft coco: Common objects in context. In European conference on computer vision, pages 740–755.
  • [22] Kaiming He, Xiangyu Zhang, Shaoqing Ren, and Jian Sun. Deep residual learning for image recognition. In Proceedings of the IEEE conference on computer vision and pattern recognition, pages 770–778, 2016.
  • [23] Alexis Conneau, Germán Kruszewski, Guillaume Lample, Loïc Barrault, and Marco Baroni. What you can cram into a single vector: Probing sentence embeddings for linguistic properties. arXiv preprint arXiv:1805.01070, 2018.
  • [24] Ting Chen, Simon Kornblith, Mohammad Norouzi, and Geoffrey Hinton. A simple framework for contrastive learning of visual representations. arXiv preprint arXiv:2002.05709, 2020.
  • [25] J. Deng, W. Dong, R. Socher, L.-J. Li, K. Li, and L. Fei-Fei. ImageNet: A Large-Scale Hierarchical Image Database. In CVPR09, 2009.
  • [26] Kaiming He, Haoqi Fan, Yuxin Wu, Saining Xie, and Ross Girshick. Momentum contrast for unsupervised visual representation learning. arXiv preprint arXiv:1911.05722, 2019.
  • [27] Xun Wang, Haozhi Zhang, Weilin Huang, and Matthew R. Scott. Cross-batch memory for embedding learning. arXiv preprint arXiv:1912.06798, 2020.
  • [28] Zhirong Wu, Yuanjun Xiong, Stella X Yu, and Dahua Lin. Unsupervised feature learning via non-parametric instance discrimination. In Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, pages 3733–3742, 2018.
  • [29] Sanjeev Arora, Hrishikesh Khandeparkar, Mikhail Khodak, Orestis Plevrakis, and Nikunj Saunshi. A theoretical analysis of contrastive unsupervised representation learning. arXiv preprint arXiv:1902.09229, 2019.