Rationalizing Text Matching: Learning Sparse Alignments via Optimal Transport

Kyle Swanson, Lili Yu, Tao Lei

ACL, pp. 5609-5626, 2020.

Keywords:
mean average precision, neural machine translation, area under the curve, text matching, natural language inference

Abstract:

Selecting input features of top relevance has become a popular method for building self-explaining models. In this work, we extend this selective rationalization approach to text matching, where the goal is to jointly select and align text pieces, such as tokens or sentences, as a justification for the downstream prediction. Our approach ...
Introduction
Highlights
  • The growing complexity of deep neural networks has given rise to the desire for self-explaining models (Li et al., 2016; Ribeiro et al., 2016; Zhang et al., 2016; Ross et al., 2017; Sundararajan et al., 2017; Alvarez-Melis and Jaakkola, 2018b; Chen et al., 2018a)
  • We extend selective rationalization for text matching and focus on two new challenges that are not addressed in previous rationalization work
  • We report standard ranking and retrieval metrics including area under the curve (AUC), mean average precision (MAP), mean reciprocal rank (MRR), and precision at 1 (P@1)
  • We propose jointly learning interpretable alignments as part of the downstream prediction to reveal how neural network models operate for text matching applications
  • Our method extends vanilla optimal transport by adding various constraints that produce alignments with highly controllable sparsity patterns, making them interpretable (a rough illustration of such sparse alignments follows this list)
  • Our models show superiority by selecting very few alignments while achieving text matching performance on par with alternative methods
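As a rough illustration of the sparsity these constraints target (not the authors' differentiable formulation), the sketch below pads a cost matrix with zero-cost dummy rows so that an off-the-shelf assignment solver aligns each piece of the shorter text to exactly one piece of the longer text and leaves every other pair inactive. The cost values and the choice of scipy's linear_sum_assignment solver are illustrative assumptions.

```python
# Illustrative sketch only: a hard one-to-one alignment obtained by padding the
# cost matrix with dummy points, mimicking the sparsity pattern that constrained
# OT formulations target (this is not the paper's differentiable method).
import numpy as np
from scipy.optimize import linear_sum_assignment

def sparse_one_to_one_alignment(cost):
    """cost: (n, m) matrix of alignment costs between pieces of two texts, n <= m.

    Pads the first text with (m - n) zero-cost dummy rows so every column can be
    matched, then keeps only matches that involve a real (non-dummy) row.
    """
    n, m = cost.shape
    assert n <= m, "assume the first text has no more pieces than the second"
    padded = np.vstack([cost, np.zeros((m - n, m))])   # dummy rows absorb extra columns
    rows, cols = linear_sum_assignment(padded)         # exact min-cost assignment
    return [(i, j) for i, j in zip(rows, cols) if i < n]  # drop dummy matches

if __name__ == "__main__":
    rng = np.random.default_rng(0)
    cost = rng.random((3, 5))   # e.g., 3 query sentences vs. 5 candidate sentences
    print(sparse_one_to_one_alignment(cost))  # exactly 3 active pairs out of 15
```

The point of the sketch is only the sparsity pattern: of the 15 possible pairs, exactly 3 remain active, which is the kind of small "# Align." budget reported in the results below.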
Methods
  • The e-SNLI and MultiRC tasks come from the ERASER benchmark (DeYoung et al., 2019), which was created to evaluate selective rationalization models.
  • The authors chose those two datasets as they are best suited for the text matching setup.
  • StackExchange is an online question answering platform and has been used as a benchmark in previous work.
  • The authors took the June 2019 data dumps of the AskUbuntu and SuperUser subdomains of the platform and combined them to form the dataset
Results
  • Before experimenting with the datasets, the authors first analyze the alignments obtained by different methods on a synthetic cost matrix in Figure 5.
  • Table 3 presents the results of all models on the StackExchange and MultiNews datasets.
  • The authors' model is able to use only 6 aligned pairs to achieve a P@1 of 96.6 on the MultiNews dataset.
  • The sparse attention model obtains a P@1 of 97.1 but uses more than 300 alignment pairs and is difficult to interpret (a short sketch of how these ranking metrics are computed follows this list).
  • Model complexity and speed on the StackExchange dataset are reported in Table 7 in Appendix C
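For reference, the ranking metrics reported here are computed per query from the model's ranked candidate list and the set of gold-relevant candidates, then averaged over queries. Below is a minimal sketch with made-up data; the document identifiers are hypothetical.

```python
# Minimal sketch of the ranking metrics reported in the paper (P@1, MRR, MAP),
# computed from one query's ranked candidate list; the example data is invented.
from typing import Sequence, Set

def precision_at_1(ranked: Sequence[str], relevant: Set[str]) -> float:
    return 1.0 if ranked and ranked[0] in relevant else 0.0

def reciprocal_rank(ranked: Sequence[str], relevant: Set[str]) -> float:
    for rank, doc in enumerate(ranked, start=1):
        if doc in relevant:
            return 1.0 / rank
    return 0.0

def average_precision(ranked: Sequence[str], relevant: Set[str]) -> float:
    hits, precision_sum = 0, 0.0
    for rank, doc in enumerate(ranked, start=1):
        if doc in relevant:
            hits += 1
            precision_sum += hits / rank
    return precision_sum / max(len(relevant), 1)

if __name__ == "__main__":
    ranked = ["d3", "d1", "d7", "d2"]   # model's ranking for one query
    relevant = {"d1", "d2"}             # gold duplicates for that query
    print(precision_at_1(ranked, relevant))    # 0.0
    print(reciprocal_rank(ranked, relevant))   # 0.5
    print(average_precision(ranked, relevant)) # (1/2 + 2/4) / 2 = 0.5
```

MAP and MRR are then the means of average_precision and reciprocal_rank over all queries; AUC is computed from the score distributions of positive and negative candidates and is omitted from this sketch.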
Conclusion
  • Balancing performance and interpretability in deep learning models has become an increasingly important aspect of model design.
  • The authors propose jointly learning interpretable alignments as part of the downstream prediction to reveal how neural network models operate for text matching applications.
  • The authors' models show superiority by selecting very few alignments while achieving text matching performance on par with alternative methods.
  • The authors' method is very general in nature and can be used as a differentiable hard-alignment module in larger NLP models that compare two pieces of text, such as sequence-to-sequence models.
  • The authors' method is agnostic to the underlying nature of the two objects being aligned and can align disparate objects such as images and captions, enabling a wide range of future applications within NLP and beyond
Summary
  • Introduction:

    The growing complexity of deep neural networks has given rise to the desire for self-explaining models (Li et al., 2016; Ribeiro et al., 2016; Zhang et al., 2016; Ross et al., 2017; Sundararajan et al., 2017; Alvarez-Melis and Jaakkola, 2018b; Chen et al., 2018a).
  • For instance, one popular method is to design models that can perform classification using only a rationale, which is a subset of the text selected from the model input that fully explains the model's prediction (Lei et al., 2016; Bastings et al., 2019; Chang et al., 2019).
  • For example, the questions "How to find duplicate files?" and "Is there any way to find duplicate files?" are duplicates, and an alignment between their overlapping phrases serves as a rationale for predicting the match.
  • This mapping is represented by a transport plan, or alignment matrix, P ∈ R_+^{n×m}, where P_{i,j} indicates the amount of probability mass moved from x_i to y_j (a minimal Sinkhorn sketch of computing such a plan follows).
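As a concrete illustration, the sketch below computes such a transport plan with standard entropy-regularized Sinkhorn iterations (Cuturi, 2013), assuming uniform masses 1/n and 1/m on the two texts. The random cost matrix stands in for the cost the model derives from encoded text; note that the unconstrained plan it produces is typically dense, which is what motivates the paper's sparsity constraints.

```python
# Minimal Sinkhorn sketch (Cuturi, 2013): computes a transport plan P in R_+^{n x m}
# from a cost matrix C with uniform marginals. The resulting unconstrained plan is
# usually dense, motivating the constrained variants proposed in the paper.
import numpy as np

def sinkhorn_plan(C: np.ndarray, reg: float = 0.1, n_iters: int = 200) -> np.ndarray:
    n, m = C.shape
    a = np.full(n, 1.0 / n)          # mass on pieces of the first text
    b = np.full(m, 1.0 / m)          # mass on pieces of the second text
    K = np.exp(-C / reg)             # Gibbs kernel
    u, v = np.ones(n), np.ones(m)
    for _ in range(n_iters):         # alternating marginal projections
        u = a / (K @ v)
        v = b / (K.T @ u)
    return u[:, None] * K * v[None, :]   # P_ij = u_i * K_ij * v_j

if __name__ == "__main__":
    rng = np.random.default_rng(0)
    C = rng.random((3, 5))               # stand-in for a learned cost matrix
    P = sinkhorn_plan(C)
    print(P.round(3))                    # dense plan: most entries are nonzero
    print(P.sum(axis=1), P.sum(axis=0))  # rows sum to 1/3, columns to 1/5
```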
Tables
  • Table1: Summary of constrained alignment construction and sparsity. # R is the number of replicas, # D is the number of dummy points, R one-to-k is the relaxed one-to-k assignment, and n = |X| ≤ |Y | = m
  • Table2: Statistics for the document ranking datasets
  • Table3: Performance of all models on the StackExchange and MultiNews datasets. We report ranking results and the average number of active alignments (# Align.) used. For our method with the exact k alignment constraint, we set k = 2 for StackExchange and k = 6 for MultiNews, respectively
  • Table4: e-SNLI accuracy, macro-averaged task F1, percentage of tokens in active alignments, and token-level F1 of the model-selected rationales compared to human-annotated rationales for the premise, hypothesis, and both (P&H F1). Accuracy numbers in parentheses use all attention weights, not just active ones. (+S) denotes supervised learning of rationales. † denotes results from DeYoung et al. (2019)
  • Table5: MultiRC macro-averaged task F1, percentage of tokens used in active alignments, and token-level F1 of the model-selected rationales compared to human-annotated rationales (R. F1). (+S) denotes supervised learning of rationales. † denotes results from DeYoung et al. (2019)
  • Table6: MultiRC macro-averaged task F1, percentage of tokens used in active alignments, and token-level F1 of the model-selected rationales compared to human-annotated rationales (R. F1). (+S) denotes supervised learning of rationales. All models use a simplified recurrent unit (Lei et al., 2018) encoder
  • Table7: Number of parameters, training time, and inference time for models on the StackExchange dataset. Training time represents training time per epoch while inference time represents the average time to encode and align one pair of documents. All models use an NVIDIA Tesla V100 GPU
Related work
  • Selective Rationalization. Model interpretability via selective rationalization has attracted considerable interest recently (Lei et al., 2016; Li et al., 2016; Chen et al., 2018a; Chang et al., 2019). Some recent work has focused on overcoming the challenge of learning in the selective rationalization regime, such as by enabling end-to-end differentiable training (Bastings et al., 2019) or by regularizing to avoid performance degeneration (Yu et al., 2019). Unlike these methods, which perform independent rationale selection on each input document, we extend selective rationalization by jointly learning selection and alignment, as it is better suited for text matching applications.

    Concurrent to this work, DeYoung et al. (2019) introduce the ERASER benchmark datasets with human-annotated rationales along with several rationalization models. Similarly to DeYoung et al. (2019), we measure the faithfulness of selected rationales, but our work differs in that we additionally emphasize sparsity as a necessary criterion for interpretable alignments.
References
  • David Alvarez-Melis and Tommi Jaakkola. 2018a. Gromov-Wasserstein alignment of word embedding spaces. In Proceedings of the 2018 Conference on Empirical Methods in Natural Language Processing, pages 1881–1890, Brussels, Belgium. Association for Computational Linguistics.
  • David Alvarez-Melis and Tommi Jaakkola. 2018b. Towards robust interpretability with self-explaining neural networks. In Advances in Neural Information Processing Systems 31, pages 7775–7784. Curran Associates, Inc.
  • Dzmitry Bahdanau, Kyunghyun Cho, and Yoshua Bengio. 2015. Neural machine translation by jointly learning to align and translate. International Conference on Learning Representations.
  • Joost Bastings, Wilker Aziz, and Ivan Titov. 2019. Interpretable neural predictions with differentiable binary variables. In Proceedings of the 57th Annual Meeting of the Association for Computational Linguistics, pages 2963–2977, Florence, Italy. Association for Computational Linguistics.
  • Steven Bird, Edward Loper, and Ewan Klein. 2009. Natural Language Processing with Python. O'Reilly Media Inc.
  • Garrett Birkhoff. 1946. Tres observaciones sobre el algebra lineal. Universidad Nacional de Tucumán Revista Series A, 5:147–151.
  • Piotr Bojanowski, Edouard Grave, Armand Joulin, and Tomas Mikolov. 2017. Enriching word vectors with subword information. Transactions of the Association for Computational Linguistics, 5:135–146.
  • Samuel R. Bowman, Gabor Angeli, Christopher Potts, and Christopher D. Manning. 2015. A large annotated corpus for learning natural language inference. In Proceedings of the 2015 Conference on Empirical Methods in Natural Language Processing, pages 632–642, Lisbon, Portugal. Association for Computational Linguistics.
  • Yann Brenier. 1987. Décomposition polaire et réarrangement monotone des champs de vecteurs. C. R. Acad. Sci. Paris Sér. I Math., 305:805–808.
  • Richard A. Brualdi. 1982. Notes of the Birkhoff algorithm for doubly stochastic matrices. Canadian Mathematical Bulletin, 25:191–199.
  • Richard A. Brualdi. 2006. Combinatorial Matrix Classes, volume 108. Cambridge University Press.
  • Luis A. Caffarelli and Robert J. McCann. 2010. Free boundaries in optimal transport and Monge-Ampère obstacle problems. Annals of Mathematics, 171:673–730.
  • Oana-Maria Camburu, Tim Rocktäschel, Thomas Lukasiewicz, and Phil Blunsom. 2018. e-SNLI: Natural language inference with natural language explanations. In Advances in Neural Information Processing Systems 31, pages 9539–9549. Curran Associates, Inc.
  • Shiyu Chang, Yang Zhang, Mo Yu, and Tommi Jaakkola. 2019. A game theoretic approach to class-wise selective rationalization. In Advances in Neural Information Processing Systems 32, pages 10055–10065. Curran Associates, Inc.
  • Jianbo Chen, Le Song, Martin Wainwright, and Michael Jordan. 2018a. Learning to explain: An information-theoretic perspective on model interpretation. In Proceedings of the 35th International Conference on Machine Learning, volume 80 of Proceedings of Machine Learning Research, pages 883–892. PMLR.
  • Kan Chen, Jiang Wang, Liang-Chieh Chen, Haoyuan Gao, Wei Xu, and Ram Nevatia. 2015. ABC-CNN: An attention based convolutional neural network for visual question answering. arXiv preprint arXiv:1511.05960.
  • Liqun Chen, Shuyang Dai, Chenyang Tao, Haichao Zhang, Zhe Gan, Dinghan Shen, Yizhe Zhang, Guoyin Wang, Ruiyi Zhang, and Lawrence Carin. 2018b. Adversarial text generation via feature-mover's distance. In Advances in Neural Information Processing Systems 31, pages 4666–4677. Curran Associates, Inc.
  • Jianpeng Cheng, Li Dong, and Mirella Lapata. 2016. Long short-term memory-networks for machine reading. In Proceedings of the 2016 Conference on Empirical Methods in Natural Language Processing, pages 551–561, Austin, Texas. Association for Computational Linguistics.
  • Marco Cuturi. 2013. Sinkhorn distances: Lightspeed computation of optimal transport. In Advances in Neural Information Processing Systems 26, pages 2292–2300. Curran Associates, Inc.
  • Jay DeYoung, Sarthak Jain, Nazneen Fatema Rajani, Eric Lehman, Caiming Xiong, Richard Socher, and Byron C. Wallace. 2019. ERASER: A benchmark to evaluate rationalized NLP models. arXiv preprint arXiv:1911.03429.
  • Alexander Fabbri, Irene Li, Tianwei She, Suyi Li, and Dragomir Radev. 2019. Multi-News: A large-scale multi-document summarization dataset and abstractive hierarchical model. In Proceedings of the 57th Annual Meeting of the Association for Computational Linguistics, pages 1074–1084, Florence, Italy. Association for Computational Linguistics.
  • Alessio Figalli. 2010. The optimal partial transport problem. Archive for Rational Mechanics and Analysis, 195:533–560.
  • Sarthak Jain and Byron C. Wallace. 2019. Attention is not explanation. arXiv preprint arXiv:1902.10186.
  • Leonid Kantorovich. 1942. On the transfer of masses (in Russian). Doklady Akademii Nauk, 37:227–229.
  • Daniel Khashabi, Snigdha Chaturvedi, Michael Roth, Shyam Upadhyay, and Dan Roth. 2018. Looking beyond the surface: A challenge set for reading comprehension over multiple sentences. In Proceedings of the 2018 Conference of the North American Chapter of the Association for Computational Linguistics: Human Language Technologies, Volume 1 (Long Papers), pages 252–262, New Orleans, Louisiana. Association for Computational Linguistics.
  • Yoon Kim, Carl Denton, Luong Hoang, and Alexander M. Rush. 2018. Structured attention networks. International Conference on Learning Representations.
  • Diederik P. Kingma and Jimmy Ba. 2014. Adam: A method for stochastic optimization. International Conference on Learning Representations.
  • Matt Kusner, Yu Sun, Nicholas Kolkin, and Kilian Weinberger. 2015. From word embeddings to document distances. In Proceedings of the 32nd International Conference on Machine Learning, volume 37 of Proceedings of Machine Learning Research, pages 957–966, Lille, France. PMLR.
  • Anirban Laha, Saneem Ahmed Chemmengath, Priyanka Agrawal, Mitesh Khapra, Karthik Sankaranarayanan, and Harish G. Ramaswamy. 2018. On controllable sparse alternatives to softmax. In Advances in Neural Information Processing Systems 31, pages 6422–6432. Curran Associates, Inc.
  • Kenton Lee, Ming-Wei Chang, and Kristina Toutanova. 2019. Latent retrieval for weakly supervised open domain question answering. In Proceedings of the 57th Annual Meeting of the Association for Computational Linguistics, pages 6086–6096, Florence, Italy. Association for Computational Linguistics.
  • Eric Lehman, Jay DeYoung, Regina Barzilay, and Byron C. Wallace. 2019. Inferring which medical treatments work from reports of clinical trials. In Proceedings of the 2019 Conference of the North American Chapter of the Association for Computational Linguistics: Human Language Technologies, Volume 1 (Long and Short Papers), pages 3705–3717, Minneapolis, Minnesota. Association for Computational Linguistics.
  • Tao Lei, Regina Barzilay, and Tommi Jaakkola. 2016. Rationalizing neural predictions. In Proceedings of the 2016 Conference on Empirical Methods in Natural Language Processing, pages 107–117, Austin, Texas. Association for Computational Linguistics.
  • Tao Lei, Yu Zhang, Sida I. Wang, Hui Dai, and Yoav Artzi. 2018. Simple recurrent units for highly parallelizable recurrence. In Proceedings of the 2018 Conference on Empirical Methods in Natural Language Processing, pages 4470–4481, Brussels, Belgium. Association for Computational Linguistics.
  • Jiwei Li, Will Monroe, and Dan Jurafsky. 2016. Understanding neural networks through representation erasure. arXiv preprint arXiv:1612.08220.
  • Qiuchi Li, Benyou Wang, and Massimo Melucci. 2019. CNM: An interpretable complex-valued network for matching. In Proceedings of the 2019 Conference of the North American Chapter of the Association for Computational Linguistics: Human Language Technologies, Volume 1 (Long and Short Papers), pages 4139–4148, Minneapolis, Minnesota. Association for Computational Linguistics.
  • Junyang Lin, Xu Sun, Xuancheng Ren, Muyu Li, and Qi Su. 2018. Learning when to concentrate or divert attention: Self-adaptive attention temperature for neural machine translation. In Proceedings of the 2018 Conference on Empirical Methods in Natural Language Processing, pages 2985–2990, Brussels, Belgium. Association for Computational Linguistics.
  • Yinhan Liu, Myle Ott, Naman Goyal, Jingfei Du, Mandar Joshi, Danqi Chen, Omer Levy, Mike Lewis, Luke Zettlemoyer, and Veselin Stoyanov. 2019. RoBERTa: A robustly optimized BERT pretraining approach. arXiv preprint arXiv:1907.11692.
  • Chaitanya Malaviya, Pedro Ferreira, and André F. T. Martins. 2018. Sparse and constrained attention for neural machine translation. In Proceedings of the 56th Annual Meeting of the Association for Computational Linguistics (Volume 2: Short Papers), pages 370–376, Melbourne, Australia. Association for Computational Linguistics.
  • Andre Martins and Ramon Astudillo. 2016. From softmax to sparsemax: A sparse model of attention and multi-label classification. In Proceedings of the 33rd International Conference on Machine Learning, volume 48 of Proceedings of Machine Learning Research, pages 1614–1623, New York, New York, USA. PMLR.
  • Gonzalo Mena, David Belanger, Scott Linderman, and Jasper Snoek. 2018. Learning latent permutations with Gumbel-Sinkhorn networks. International Conference on Learning Representations.
  • Gaspard Monge. 1781. Mémoire sur la théorie des déblais et des remblais. Histoire de l'Académie Royale des Sciences, pages 666–704.
  • Vlad Niculae and Mathieu Blondel. 2017. A regularized framework for sparse and structured neural attention. In Advances in Neural Information Processing Systems 30, pages 3338–3348. Curran Associates, Inc.
  • Vlad Niculae, André F. T. Martins, Mathieu Blondel, and Claire Cardie. 2018. SparseMAP: Differentiable sparse structured inference. In Proceedings of the 35th International Conference on Machine Learning, volume 80, pages 3799–3808. PMLR.
  • Ankur Parikh, Oscar Täckström, Dipanjan Das, and Jakob Uszkoreit. 2016. A decomposable attention model for natural language inference. In Proceedings of the 2016 Conference on Empirical Methods in Natural Language Processing, pages 2249–2255, Austin, Texas. Association for Computational Linguistics.
  • Adam Paszke, Sam Gross, Soumith Chintala, Gregory Chanan, Edward Yang, Zachary DeVito, Zeming Lin, Alban Desmaison, Luca Antiga, and Adam Lerer. 2017. Automatic differentiation in PyTorch. NIPS 2017 Autodiff Workshop.
  • Hugh Perkins and Yi Yang. 2019. Dialog intent induction with deep multi-view clustering. In Proceedings of the 2019 Conference on Empirical Methods in Natural Language Processing and the 9th International Joint Conference on Natural Language Processing (EMNLP-IJCNLP), pages 4014–4023. Association for Computational Linguistics.
  • Gabriel Peyré and Marco Cuturi. 2019. Computational optimal transport. Foundations and Trends in Machine Learning, 11:335–607.
  • Marco Tulio Ribeiro, Sameer Singh, and Carlos Guestrin. 2016. "Why should I trust you?": Explaining the predictions of any classifier. In Proceedings of the 22nd ACM SIGKDD International Conference on Knowledge Discovery and Data Mining, KDD '16, pages 1135–1144, New York, NY, USA. Association for Computing Machinery.
  • Tim Rocktäschel, Edward Grefenstette, Karl Moritz Hermann, Tomáš Kocisky, and Phil Blunsom. 2015. Reasoning about entailment with neural attention. arXiv preprint arXiv:1509.06664.
  • Andrew Slavin Ross, Michael C. Hughes, and Finale Doshi-Velez. 2017. Right for the right reasons: Training differentiable models by constraining their explanations. In Proceedings of the Twenty-Sixth International Joint Conference on Artificial Intelligence, IJCAI-17, pages 2662–2670.
  • Alexander M. Rush, Sumit Chopra, and Jason Weston. 2015. A neural attention model for abstractive sentence summarization. In Proceedings of the 2015 Conference on Empirical Methods in Natural Language Processing, pages 379–389, Lisbon, Portugal. Association for Computational Linguistics.
  • Cícero dos Santos, Luciano Barbosa, Dasha Bogdanova, and Bianca Zadrozny. 2015. Learning hybrid representations to retrieve semantically equivalent questions. In Proceedings of the 53rd Annual Meeting of the Association for Computational Linguistics and the 7th International Joint Conference on Natural Language Processing (Volume 2: Short Papers), pages 694–699, Beijing, China. Association for Computational Linguistics.
  • Bernhard Schmitzer. 2016. Stabilized sparse scaling algorithms for entropy regularized transport problems. SIAM Journal on Scientific Computing, 41:A1443–A1481.
  • Darsh Shah, Tao Lei, Alessandro Moschitti, Salvatore Romeo, and Preslav Nakov. 2018. Adversarial domain adaptation for duplicate question detection. In Proceedings of the 2018 Conference on Empirical Methods in Natural Language Processing, pages 1056–1063, Brussels, Belgium. Association for Computational Linguistics.
  • Richard Sinkhorn and Paul Knopp. 1967. Concerning nonnegative matrices and doubly stochastic matrices. Pacific Journal of Mathematics, 21:343–348.
  • Mukund Sundararajan, Ankur Taly, and Qiqi Yan. 2017. Axiomatic attribution for deep networks. In Proceedings of the 34th International Conference on Machine Learning, ICML'17, pages 3319–3328. JMLR.org.
  • James Thorne, Andreas Vlachos, Christos Christodoulopoulos, and Arpit Mittal. 2018. FEVER: A large-scale dataset for fact extraction and VERification. In Proceedings of the 2018 Conference of the North American Chapter of the Association for Computational Linguistics: Human Language Technologies, Volume 1 (Long Papers), pages 809–819, New Orleans, Louisiana. Association for Computational Linguistics.
  • James Thorne, Andreas Vlachos, Christos Christodoulopoulos, and Arpit Mittal. 2019. Generating token-level explanations for natural language inference. In Proceedings of the 2019 Conference of the North American Chapter of the Association for Computational Linguistics: Human Language Technologies, Volume 1 (Long and Short Papers), pages 963–969, Minneapolis, Minnesota. Association for Computational Linguistics.
  • Sarah Wiegreffe and Yuval Pinter. 2019. Attention is not not explanation. In Proceedings of the 2019 Conference on Empirical Methods in Natural Language Processing and the 9th International Joint Conference on Natural Language Processing (EMNLP-IJCNLP), pages 11–20, Hong Kong, China. Association for Computational Linguistics.
  • Qizhe Xie, Xuezhe Ma, Zihang Dai, and Eduard Hovy. 2017. An interpretable knowledge transfer model for knowledge base completion. In Proceedings of the 55th Annual Meeting of the Association for Computational Linguistics (Volume 1: Long Papers), pages 950–962, Vancouver, Canada. Association for Computational Linguistics.
  • Hongteng Xu, Dixin Luo, Hongyuan Zha, and Lawrence Carin. 2019. Gromov-Wasserstein learning for graph matching and node embedding. In Proceedings of the 36th International Conference on Machine Learning, volume 97 of Proceedings of Machine Learning Research, pages 6932–6941, Long Beach, California, USA. PMLR.
  • Kelvin Xu, Jimmy Lei Ba, Ryan Kiros, Kyunghyun Cho, Aaron Courville, Ruslan Salakhutdinov, Richard S. Zemel, and Yoshua Bengio. 2015. Show, attend and tell: Neural image caption generation with visual attention. In Proceedings of the 32nd International Conference on Machine Learning, ICML'15, pages 2048–2057. JMLR.org.
  • Mo Yu, Shiyu Chang, Yang Zhang, and Tommi Jaakkola. 2019. Rethinking cooperative rationalization: Introspective extraction and complement control. In Proceedings of the 2019 Conference on Empirical Methods in Natural Language Processing and the 9th International Joint Conference on Natural Language Processing (EMNLP-IJCNLP), pages 4094–4103, Hong Kong, China. Association for Computational Linguistics.
  • Ye Zhang, Iain Marshall, and Byron C. Wallace. 2016. Rationale-augmented convolutional neural networks for text classification. In Proceedings of the 2016 Conference on Empirical Methods in Natural Language Processing, pages 795–804, Austin, Texas. Association for Computational Linguistics.