PEGASUS: Pre-training with Extracted Gap-sentences for Abstractive Summarization

Yao Zhao
Mohammad Saleh

ICML, pp. 11328-11339, 2020.


Abstract:

Recent work pre-training Transformers with self-supervised objectives on large text corpora has shown great success when fine-tuned on downstream NLP tasks including text summarization. However, pre-training objectives tailored for abstractive text summarization have not been explored. Furthermore, there is a lack of systematic evaluation across diverse domains.
Introduction
Highlights
  • Text summarization aims at generating accurate and concise summaries from input document(s)
  • We study pre-training objectives for abstractive text summarization and evaluate on 12 downstream datasets spanning news (Hermann et al, 2015; Narayan et al, 2018; Grusky et al, 2018; Rush et al, 2015; Fabbri et al, 2019), science (Cohan et al, 2018), short stories (Kim et al, 2019), instructions (Koupaee & Wang, 2018), emails (Zhang & Tetreault, 2019), patents (Sharma et al, 2019), and legislative bills (Kornilova & Eidelman, 2019)
  • We show how good abstractive summarization performance can be achieved across broad domains with very little supervision by fine-tuning the PEGASUS model and surpassing previous state-of-the-art results on many tasks with as little as 1000 examples
  • We increased the gap-sentences ratio to 45% to achieve a similar number of “gaps” as the optimal 30% found above
  • We proposed PEGASUS, a sequence-to-sequence model with gap-sentences generation as a pre-training objective tailored for abstractive text summarization (a selection sketch follows this list)
  • We demonstrated the effects of the pre-training corpora, gap-sentences ratios, vocabulary sizes and scaled up the best configuration to achieve state-of-the-art results on all 12 diverse downstream datasets considered
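A minimal sketch of gap-sentence selection as described in the highlights: each sentence is scored independently against the rest of the document with a ROUGE-1-like overlap ("Ind-Orig" in the paper's notation), the top-scoring fraction (the gap-sentences ratio, GSR) is replaced by a mask token to form the pre-training input, and the removed sentences are concatenated as the target. The overlap proxy, mask-token name, and helper functions below are illustrative assumptions, not the paper's implementation.

```python
from collections import Counter

MASK_TOKEN = "<mask_1>"  # sentence-level mask token; the name is an illustrative choice


def rouge1_like_f1(candidate: str, reference: str) -> float:
    """Unigram-overlap F1, a rough stand-in for ROUGE-1 F1."""
    c = Counter(candidate.lower().split())
    r = Counter(reference.lower().split())
    overlap = sum((c & r).values())
    if overlap == 0:
        return 0.0
    precision = overlap / sum(c.values())
    recall = overlap / sum(r.values())
    return 2 * precision * recall / (precision + recall)


def select_gap_sentences(sentences, gsr=0.3):
    """Score each sentence independently against the rest of the document
    and mask the top round(gsr * n) sentences (gsr = gap-sentences ratio)."""
    scored = []
    for i, sent in enumerate(sentences):
        rest = " ".join(s for j, s in enumerate(sentences) if j != i)
        scored.append((rouge1_like_f1(sent, rest), i))
    k = max(1, round(gsr * len(sentences)))
    chosen = {i for _, i in sorted(scored, reverse=True)[:k]}
    source = " ".join(MASK_TOKEN if i in chosen else s
                      for i, s in enumerate(sentences))
    target = " ".join(sentences[i] for i in sorted(chosen))
    return source, target


doc = [
    "PEGASUS pre-trains by generating removed sentences.",
    "The removed sentences act as a pseudo-summary of the document.",
    "Everything else is kept as the model input.",
]
src, tgt = select_gap_sentences(doc, gsr=0.3)
```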
Methods
  • Following a strategy similar to Raffel et al (2019), to save time and computation the authors conducted pre-training ablation experiments using a reduced-size model with 223M parameters, PEGASUSBASE, a smaller batch size, and only 4 of the 12 datasets before scaling up pre-training with the best settings to the final 568M-parameter PEGASUSLARGE.
  • The authors pre-trained PEGASUSBASE with a batch size of 256 and PEGASUSLARGE with a batch size of 8192.
  • The authors used sinusoidal positional encoding following Vaswani et al (2017).
  • Both pre-training and fine-tuning used Adafactor (Shazeer & Stern, 2018) with square-root learning rate decay and a dropout rate of 0.1 (the decay schedule is sketched after this list).
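A small sketch of the square-root learning-rate decay mentioned above, of the kind commonly paired with Adafactor: the rate is held constant for a warm-up period and then decays proportionally to the inverse square root of the step. The base rate and warm-up length below are illustrative placeholders, not values reported by the authors.

```python
import math


def rsqrt_decay(step, warmup_steps=10_000, base_lr=0.01):
    """Hold base_lr during warm-up, then decay it as 1/sqrt(step)."""
    return base_lr / math.sqrt(max(step, warmup_steps) / warmup_steps)


# The rate is flat until `warmup_steps`, then shrinks like step ** -0.5:
# step 10k -> 0.0100, step 40k -> 0.0050, step 160k -> 0.0025
lrs = [rsqrt_decay(s) for s in (1, 10_000, 40_000, 160_000)]
```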
Results
  • Compared with PEGASUSBASE, the large model PEGASUSLARGE had increased capacity from a larger hidden size (H: 768 → 1024, F: 3072 → 4096, A: 12 → 16) and more layers (L: 12 → 16), and traversed much more data due to the larger batch size (B: 256 → 8192).
  • The authors adopted the best practices found in the PEGASUSBASE ablation studies, using the GSG (Ind-Orig) pre-training objective without MLM and a Unigram vocabulary size of 96k.
  • PEGASUSLARGE had 568M parameters.
  • The authors conducted a simple hyper-parameter sweep over learning rate and length penalty when fine-tuning, reporting ROUGE-1/ROUGE-2/ROUGE-L (R1/R2/RL) scores; a sketch of such a sweep follows this list.
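A sketch of the kind of simple sweep described above: try each (learning rate, length penalty) pair, score the fine-tuned model on a validation set, and keep the best pair. The `finetune_and_score` callable and the grid values are hypothetical stand-ins for the actual training and ROUGE-evaluation pipeline, not the authors' settings.

```python
from itertools import product


def sweep(finetune_and_score,
          learning_rates=(1e-4, 5e-4, 1e-3),
          length_penalties=(0.6, 0.8, 1.0)):
    """Return the (learning_rate, length_penalty) pair with the best mean
    of validation R1/R2/RL, as returned by `finetune_and_score`."""
    best_pair, best_mean = None, float("-inf")
    for lr, alpha in product(learning_rates, length_penalties):
        scores = finetune_and_score(lr, alpha)  # e.g. {"R1": 43.1, "R2": 20.4, "RL": 40.2}
        mean_rouge = (scores["R1"] + scores["R2"] + scores["RL"]) / 3
        if mean_rouge > best_mean:
            best_pair, best_mean = (lr, alpha), mean_rouge
    return best_pair, best_mean
```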
Conclusion
  • The authors proposed PEGASUS, a sequence-to-sequence model with gap-sentences generation as a pre-training objective tailored for abstractive text summarization.
  • The authors demonstrated the effects of the pre-training corpora, gap-sentences ratios, vocabulary sizes and scaled up the best configuration to achieve state-of-the-art results on all 12 diverse downstream datasets considered.
  • The authors showed that the model was able to adapt to unseen summarization datasets very quickly, achieving strong results in as little as 1000 examples.
  • The training code and instructions for using model checkpoints can be found at https://github.com/google-research/pegasus
Summary
  • Introduction:

    Text summarization aims at generating accurate and concise summaries from input document(s).
  • Most prior work on neural abstractive summarization relied on large-scale, high-quality datasets of supervised document-summary pairs (Hermann et al, 2015) and achieved promising results (Rush et al, 2015; Nallapati et al, 2016; See et al, 2017).
  • There has been increased interest in collecting new summarization datasets that have more abstractive summaries (Narayan et al, 2018), have longer documents (Cohan et al, 2018; Sharma et al, 2019), utilize multiple documents (Fabbri et al, 2019), and are sourced from diverse domains (Grusky et al, 2018; Koupaee & Wang, 2018; Kim et al, 2019; Kornilova & Eidelman, 2019; Zhang & Tetreault, 2019); however, there has been little work on systematically evaluating models across these broad settings.
Tables
  • Table1: Results of PEGASUSLARGE and PEGASUSBASE on all downstream datasets compared with the previous SOTA, which are fetched from (Lewis et al, 2019; Shi et al, 2019; Fabbri et al, 2019; Koupaee & Wang, 2018; Kim et al, 2019; Subramanian et al, 2019; Song et al, 2019; Zhang & Tetreault, 2019; Kornilova & Eidelman, 2019). We only compared with previous abstractive models except on BillSum, which had extractive results only. The BIGPATENT, arXiv, PubMed and Multi-News datasets contain very long summaries, and we truncate them to 256 tokens, in a similar range to (Sharma et al, 2019; Cohan et al, 2018; Fabbri et al, 2019; Goodman et al, 2019). Best ROUGE numbers on each dataset and numbers within 0.15 of the best numbers are bolded (the bolding rule is illustrated after this list)
  • Table2: A comparison of PEGASUSLARGE with other pretrained models on XSum, CNN/DailyMail and Gigaword. Best ROUGE numbers and numbers within 0.15 of the best numbers are bolded
  • Table3: Human evaluation side-by-side results on Likert (1-5) scale (higher is better). Scores are bolded if they are not worse than human-level performance by p < 0.01
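The first two captions bold every score within 0.15 ROUGE points of the best one on a dataset; a tiny illustration of that rule follows (the model names and scores below are made up):

```python
def bold_within_margin(scores, margin=0.15):
    """Flag every score within `margin` ROUGE points of the best one."""
    best = max(scores.values())
    return {name: best - value <= margin for name, value in scores.items()}


print(bold_within_margin({"Model A": 43.52, "Model B": 43.40, "Model C": 42.91}))
# -> {'Model A': True, 'Model B': True, 'Model C': False}
```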
Related work
  • Dai & Le (2015); Ramachandran et al (2017) used LM and autoencoder pre-training on in-domain data to improve performance of RNN sequence models. However, the combination of pre-training with much larger external text corpora (such as Wikipedia, books, or Web pages) and Transformer-based sequence models has led to a dramatic improvement in performance when fine-tuned for both natural language understanding and text generation tasks (Radford et al, 2018a; Devlin et al, 2019; Rothe et al, 2019; Yang et al, 2019; Joshi et al, 2019; Song et al, 2019; Dong et al, 2019; Lewis et al, 2019). Most similar to our approach are Transformer encoder-decoder models pre-trained on some masked input pre-training objective.

    MASS (Song et al, 2019) proposed masked sequence-to-sequence generation that reconstructs a sentence fragment given the remaining part of the sentence. A single sentence fragment was randomly selected (a sketch of this objective appears at the end of this section).

    UniLM (Dong et al, 2019) proposed jointly training on three types of language modeling tasks: unidirectional (left-to-right and right-to-left), bidirectional (word-level mask, with next sentence prediction), and sequence-to-sequence (word-level mask) prediction.
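For contrast with PEGASUS's sentence-level masking, a rough sketch of the MASS-style objective described above: one randomly chosen contiguous token span is masked on the encoder side and reconstructed by the decoder. The span fraction, mask token, and helper function are illustrative choices, not MASS's exact settings.

```python
import random


def mass_style_example(tokens, span_fraction=0.5, seed=0):
    """Mask one random contiguous span of tokens; the decoder target is the
    masked span, and the encoder input is the rest plus mask placeholders."""
    rng = random.Random(seed)
    span_len = max(1, int(len(tokens) * span_fraction))
    start = rng.randrange(0, len(tokens) - span_len + 1)
    source = tokens[:start] + ["<mask>"] * span_len + tokens[start + span_len:]
    target = tokens[start:start + span_len]
    return source, target


src, tgt = mass_style_example("the model reconstructs a masked sentence fragment".split())
```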
Reference
  • Chung, J., Gulcehre, C., Cho, K., and Bengio, Y. Empirical evaluation of gated recurrent neural networks on sequence modeling. arXiv preprint arXiv:1412.3555, 2014.
  • Cohan, A., Dernoncourt, F., Kim, D. S., Bui, T., Kim, S., Chang, W., and Goharian, N. A discourse-aware attention model for abstractive summarization of long documents. In Proceedings of the 2018 Conference of the North American Chapter of the Association for Computational Linguistics: Human Language Technologies, Volume 2 (Short Papers), pp. 615–621, New Orleans, Louisiana, June 2018. Association for Computational Linguistics. doi: 10.18653/v1/N18-2097. URL https://www.aclweb.org/anthology/N18-2097.
  • Devlin, J., Chang, M.-W., Lee, K., and Toutanova, K. BERT: Pre-training of deep bidirectional transformers for language understanding. In Proceedings of the 2019 Conference of the North American Chapter of the Association for Computational Linguistics: Human Language Technologies, Volume 1 (Long and Short Papers), pp. 4171–4186, Minneapolis, Minnesota, June 2019. Association for Computational Linguistics. doi: 10.18653/v1/N19-1423. URL https://www.aclweb.org/anthology/N19-1423.
  • Dong, L., Yang, N., Wang, W., Wei, F., Liu, X., Wang, Y., Gao, J., Zhou, M., and Hon, H.-W. Unified language model pre-training for natural language understanding and generation. In 33rd Conference on Neural Information Processing Systems (NeurIPS 2019), 2019.
  • Fabbri, A., Li, I., She, T., Li, S., and Radev, D. Multinews: A large-scale multi-document summarization dataset and abstractive hierarchical model. In Proceedings of the 57th Annual Meeting of the Association for Computational Linguistics, pp. 1074–1084, Florence, Italy, July 2019. Association for Computational Linguistics. doi: 10.18653/v1/P19-1102. URL https://www.aclweb.org/anthology/P19-1102.
  • Goodman, S., Lan, Z., and Soricut, R. Multi-stage pretraining for abstractive summarization, 2019.
  • Graff, D., Kong, J., Chen, K., and Maeda, K. English gigaword. Linguistic Data Consortium, Philadelphia, 4 (1):34, 2003.
  • Grusky, M., Naaman, M., and Artzi, Y. Newsroom: A dataset of 1.3 million summaries with diverse extractive strategies. Proceedings of the 2018 Conference of the North American Chapter of the Association for Computational Linguistics: Human Language Technologies, Volume 1 (Long Papers), 2018. doi: 10.18653/v1/n18-1065. URL http://dx.doi.org/10.18653/v1/n18-1065.
  • Hermann, K. M., Kocisky, T., Grefenstette, E., Espeholt, L., Kay, W., Suleyman, M., and Blunsom, P. Teaching machines to read and comprehend. In Advances in neural information processing systems, pp. 1693–1701, 2015.
  • Dai, A. M. and Le, Q. V. Semi-supervised sequence learning. In Cortes, C., Lawrence, N. D., Lee, D. D., Sugiyama, M., and Garnett, R. (eds.), Advances in Neural Information Processing Systems 28, pp. 3079–3087. Curran Associates, Inc., 2015. URL http://papers.nips.cc/paper/5949-semi-supervised-sequence-learning.pdf.
  • Hochreiter, S. and Schmidhuber, J. Long short-term memory. Neural Comput., 9(8):1735–1780, November 1997. ISSN 0899-7667. doi: 10.1162/neco.1997.9.8.1735. URL http://dx.doi.org/10.1162/neco.1997.9.8.1735.
  • Joshi, M., Chen, D., Liu, Y., Weld, D. S., Zettlemoyer, L., and Levy, O. SpanBERT: Improving pre-training by representing and predicting spans. arXiv preprint arXiv:1907.10529, 2019.
  • Khandelwal, U., Clark, K., Jurafsky, D., and Kaiser, L. Sample efficient text summarization using a single pretrained transformer. arXiv preprint arXiv:1905.08836, 2019.
  • Kim, B., Kim, H., and Kim, G. Abstractive summarization of Reddit posts with multi-level memory networks. In Proceedings of the 2019 Conference of the North American Chapter of the Association for Computational Linguistics: Human Language Technologies, Volume 1 (Long and Short Papers), pp. 2519–2531, Minneapolis, Minnesota, June 2019. Association for Computational Linguistics. doi: 10.18653/v1/N19-1260. URL https://www.aclweb.org/anthology/N19-1260.
  • Klimt, B. and Yang, Y. The enron corpus: A new dataset for email classification research. In Proceedings of the 15th European Conference on Machine Learning, ECML'04, pp. 217–226, Berlin, Heidelberg, 2004. Springer-Verlag. ISBN 3-540-23105-6, 978-3-540-23105-9. doi: 10.1007/978-3-540-30115-8_22.
  • Kornilova, A. and Eidelman, V. BillSum: A corpus for automatic summarization of US legislation. In Proceedings of the 2nd Workshop on New Frontiers in Summarization, pp. 48–56, Hong Kong, China, November 2019. Association for Computational Linguistics. doi: 10.18653/v1/D19-5406. URL https://www.aclweb.org/anthology/D19-5406.
  • Koupaee, M. and Wang, W. Y. Wikihow: A large scale text summarization dataset. arXiv preprint arXiv:1810.09305, 2018.
  • Kryscinski, W., Keskar, N. S., McCann, B., Xiong, C., and Socher, R. Neural text summarization: A critical evaluation. In Proceedings of the 2019 Conference on Empirical Methods in Natural Language Processing and the 9th International Joint Conference on Natural Language Processing (EMNLP-IJCNLP), pp. 540–551, Hong Kong, China, November 2019. Association for Computational Linguistics. doi: 10.18653/v1/D19-1051. URL https://www.aclweb.org/anthology/D19-1051.
  • Kudo, T. Subword regularization: Improving neural network translation models with multiple subword candidates. arXiv preprint arXiv:1804.10959, 2018.
  • Lewis, M., Liu, Y., Goyal, N., Ghazvininejad, M., Mohamed, A., Levy, O., Stoyanov, V., and Zettlemoyer, L. Bart: Denoising sequence-to-sequence pre-training for natural language generation, translation, and comprehension. arXiv preprint arXiv:1910.13461, 2019.
  • Lin, C.-Y. ROUGE: A package for automatic evaluation of summaries. In Text Summarization Branches Out, pp. 74–81, Barcelona, Spain, July 2004. Association for Computational Linguistics. URL https://www.aclweb.org/anthology/W04-1013.
  • Liu, P. J., Saleh, M., Pot, E., Goodrich, B., Sepassi, R., Kaiser, L., and Shazeer, N. Generating wikipedia by summarizing long sequences. In International Conference on Learning Representations, 2018. URL https://openreview.net/forum?id=Hyg0vbWC-.
  • Nallapati, R., Zhou, B., dos Santos, C., Gulcehre, C., and Xiang, B. Abstractive text summarization using sequence-to-sequence RNNs and beyond. In Proceedings of The 20th SIGNLL Conference on Computational Natural Language Learning, pp. 280–290, Berlin, Germany, August 2016. Association for Computational Linguistics. doi: 10.18653/v1/K16-1028. URL https://www.aclweb.org/anthology/K16-1028.
  • Nallapati, R., Zhai, F., and Zhou, B. Summarunner: A recurrent neural network based sequence model for extractive summarization of documents. In Proceedings of the Thirty-First AAAI Conference on Artificial Intelligence, AAAI’17, pp. 3075–3081. AAAI Press, 2017. URL http://dl.acm.org/citation.cfm?id=3298483.3298681.
  • Narayan, S., Cohen, S. B., and Lapata, M. Don’t give me the details, just the summary! topic-aware convolutional neural networks for extreme summarization. In Proceedings of the 2018 Conference on Empirical Methods in Natural Language Processing, pp. 1797–1807, Brussels, Belgium, October-November 2018. Association for Computational Linguistics. doi: 10.18653/v1/D18-1206. URL https://www.aclweb.org/anthology/D18-1206.
  • Paulus, R., Xiong, C., and Socher, R. A deep reinforced model for abstractive summarization. arXiv preprint arXiv:1705.04304, 2017.
  • Radford, A., Narasimhan, K., Salimans, T., and Sutskever, I. Improving language understanding by generative pre-training. 2018a. URL https://s3-us-west-2.amazonaws.com/openai-assets/research-covers/language-unsupervised/language_understanding_paper.pdf.
  • Radford, A., Wu, J., Child, R., Luan, D., Amodei, D., and Sutskever, I. Language models are unsupervised multitask learners. 2018b. URL https://d4mucfpksywv.cloudfront.net/better-language-models/language-models.pdf.
  • Raffel, C., Shazeer, N., Roberts, A., Lee, K., Narang, S., Matena, M., Zhou, Y., Li, W., and Liu, P. J. Exploring the limits of transfer learning with a unified text-to-text transformer, 2019.
  • Rajpurkar, P., Zhang, J., Lopyrev, K., and Liang, P. Squad: 100,000+ questions for machine comprehension of text. Proceedings of the 2016 Conference on Empirical Methods in Natural Language Processing, 2016. doi: 10.18653/v1/d16-1264. URL http://dx.doi.org/10.18653/v1/D16-1264.
  • Ramachandran, P., Liu, P., and Le, Q. Unsupervised pretraining for sequence to sequence learning. In Proceedings of the 2017 Conference on Empirical Methods in Natural Language Processing, pp. 383–391, Copenhagen, Denmark, September 2017. Association for Computational Linguistics. doi: 10.18653/v1/D17-1039. URL https://www.aclweb.org/anthology/D17-1039.
  • Rothe, S., Narayan, S., and Severyn, A. Leveraging pretrained checkpoints for sequence generation tasks. arXiv preprint arXiv:1907.12461, 2019.
  • Rush, A. M., Chopra, S., and Weston, J. A neural attention model for abstractive sentence summarization. In Proceedings of the 2015 Conference on Empirical Methods in Natural Language Processing, pp. 379–389, Lisbon, Portugal, September 2015. Association for Computational Linguistics. doi: 10.18653/v1/D15-1044. URL https://www.aclweb.org/anthology/D15-1044.
  • See, A., Liu, P. J., and Manning, C. D. Get to the point: Summarization with pointer-generator networks. CoRR, abs/1704.04368, 2017. URL http://arxiv.org/abs/1704.04368.
  • Sennrich, R., Haddow, B., and Birch, A. Neural machine translation of rare words with subword units. In Proceedings of the 54th Annual Meeting of the Association for Computational Linguistics (Volume 1: Long Papers), pp. 1715–1725, Berlin, Germany, August 2016. Association for Computational Linguistics. doi: 10.18653/v1/P16-1162. URL https://www.aclweb.org/anthology/P16-1162.
  • Sharma, E., Li, C., and Wang, L. BIGPATENT: A large-scale dataset for abstractive and coherent summarization. In Proceedings of the 57th Annual Meeting of the Association for Computational Linguistics, pp. 2204–2213, Florence, Italy, July 2019. Association for Computational Linguistics. doi: 10.18653/v1/P19-1212. URL https://www.aclweb.org/anthology/P19-1212.
  • Shazeer, N. and Stern, M. Adafactor: Adaptive learning rates with sublinear memory cost. arXiv preprint arXiv:1804.04235, 2018.
  • Shi, T., Wang, P., and Reddy, C. K. LeafNATS: An open-source toolkit and live demo system for neural abstractive text summarization. In Proceedings of the 2019 Conference of the North American Chapter of the Association for Computational Linguistics (Demonstrations), pp. 66–71, Minneapolis, Minnesota, June 2019. Association for Computational Linguistics. doi: 10.18653/v1/N19-4012. URL https://www.aclweb.org/anthology/N19-4012.
  • Song, K., Tan, X., Qin, T., Lu, J., and Liu, T.-Y. Mass: Masked sequence to sequence pre-training for language generation. In International Conference on Machine Learning, pp. 5926–5936, 2019.
  • Subramanian, S., Li, R., Pilault, J., and Pal, C. On extractive and abstractive neural document summarization with transformer language models. arXiv preprint arXiv:1909.03186, 2019.
  • Sutskever, I., Vinyals, O., and Le, Q. V. Sequence to sequence learning with neural networks. In Proceedings of the 27th International Conference on Neural Information Processing Systems - Volume 2, NIPS’14, pp. 3104–3112, Cambridge, MA, USA, 2014. MIT Press. URL http://dl.acm.org/citation.cfm?id=2969033.2969173.
  • Vaswani, A., Shazeer, N., Parmar, N., Uszkoreit, J., Jones, L., Gomez, A. N., Kaiser, Ł., and Polosukhin, I. Attention is all you need. In Advances in neural information processing systems, pp. 5998–6008, 2017.
  • Volske, M., Potthast, M., Syed, S., and Stein, B. TL;DR: Mining Reddit to learn automatic summarization. In Proceedings of the Workshop on New Frontiers in Summarization, pp. 59–63, Copenhagen, Denmark, September 2017. Association for Computational Linguistics. doi: 10.18653/v1/W17-4508. URL https://www.aclweb.org/anthology/W17-4508.
  • Wang, A., Singh, A., Michael, J., Hill, F., Levy, O., and Bowman, S. Glue: A multi-task benchmark and analysis platform for natural language understanding. Proceedings of the 2018 EMNLP Workshop BlackboxNLP: Analyzing and Interpreting Neural Networks for NLP, 2018. doi: 10.18653/v1/w18-5446. URL http://dx.doi.org/10.18653/v1/w18-5446.
  • Welleck, S., Kulikov, I., Roller, S., Dinan, E., Cho, K., and Weston, J. Neural text generation with unlikelihood training. arXiv preprint arXiv:1908.04319, 2019.
  • Wu, Y., Schuster, M., Chen, Z., Le, Q. V., Norouzi, M., Macherey, W., Krikun, M., Cao, Y., Gao, Q., Macherey, K., et al. Google’s neural machine translation system: Bridging the gap between human and machine translation. arXiv preprint arXiv:1609.08144, 2016.
  • Yang, Z., Dai, Z., Yang, Y., Carbonell, J., Salakhutdinov, R., and Le, Q. V. XLNet: Generalized autoregressive pretraining for language understanding. In Advances in Neural Information Processing Systems, pp. 5754–5764, 2019. URL http://papers.nips.cc/paper/8812-xlnet-generalized-autoregressive-pretraining-for-language-understanding.pdf.
  • Zhang, R. and Tetreault, J. This email could save your life: Introducing the task of email subject line generation. In Proceedings of the 57th Annual Meeting of the Association for Computational Linguistics, pp. 446–456, Florence, Italy, July 2019. Association for Computational Linguistics. doi: 10.18653/v1/P19-1043. URL https://www.aclweb.org/anthology/P19-1043.
  • Zhong, M., Liu, P., Wang, D., Qiu, X., and Huang, X. Searching for effective neural extractive summarization: What works and what's next. Proceedings of the 57th Annual Meeting of the Association for Computational Linguistics, 2019. doi: 10.18653/v1/p19-1100. URL http://dx.doi.org/10.18653/v1/p19-1100.