Neural Natural Language Processing for Long Texts: A Survey on Classification and Summarization
Engineering Applications of Artificial Intelligence (2024)
Abstract
The adoption of Deep Neural Networks (DNNs) has greatly benefited Natural Language Processing (NLP) during the past decade. However, the demands of long document analysis are quite different from those of shorter texts, while the ever-increasing size of documents uploaded online renders automated understanding of lengthy texts a critical issue. Relevant applications include automated Web mining, legal document review, medical records analysis, financial reports analysis, contract management, environmental impact assessment, news aggregation, etc. Although efficient algorithms for analyzing long documents have been developed only relatively recently, practical tools in this field are currently flourishing. This article serves as an entry point into this dynamic domain and aims to achieve two objectives. First, it provides an introductory overview of the relevant neural building blocks, serving as a concise tutorial for the field. Second, it offers a brief examination of the current state of the art in two key long document analysis tasks: document classification and document summarization. Sentiment analysis for long texts is also covered, since it is typically treated as a particular case of document classification. In this way, the article presents an introductory exploration of document-level analysis, addressing the primary challenges, concerns, and existing solutions. Finally, it offers a concise definition of "long text/document", presents an original overarching taxonomy of common deep neural methods for long document analysis, and lists publicly available annotated datasets that can facilitate further research in this area.
Key words
Natural language processing, Long document, Document classification, Document summarization, Sentiment analysis, Deep neural networks