A Joint Model for Document Segmentation and Segment Labeling

ACL, pp. 313-322, 2020.

Keywords:
Mean Average Precision, text segmentation, topic segmentation, LSTM-LSTM-CRF, medical dictation

Abstract:

Text segmentation aims to uncover latent structure by dividing text from a document into coherent sections. Where previous work on text segmentation considers the tasks of document segmentation and segment labeling separately, we show that the tasks contain complementary information and are best addressed jointly. We introduce the Segment...

Introduction
  • One type of structure is the grouping of content into topically coherent segments.
  • These segmented documents have many uses across various domains and downstream tasks.
  • Segmentation can be used downstream for retrieval (Hearst and Plaunt, 2002; Edinger et al., 2017; Allan et al., 1998), and is especially useful for informal text or speech that lacks explicit segment markup.
  • Segmented documents are also useful for pre-reading, serving as an aid to reading comprehension (Swaffar et al., 1991; Ajideh, 2003)
Highlights
  • A well-written document is rich in content and in structure
  • We show that Segment Pooling LSTM is capable of reducing segmentation error by, on average, 30% while improving segment classification
  • In order to jointly model document segmentation and segment classification, we introduce the Segment Pooling LSTM (S-LSTM) model (a pooling sketch follows this list)
  • Segment Pooling LSTM outperforms a conditional random field baseline: in Table 1, the results demonstrate that S-LSTM outperforms the LSTM-LSTM-CRF baseline in almost every case for single-labeling, and in every case for segmentation
  • In this paper we introduce the Segment Pooling LSTM (S-LSTM) model for joint segmentation and segment labeling tasks
  • We find that the model dramatically reduces segmentation error while improving segment labeling accuracy compared to previous neural and non-neural baselines for both single-label and multi-label tasks
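
As a rough illustration of the segment-pooling idea above, the following PyTorch sketch mean-pools token-level LSTM states within each predicted segment and classifies each pooled vector; the module name, tensor shapes, and the choice of mean pooling are assumptions for illustration, not the exact S-LSTM architecture.

    # Illustrative sketch: pool token-level encoder states into one vector per
    # segment, then classify each segment. Shapes and names are assumptions.
    import torch
    import torch.nn as nn

    class SegmentPoolingClassifier(nn.Module):
        def __init__(self, hidden_size: int, num_labels: int):
            super().__init__()
            self.label_head = nn.Linear(hidden_size, num_labels)

        def forward(self, token_states, boundaries):
            # token_states: (seq_len, hidden_size) LSTM outputs for one document
            # boundaries: sorted start indices of predicted segments, e.g. [0, 12, 40]
            pooled = []
            for i, start in enumerate(boundaries):
                end = boundaries[i + 1] if i + 1 < len(boundaries) else token_states.size(0)
                pooled.append(token_states[start:end].mean(dim=0))  # mean-pool the segment
            pooled = torch.stack(pooled)      # (num_segments, hidden_size)
            return self.label_head(pooled)    # (num_segments, num_labels)

Note that Table 2's -pool ablation uses only mean pooling, so the pooling used in the full S-LSTM is presumably richer than this sketch.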
Methods
  • The authors follow the experimental procedure of Arnold et al. to evaluate S-LSTM on the tasks of document segmentation and segment labeling, using the WikiSection datasets.
  • Arnold et al. introduced the WikiSection dataset, which contains Wikipedia articles across two languages (English and German) and two domains (Cities and Diseases).
  • Articles are segmented using the Wikipedia section structure.
  • There are two tasks: (1) jointly segment the document and assign a single restricted-vocabulary label to each segment, and (2) predict the bag-of-words in the title of the Wikipedia section as a label.
  • For evaluation, all ground-truth segments are aligned with the maximally overlapping predicted segment (a sketch of this alignment follows this list).
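
A minimal sketch of the maximum-overlap alignment described above; the (start, end) span representation over sentence indices and the function name are assumptions, not the authors' evaluation code.

    # Align each gold segment to the predicted segment with the largest overlap.
    # Segments are (start, end) spans over sentence indices, end exclusive.
    def align_segments(gold_spans, pred_spans):
        def overlap(a, b):
            return max(0, min(a[1], b[1]) - max(a[0], b[0]))
        return {
            gi: max(range(len(pred_spans)), key=lambda pi: overlap(g, pred_spans[pi]))
            for gi, g in enumerate(gold_spans)
        }

    # Example: align_segments([(0, 10), (10, 25)], [(0, 12), (12, 25)]) -> {0: 0, 1: 1}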
Results
  • There are five major takeaways from the experimental results and analysis.
  • The jointly trained S-LSTM model shows major improvement over prior work that modeled document segmentation and segment labeling tasks separately.
  • The segment pooling layer leads to improvements for both segmentation and segment labeling.
  • S-LSTM outperforms an IOB-tagging, CRF-decoded model for single-label segment labeling, and generalizes to the multi-label setting, which an IOB tagging scheme does not directly support (an IOB-encoding sketch follows this list).
  • Table 1 (WikiSection-topics, single-label classification) compares the configurations C99, TopicTiling, TextSeg, SEC>T+emb, LSTM-LSTM-CRF, and S-LSTM; its datasets include en_disease (27 topics).
  • Table 2 (WikiSection-headings, multi-label classification) covers en_disease (179 topics), de_disease (115 topics), en_city (603 topics), and de_city (318 topics), comparing C99, TopicTiling, TextSeg, SEC>H+emb, and S-LSTM.
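
For context on the IOB-tagging, CRF-decoded baseline above, the sketch below shows a standard way to encode a labeled segmentation as one IOB tag per sentence, which is the kind of sequence such a tagger predicts; the function and label names are illustrative, not the authors' implementation.

    # Standard IOB encoding of labeled segments, one tag per sentence.
    def segments_to_iob(segment_lengths, segment_labels):
        tags = []
        for length, label in zip(segment_lengths, segment_labels):
            tags.append(f"B-{label}")                   # segment-initial sentence
            tags.extend([f"I-{label}"] * (length - 1))  # remaining sentences
        return tags

    # Example: segments_to_iob([3, 2], ["symptom", "treatment"])
    # -> ['B-symptom', 'I-symptom', 'I-symptom', 'B-treatment', 'I-treatment']

A single IOB tag per sentence cannot directly express a multi-label bag-of-words target, which is consistent with the CRF baseline appearing only in the single-label comparison.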
Conclusion
  • In Table 1, the results demonstrate that S-LSTM outperforms the LSTM-LSTM-CRF baseline in almost every case for single-labeling, and in every case for segmentation (a sketch of a standard segmentation-error metric follows this list).
  • This makes S-LSTM a useful model choice for cases like clinical segmentation and labeling, where segments are drawn from a small fixed vocabulary.
  • Experiments demonstrate that joint modeling of segmentation and segment labeling, segment alignment and exploration, and segment pooling each contribute to S-LSTM's improved performance.
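
The segmentation results above are reported as segmentation error; Pk (Beeferman et al., 1999, cited in the references) is a standard metric for this, though whether it is the exact metric reported here is an assumption. A minimal Pk sketch over per-sentence segment ids:

    # Pk: probability that two positions k apart are judged inconsistently
    # (same segment vs. different segments) by reference and hypothesis.
    def pk(reference, hypothesis, k=None):
        if k is None:
            # conventional choice: half the mean reference segment length
            k = max(1, round(len(reference) / len(set(reference)) / 2))
        disagreements = sum(
            (reference[i] == reference[i + k]) != (hypothesis[i] == hypothesis[i + k])
            for i in range(len(reference) - k)
        )
        return disagreements / max(1, len(reference) - k)

    # Example: pk([0, 0, 0, 1, 1, 1], [0, 0, 1, 1, 1, 1]) -> 0.5 (k defaults to 2 here)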
Tables
  • Table 1: WikiSection results. Baselines are TopicTiling (Riedl and Biemann, 2012), TextSeg (Koshorek et al., 2018), and C99 (Choi, 2000), plus the best neural SECTOR models from Arnold et al.
  • Table 2: WikiSection headings task results, which predict a multi-label bag-of-words drawn from section headers. To show the effect of the segment pooling and model exploration used in S-LSTM, we report two variants: -expl uses only teacher forcing and -pool uses only mean pooling (a sketch of the exploration idea follows this list).
  • Table 3: Transfer results across four datasets. Models marked * are trained on the training portion of the corresponding dataset; the rest are either unsupervised or trained on a different dataset. For the Wiki-50, Cities, and Elements datasets, S-LSTM outperforms all models not trained on the corresponding training set.
  • Table 4: A model trained to jointly predict segment bounds and segment labels improves classification over a baseline that only predicts labels. Both are given oracle segment bounds and do not use exploration.
  • Table 5: Inverse of the experiment in Table 4: a model that jointly predicts segment bounds and labels outperforms a model that only predicts segment bounds.
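
Table 2's -expl ablation uses teacher forcing only; the full model additionally explores during training. A minimal sketch of that general idea, assuming exploration means occasionally pooling over the model's own predicted boundaries rather than the gold ones (the predict_fn callable and the p_explore probability are hypothetical):

    import random

    # Choose which segment boundaries to pool over during training.
    def choose_boundaries(predict_fn, token_states, gold_boundaries,
                          p_explore=0.5, training=True):
        if training and random.random() < p_explore:
            return predict_fn(token_states)  # exploration: the model's own segmentation
        return gold_boundaries               # teacher forcing: the gold segmentation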
Related work
  • Coherence-based Segmentation. Much work on text segmentation uses measures of coherence to find topic shifts in documents. Hearst (1997) introduced the TextTiling algorithm, which uses term co-occurrences to find coherent segments in a document. Eisenstein and Barzilay (2008) introduced BayesSeg, a Bayesian method that can incorporate other features such as cue phrases. Riedl and Biemann (2012) later introduced TopicTiling, which uses coherence shifts in topic vectors to find segment bounds. Glavaš et al. (2016) proposed GraphSeg, which constructs a semantic relatedness graph over the document using lexical features and word embeddings, and segments using cliques. Nguyen et al. (2012) proposed SITS, a model for topic segmentation in dialogues that incorporates a per-speaker likelihood of changing topics. A minimal illustration of the coherence-based approach follows.
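
To make the coherence-based approach concrete, here is a minimal TextTiling-flavored sketch: score adjacent windows of sentences by lexical overlap and place boundaries at dips below a threshold. It illustrates the general idea only; the window size, threshold, and scoring are assumptions rather than Hearst's exact algorithm.

    # Place boundaries where lexical overlap between adjacent sentence windows dips.
    def coherence_boundaries(sentences, window=2, threshold=0.1):
        def bag(sents):
            return {w.lower() for s in sents for w in s.split()}
        boundaries = []
        for i in range(window, len(sentences) - window):
            left, right = bag(sentences[i - window:i]), bag(sentences[i:i + window])
            union = left | right
            score = len(left & right) / len(union) if union else 0.0
            if score < threshold:
                boundaries.append(i)  # low overlap suggests a topic shift before sentence i
        return boundaries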
Funding
  • This work was supported through Adobe Gift Funding, which supports an Adobe Research-University of Maryland collaboration
References
  • Parviz Ajideh. 2003. Schema theory-based pre-reading tasks: A neglected essential in the ESL reading class. The Reading Matrix, 3(1).
  • James Allan, Jaime G. Carbonell, George Doddington, Jonathan Yamron, and Yiming Yang. 1998. Topic detection and tracking pilot study final report. In Proceedings of the DARPA Broadcast News Transcription and Understanding Workshop.
  • Sebastian Arnold, Rudolf Schneider, Philippe Cudré-Mauroux, Felix A. Gers, and Alexander Löser. 2019. SECTOR: A neural model for coherent topic segmentation and classification. Transactions of the Association for Computational Linguistics, 7.
  • Miguel Ballesteros, Yoav Goldberg, Chris Dyer, and Noah A. Smith. 2016. Training with exploration improves a greedy stack-LSTM parser. In Proceedings of Empirical Methods in Natural Language Processing.
  • Doug Beeferman, Adam Berger, and John Lafferty. 1999. Statistical models for text segmentation. Machine Learning, 34(1-3).
  • Piotr Bojanowski, Edouard Grave, Armand Joulin, and Tomas Mikolov. 2017. Enriching word vectors with subword information. Transactions of the Association for Computational Linguistics, 5.
  • Harr Chen, S.R.K. Branavan, Regina Barzilay, and David R. Karger. 2009. Content modeling using latent permutations. Journal of Artificial Intelligence Research, 36.
  • Freddy Y. Y. Choi. 2000. Advances in domain independent linear text segmentation. In Conference of the North American Chapter of the Association for Computational Linguistics.
  • Jacob Devlin, Ming-Wei Chang, Kenton Lee, and Kristina Toutanova. 2019. BERT: Pre-training of deep bidirectional transformers for language understanding. In Conference of the North American Chapter of the Association for Computational Linguistics.
  • Tracy Edinger, Dina Demner-Fushman, Aaron M. Cohen, Steven Bedrick, and William Hersh. 2017. Evaluation of clinical text segmentation to facilitate cohort retrieval. In AMIA Annual Symposium Proceedings.
  • Jacob Eisenstein and Regina Barzilay. 2008. Bayesian unsupervised topic segmentation. In Proceedings of Empirical Methods in Natural Language Processing.
  • Kavita Ganesan and Michael Subotin. 2014. A general supervised approach to segmentation of clinical texts. In IEEE International Conference on Big Data.
  • Goran Glavaš, Federico Nanni, and Simone Paolo Ponzetto. 2016. Unsupervised text segmentation using semantic relatedness graphs. In Proceedings of the Joint Conference on Lexical and Computational Semantics.
  • Xavier Glorot and Yoshua Bengio. 2010. Understanding the difficulty of training deep feedforward neural networks. In Proceedings of Artificial Intelligence and Statistics.
  • Joshua Goodman. 1996. Parsing algorithms and metrics. In Proceedings of the Association for Computational Linguistics.
  • Marti Hearst and Christian Plaunt. 2002. Subtopic structuring for full-length document access. In Proceedings of the ACM SIGIR Conference on Research and Development in Information Retrieval.
  • Marti A. Hearst. 1997. TextTiling: Segmenting text into multi-paragraph subtopic passages. Computational Linguistics, 23(1).
  • Jeremy Howard and Sebastian Ruder. 2018. Universal language model fine-tuning for text classification. In Proceedings of the Association for Computational Linguistics.
  • Diederik P. Kingma and Jimmy Ba. 2015. Adam: A method for stochastic optimization. In Proceedings of the International Conference on Learning Representations.
  • Omri Koshorek, Adir Cohen, Noam Mor, Michael Rotman, and Jonathan Berant. 2018. Text segmentation as a supervised learning task. In Conference of the North American Chapter of the Association for Computational Linguistics.
  • Guillaume Lample, Miguel Ballesteros, Sandeep Subramanian, Kazuya Kawakami, and Chris Dyer. 2016. Neural architectures for named entity recognition. In Conference of the North American Chapter of the Association for Computational Linguistics.
  • Chin-Yew Lin. 2004. Looking for a few good metrics: Automatic summarization evaluation – how many samples are enough? In NTCIR.
  • Andrew McCallum and Wei Li. 2003. Early results for named entity recognition with conditional random fields, feature induction and web-enhanced lexicons. In Conference of the North American Chapter of the Association for Computational Linguistics.
  • Viet-An Nguyen, Jordan Boyd-Graber, and Philip Resnik. 2012. SITS: A hierarchical nonparametric model using speaker identity for topic segmentation in multiparty conversations. In Proceedings of the Association for Computational Linguistics.
  • Franz Josef Och. 2003. Minimum error rate training in statistical machine translation. In Proceedings of the 41st Annual Meeting of the Association for Computational Linguistics, Sapporo, Japan.
  • Matthew Peters, Mark Neumann, Mohit Iyyer, Matt Gardner, Christopher Clark, Kenton Lee, and Luke Zettlemoyer. 2018. Deep contextualized word representations. In Conference of the North American Chapter of the Association for Computational Linguistics.
  • Nitish Srivastava, Geoffrey Hinton, Alex Krizhevsky, Ilya Sutskever, and Ruslan Salakhutdinov. 2014. Dropout: A simple way to prevent neural networks from overfitting. Journal of Machine Learning Research, 15(1).
  • Janet K. Swaffar, Katherine Arens, and Heidi Byrnes. 1991. Reading for meaning: An integrated approach to language learning. Pearson College Division.
  • Michael Tepper, Daniel Capurro, Fei Xia, Lucy Vanderwende, and Meliha Yetisgen-Yildiz. 2012. Statistical section segmentation in free-text clinical records. In Proceedings of the Language Resources and Evaluation Conference.
  • Ronald J. Williams and David Zipser. 1989. A learning algorithm for continually running fully recurrent neural networks. Neural Computation, 1(2).
  • Ruqing Zhang, Jiafeng Guo, Yixing Fan, Yanyan Lan, and Xueqi Cheng. 2019. Outline generation: Understanding the inherent content structure of documents. In Proceedings of the ACM SIGIR Conference on Research and Development in Information Retrieval.
  • Alexandra Pomares-Quimbaya, Markus Kreuzthaler, and Stefan Schulz. 2019. Current approaches to identify sections within clinical narratives from electronic health records: A systematic review. BMC Medical Research Methodology, 19(1).
  • Lance A. Ramshaw and Mitchell P. Marcus. 1999. Text chunking using transformation-based learning. In Natural Language Processing Using Very Large Corpora. Springer.
  • Martin Riedl and Chris Biemann. 2012. TopicTiling: A text segmentation algorithm based on LDA. In Proceedings of the ACL 2012 Student Research Workshop.
  • Najmeh Sadoughi, Greg P. Finley, Erik Edwards, Amanda Robinson, Maxim Korenevsky, Michael Brenndoerfer, Nico Axtmann, Mark Miller, and David Suendermann-Oeft. 2018. Detecting section boundaries in medical dictations: Toward real-time conversion of medical dictations to clinical reports. In International Conference on Speech and Computer. Springer.