
Adaptive Ensembling: Unsupervised Domain Adaptation for Political Document Analysis

EMNLP-IJCNLP 2019, pp. 4717–4729


Abstract

Insightful findings in political science often require researchers to analyze documents of a certain subject or type, yet these documents are usually contained in large corpora that do not distinguish between pertinent and non-pertinent documents. In contrast, we can find corpora that label relevant documents but have limitations (e.g., …)

Introduction
  • Recent progress in natural language processing and computational social science has pushed political science research into new frontiers.
  • The authors propose history-padding the input with (∑_{i=1}^{L} d^(i−1)(f − 1)) − n + 1 zeros, where L is the number of convolutional layers, d the dilation factor, f the filter size, and n the input length, to ensure the convolutions compress the sequence down to one output unit (a sketch follows this list)
  • This produces an output feature map of dimension B × C × 1 where B is the batch size and C is the number of channels; one can use a simple squeeze() operation to obtain the compact feature matrix B × C.
  • Though this is a subtle difference, the approach yields much richer representations for classification
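A minimal sketch of this padding-and-squeeze step, assuming PyTorch; the layer count, dilation base, and filter size below are illustrative hyperparameters, not the paper's:

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

def history_pad(x, num_layers, dilation_base, filter_size):
    """Left-pad (B x C x n) with zeros so the dilated causal stack below
    compresses the time axis to a single output unit."""
    n = x.size(-1)
    shrink = sum(dilation_base ** (i - 1) * (filter_size - 1)
                 for i in range(1, num_layers + 1))
    pad = shrink - n + 1                 # the (sum_i d^(i-1)(f-1)) - n + 1 above
    assert pad >= 0, "sequence longer than the stack's receptive field"
    return F.pad(x, (pad, 0))            # zeros go on the left, i.e. the "history"

class CausalEncoder(nn.Module):
    def __init__(self, channels, num_layers=8, dilation_base=2, filter_size=5):
        super().__init__()
        self.cfg = (num_layers, dilation_base, filter_size)
        self.convs = nn.ModuleList([
            nn.Conv1d(channels, channels, filter_size,
                      dilation=dilation_base ** (i - 1))
            for i in range(1, num_layers + 1)])

    def forward(self, x):                # x: B x C x n
        x = history_pad(x, *self.cfg)
        for conv in self.convs:
            x = torch.relu(conv(x))
        return x.squeeze(-1)             # B x C x 1  ->  B x C
```

With these defaults, CausalEncoder(300)(torch.randn(4, 300, 200)) returns a 4 × 300 feature matrix, since the left padding makes the sequence collapse to a single time step after the final convolution.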
Highlights
  • Recent progress in natural language processing and computational social science has pushed political science research into new frontiers
  • An unsupervised domain adaptation framework that learns representations from a single-source, labeled corpus and utilizes them effectively to obtain labels for a multi-source, unlabeled corpus
  • The 2D convolutional neural network (CNN) has decent F1 scores, showing that our framework works with standard CNN models
  • Does adaptive ensembling yield better topics? In Table 1, we showed that applying Latent Dirichlet Allocation (LDA) directly on Corpus of Historical American English (COHA) yields noisy, unrecognizable topics
  • An unsupervised domain adaptation framework capable of identifying political texts for a multi-source, diachronic corpus by only leveraging supervision from a single-source, modern corpus
Methods
  • Compared systems: Source Only, mSDA, DANN, SE + curriculum, AE + curriculum, and In-Domain, evaluated on the binary task (Mi-F, Ma-F) and the multi-label task (Ma-P, Ma-R, Ma-F); see Tables 2 and 3.
  • A setting of 200 and a minimum word count chosen from [1, 2, 3] were used to build the vocabulary.
  • Hyperparameters were selected via grid search on the held-out development set
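A generic sketch of such a selection loop (not the paper's code; the train and evaluate callables are placeholders supplied by the caller):

```python
from itertools import product
from typing import Callable, Dict, List, Tuple

def grid_search(grid: Dict[str, List],
                train: Callable[[Dict], object],
                evaluate: Callable[[object], float]) -> Tuple[Dict, float]:
    """Try every hyperparameter combination in the grid and keep the
    configuration scoring best on the held-out development set."""
    best_cfg, best_score = None, float("-inf")
    for values in product(*grid.values()):
        cfg = dict(zip(grid, values))
        score = evaluate(train(cfg))     # e.g., macro-F1 on the dev set
        if score > best_score:
            best_cfg, best_score = cfg, score
    return best_cfg, best_score
```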
Results
  • The authors benchmark the unsupervised domain adaptation framework against established methods: (1) Marginalized Stacked Denoising Autoencoders (mSDA): denoising autoencoders that marginalize out noise, enabling learning from infinitely many corrupted training samples (Chen et al., 2012). (2) Self-Ensembling (SE): a consistency regularization framework that stabilizes student and teacher network predictions under injected noise (Laine and Aila, 2017; Tarvainen and Valpola, 2017; French et al., 2018). (3) Domain-Adversarial Neural Network (DANN): a multi-component framework that learns domain-invariant representations through adversarial training (Ganin et al., 2016).
  • CNN (2D): a CNN that obtains representations using 2D kernels of sizes 3 × 300, 4 × 300, and 5 × 300 (Kim, 2014); the resulting feature maps are max-pooled, concatenated row-wise, and passed through a fully-connected layer (see the sketch after this list).
  • The time embedding significantly improves both F1 scores, indicating the model effectively utilizes the unique temporal information present in the corpora
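A minimal sketch of that 2D-kernel classifier, assuming PyTorch and 300-dimensional embeddings; the vocabulary size, filter count, and class count are illustrative:

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class KimCNN(nn.Module):
    """Classifier in the style of Kim (2014): 2D kernels spanning the full
    300-dim embedding, max-pooled over time and concatenated row-wise."""
    def __init__(self, vocab_size, num_classes, emb_dim=300,
                 num_filters=100, kernel_heights=(3, 4, 5)):
        super().__init__()
        self.emb = nn.Embedding(vocab_size, emb_dim)
        self.convs = nn.ModuleList([
            nn.Conv2d(1, num_filters, (h, emb_dim)) for h in kernel_heights])
        self.fc = nn.Linear(num_filters * len(kernel_heights), num_classes)

    def forward(self, tokens):                      # tokens: B x T
        x = self.emb(tokens).unsqueeze(1)           # B x 1 x T x 300
        feats = [F.relu(conv(x)).squeeze(3)         # B x F x (T - h + 1)
                 for conv in self.convs]
        pooled = [f.max(dim=2).values for f in feats]   # B x F each
        return self.fc(torch.cat(pooled, dim=1))    # B x num_classes
```

One way to incorporate the time embedding mentioned above would be to concatenate it to the pooled feature vector before the final fully-connected layer.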
Conclusion
  • The authors present an unsupervised domain adaptation framework capable of identifying political texts in a multi-source, diachronic corpus while leveraging supervision only from a single-source, modern corpus.
  • The authors' methods outperform strong benchmarks on both binary and multi-label classification tasks.
  • They also release an expert-annotated set of political articles from COHA to facilitate domain adaptation research in NLP and political science research on public opinion over time
Tables
  • Table 1: Randomly sampled topics and top keywords derived from a 50-topic LDA model trained on a sample of COHA documents. Topic modeling results on a political subset of COHA are presented in Table 5. Additionally, topic model hyperparameters are detailed in Appendix A
  • Table 2: Framework results for the binary label task (left) and multi-label task (right). For the binary task, we show micro- and macro-averaged F1 scores. For the multi-label task, we show macro-averaged precision, recall, and F1 scores
  • Table 3: Model results with adaptive ensembling for the binary label task (left) and multi-label task (right). For the binary task, we show micro- and macro-averaged F1 scores. For the multi-label task, we show macro-averaged precision, recall, and F1 scores
  • Table 4: Randomly sampled topics and top keywords derived from a 50-topic LDA model trained on 28K COHA articles identified as political using the SOURCE ONLY model
  • Table 5: Randomly sampled topics and top keywords derived from a 50-topic LDA model trained on 28K COHA articles identified as political using ADAPTIVE ENSEMBLING
  • Table 6: Distribution of train (In-Domain benchmark only), dev, and test documents in our expert-annotated COHA subcorpus. For political documents, we break down the distribution into American Government (AG), Political Economy (PE), and International Relations (IR)
  • Table 7: Political descriptors in NYT. Each descriptor is categorized under one or more political science areas: American Government (AG), Political Economy (PE), and International Relations (IR)
Related work
  • Early approaches for unsupervised domain adaptation use shared autoencoders to create cross-domain representations (Glorot et al., 2011; Chen et al., 2012). More recently, Ganin et al. (2016) introduced a new paradigm that creates domain-invariant representations through adversarial training. This has gained popularity in NLP (Zhang et al., 2017; Fu et al., 2017; Chen et al., 2018); however, the difficulties of adversarial training are well-established (Salimans et al., 2016; Arjovsky and Bottou, 2017). Consistency regularization methods (e.g., self-ensembling; sketched below) outperform adversarial methods on visual semi-supervised and domain adaptation tasks (Athiwaratkun et al., 2019), but have rarely been applied to textual data (Ko et al., 2019). Finally, Huang and Paul (2018) establish the feasibility of using domain adaptation to label documents from discrete time periods.

    Our work departs from previous work by proposing an adaptive, time-aware approach to consistency regularization provisioned with causal convolutional networks.
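For concreteness, a bare-bones mean-teacher consistency step, in the style of SE (Tarvainen and Valpola, 2017); this is a generic sketch, not the paper's adaptive variant, and the Gaussian noise and MSE consistency loss are illustrative choices:

```python
import torch
import torch.nn.functional as F

@torch.no_grad()
def ema_update(teacher, student, alpha=0.99):
    """Teacher weights track an exponential moving average of the student's."""
    for t, s in zip(teacher.parameters(), student.parameters()):
        t.mul_(alpha).add_(s, alpha=1 - alpha)

def consistency_step(student, teacher, x_unlabeled, optimizer, weight=1.0):
    """Penalize disagreement between student and teacher predictions on the
    same unlabeled input under two independent noise draws."""
    noisy_a = x_unlabeled + 0.1 * torch.randn_like(x_unlabeled)
    noisy_b = x_unlabeled + 0.1 * torch.randn_like(x_unlabeled)
    with torch.no_grad():
        target = F.softmax(teacher(noisy_b), dim=-1)
    loss = weight * F.mse_loss(F.softmax(student(noisy_a), dim=-1), target)
    optimizer.zero_grad()
    loss.backward()
    optimizer.step()
    ema_update(teacher, student)
    return loss.item()
```

The teacher would start as a copy of the student (e.g., copy.deepcopy(student)) and is never updated by gradient descent, only by the moving average.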
Funding
  • This work was partially supported by the NSF Grant IIS-1850153
Study subjects and analysis
documents: 60,000
The topics shown in Table 1 were generated by Latent Dirichlet Allocation (LDA) (Blei et al., 2003), a popular topic model in social science, trained on 60,000 documents sampled from the Corpus of Historical American English (COHA) (Davies, 2008). The generated topics are extremely vague and not specific to politics
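A minimal sketch of that topic-modeling step, assuming gensim; the toy documents below stand in for the tokenized 60,000-document COHA sample, and the topic and pass counts are shrunk so the sketch runs as-is:

```python
from gensim import corpora
from gensim.models import LdaModel

# Toy stand-in for the tokenized COHA sample (the paper trains a
# 50-topic model on 60,000 documents; both are shrunk here).
docs = [["senate", "tax", "vote", "tax"],
        ["farm", "crop", "rain", "crop"],
        ["senate", "vote", "law"],
        ["rain", "farm", "harvest"]]

dictionary = corpora.Dictionary(docs)
bow = [dictionary.doc2bow(d) for d in docs]
lda = LdaModel(bow, num_topics=2, id2word=dictionary, passes=20)

# Print the top keywords per topic, as in the paper's tables.
for topic_id, words in lda.show_topics(num_words=5, formatted=False):
    print(topic_id, [w for w, _ in words])
```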

articles sampled from NYT: 4,800 political and 4,800 non-political
We use NYT as the source dataset, as it contains fine-grained descriptors of article content. We sample 4,800 articles with the descriptor US POLITICS & GOVERNMENT. To obtain non-political articles, we sample 4,800 documents whose descriptors do not overlap with an exhaustive list of political descriptors identified by a political science graduate student. For our multi-label task, the annotator grouped descriptors in NYT that belong to each area label we consider

total documents per decade group: 8,000
To ensure our dataset is useful for diachronic analysis (e.g., public opinion over time), we sample only from news sources that consistently appear across the decades. Further, we ensure there are at least 8,000 total documents in each decade group; this narrows down our time span to 1922–1986. From this subset, we sample ∼250 documents from each decade for annotation

training documents: 984
To train our unsupervised domain adaptation framework, we use 9,600 unlabeled target examples (the same number as NYT). The expert-annotated dataset is divided into three subsets: a training set of 984 documents (only for training the In-Domain classifier discussed in §6.2), a development set of 246 documents, and a test set of 350 documents (50 per decade).

References
  • Brice D. L. Acree, Justin H. Gross, Noah A. Smith, Yanchuan Sim, and Amber E. Boydstun. 2018. Etch-a-sketching: Evaluating the post-primary rhetorical moderation hypothesis. American Politics Research.
  • Martin Arjovsky and Leon Bottou. 2017. Towards principled methods for training generative adversarial networks. In International Conference on Learning Representations.
  • Ben Athiwaratkun, Marc Finzi, Pavel Izmailov, and Andrew Gordon Wilson. 2019. There are many consistent explanations of unlabeled data: Why you should average. In International Conference on Learning Representations.
  • Shaojie Bai, J. Zico Kolter, and Vladlen Koltun. 2018. An empirical evaluation of generic convolutional and recurrent networks for sequence modeling. arXiv preprint arXiv:1803.01271.
  • Shaojie Bai, J. Zico Kolter, and Vladlen Koltun. 2019. Trellis networks for sequence modeling. In International Conference on Learning Representations.
  • Delia Baldassarri and Andrew Gelman. 2008. Partisans without constraint: Political polarization and trends in American public opinion. American Journal of Sociology, 114(2):408–446.
  • Anthony Bau, Yonatan Belinkov, Hassan Sajjad, Nadir Durrani, Fahim Dalvi, and James Glass. 2019. Identifying and controlling important neurons in neural machine translation. In International Conference on Learning Representations.
  • Matthew A. Baum and Philip B. K. Potter. 2008. The relationships between mass media, public opinion, and foreign policy: Toward a theoretical synthesis. Annual Review of Political Science, 11:39–65.
  • Yoshua Bengio, Jerome Louradour, Ronan Collobert, and Jason Weston. 2009. Curriculum learning. In Proceedings of the 26th Annual International Conference on Machine Learning, pages 41–48.
  • George F. Bishop. 2004. The Illusion of Public Opinion: Fact and Artifact in American Public Opinion Polls. Rowman & Littlefield Publishers.
  • David M. Blei, Andrew Y. Ng, and Michael I. Jordan. 2003. Latent Dirichlet allocation. Journal of Machine Learning Research, 3(Jan):993–1022.
  • Angus Campbell, Philip E. Converse, Warren E. Miller, and Donald E. Stokes. 1980. The American Voter. University of Chicago Press.
  • Minmin Chen, Zhixiang Xu, Kilian Q. Weinberger, and Fei Sha. 2012. Marginalized denoising autoencoders for domain adaptation. In Proceedings of the 29th International Conference on Machine Learning, pages 1627–1634.
  • Xilun Chen, Yu Sun, Ben Athiwaratkun, Claire Cardie, and Kilian Weinberger. 2018. Adversarial deep averaging networks for cross-lingual sentiment classification. Transactions of the Association for Computational Linguistics, 6:557–570.
  • Jacob Cohen. 1960. A coefficient of agreement for nominal scales. Educational and Psychological Measurement, 20(1):37–46.
  • Eldad Davidov, Bart Meuleman, Jan Cieciuch, Peter Schmidt, and Jaak Billiet. 2014. Measurement equivalence in cross-national research. Annual Review of Sociology, 40:55–75.
  • Mark Davies. 2008. The Corpus of Contemporary American English: 450 million words, 1990–present.
  • Zachary Elkins and Robert Shaffer. 2019. On measuring textual similarity. Work in progress.
  • Geoff French, Michal Mackiewicz, and Mark Fisher. 2018. Self-ensembling for visual domain adaptation. In International Conference on Learning Representations.
  • Lisheng Fu, Thien Huu Nguyen, Bonan Min, and Ralph Grishman. 2017. Domain adaptation for relation extraction with domain adversarial neural network. In Proceedings of the Eighth International Joint Conference on Natural Language Processing (Volume 2: Short Papers), pages 425–429.
  • Yaroslav Ganin, Evgeniya Ustinova, Hana Ajakan, Pascal Germain, Hugo Larochelle, Francois Laviolette, Mario Marchand, and Victor Lempitsky. 2016. Domain-adversarial training of neural networks. The Journal of Machine Learning Research, 17(1):2096–2030.
  • Xavier Glorot, Antoine Bordes, and Yoshua Bengio. 2011. Domain adaptation for large-scale sentiment classification: A deep learning approach. In Proceedings of the 28th International Conference on Machine Learning, pages 513–520.
  • Robert E. Goodin. 2009. The Oxford Handbook of Political Science, volume 11. Oxford University Press.
  • Sepp Hochreiter and Jürgen Schmidhuber. 1997. Long short-term memory. Neural Computation, 9(8):1735–1780.
  • Xiaolei Huang and Michael J. Paul. 2018. Examining temporality in document classification. In Proceedings of the 56th Annual Meeting of the Association for Computational Linguistics (Volume 2: Short Papers), pages 694–699.
  • Nal Kalchbrenner, Lasse Espeholt, Karen Simonyan, Aaron van den Oord, Alex Graves, and Koray Kavukcuoglu. 2016. Neural machine translation in linear time. arXiv preprint arXiv:1610.10099.
  • Yoon Kim. 2014. Convolutional neural networks for sentence classification. In Proceedings of the 2014 Conference on Empirical Methods in Natural Language Processing, pages 1746–1751.
  • Diederik P. Kingma and Jimmy Ba. 2015. Adam: A method for stochastic optimization. In International Conference on Learning Representations.
  • Wei-Jen Ko, Greg Durrett, and Junyi Jessy Li. 2019. Domain agnostic real-valued specificity prediction. In Proceedings of the AAAI Conference on Artificial Intelligence.
  • Samuli Laine and Timo Aila. 2017. Temporal ensembling for semi-supervised learning. In International Conference on Learning Representations.
  • Jonathan Long, Evan Shelhamer, and Trevor Darrell. 2015. Fully convolutional networks for semantic segmentation. In Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, pages 3431–3440.
  • Scott de Marchi, Spencer Dorsey, and Michael J. Ensley. 2018. Policy and the structure of roll call voting in the US House. Available at SSRN 3262316.
  • Maxwell McCombs. 2018. Setting the Agenda: Mass Media and Public Opinion. John Wiley & Sons.
  • Jeffrey Pennington, Richard Socher, and Christopher Manning. 2014. GloVe: Global vectors for word representation. In Proceedings of the 2014 Conference on Empirical Methods in Natural Language Processing, pages 1532–1543.
  • Tim Salimans, Ian Goodfellow, Wojciech Zaremba, Vicki Cheung, Alec Radford, and Xi Chen. 2016. Improved techniques for training GANs. In Advances in Neural Information Processing Systems 29, pages 2234–2242.
  • Evan Sandhaus. 2008. The New York Times annotated corpus. Linguistic Data Consortium, Philadelphia, 6(12):e26752.
  • Nitish Srivastava, Geoffrey Hinton, Alex Krizhevsky, Ilya Sutskever, and Ruslan Salakhutdinov. 2014. Dropout: A simple way to prevent neural networks from overfitting. The Journal of Machine Learning Research, 15(1):1929–1958.
  • Antti Tarvainen and Harri Valpola. 2017. Mean teachers are better role models: Weight-averaged consistency targets improve semi-supervised deep learning results. In Advances in Neural Information Processing Systems 30, pages 1195–1204.
  • Ashish Vaswani, Noam Shazeer, Niki Parmar, Jakob Uszkoreit, Llion Jones, Aidan N. Gomez, Łukasz Kaiser, and Illia Polosukhin. 2017. Attention is all you need. In Advances in Neural Information Processing Systems 30, pages 5998–6008.
  • Zichao Yang, Zhiting Hu, Ruslan Salakhutdinov, and Taylor Berg-Kirkpatrick. 2017. Improved variational autoencoders for text modeling using dilated convolutions. In Proceedings of the 34th International Conference on Machine Learning, pages 3881–3890.
  • Zichao Yang, Diyi Yang, Chris Dyer, Xiaodong He, Alex Smola, and Eduard Hovy. 2016. Hierarchical attention networks for document classification. In Proceedings of the 2016 Conference of the North American Chapter of the Association for Computational Linguistics: Human Language Technologies, pages 1480–1489.
  • John R. Zaller. 1992. The Nature and Origins of Mass Opinion. Cambridge University Press.
  • Yuan Zhang, Regina Barzilay, and Tommi Jaakkola. 2017. Aspect-augmented adversarial networks for domain adaptation. Transactions of the Association for Computational Linguistics, 5:515–528.
Authors
Barea Sinno
Alex Rosenfeld