TIMME: Twitter Ideology-detection via Multi-task Multi-relational Embedding

KDD '20: The 26th ACM SIGKDD Conference on Knowledge Discovery and Data Mining Virtual Event..., pp.2258-2268, (2020)

Abstract

We aim at solving the problem of predicting people's ideology, or political tendency. We estimate it by using Twitter data, and formalize it as a classification problem. Ideology-detection has long been a challenging yet important problem. Certain groups, such as the policy makers, rely on it to make wise decisions. Back in the old days w...

Code

https://github.com/PatriciaXiao/TIMME (a social-relation graph of 20,000 nodes obtained through the official Twitter API, compiled in March 2019)

Introduction
  • Studies on ideology never fail to attract people’s interest.
  • Ideology here refers to the political stance or tendency of people, often reflected as left- or right-leaning.
  • The booming development of social networks in recent years sheds light on detecting ordinary people’s ideology.
  • On social networks, people are more relaxed than in an offline interview and behave naturally.
  • In return, social networks have shaped people’s habits, giving rise to opinion leaders and encouraging youngsters’ political involvement [25].
Highlights
  • Studies on ideology never fail to attract people’s interest.
  • We propose TIMME for ideology detection on Twitter. Its encoder captures the interactions between different relations, and its decoder treats different relations separately while measuring the importance of each relation to ideology detection.
  • We propose TIMME as a multi-task learning model, so that the sparsity of the labels can be overcome with the help of the link information.
  • Compared with Relational Graph Convolutional Networks (r-GCN), we show that their design is not as suitable for social networks as ours.
  • Links help much more than naively processed text in the ideology-detection problem, and follow is the most important relation for ideology detection.
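The multi-task idea in the highlights above can be illustrated with a small sketch. It is not the TIMME implementation: the dot-product link score, the fixed per-relation weights, and all names (`node_loss`, `link_loss`, `multitask_loss`) are assumptions made for illustration. It only shows how a classification loss on a handful of labeled nodes can be combined with link-prediction losses that every relation contributes to.

```python
import math

def softmax(logits):
    """Numerically stable softmax over a list of floats."""
    m = max(logits)
    exps = [math.exp(x - m) for x in logits]
    total = sum(exps)
    return [e / total for e in exps]

def node_loss(logits_by_node, labels):
    """Mean cross-entropy over the (few) labeled nodes only."""
    losses = [-math.log(softmax(logits_by_node[n])[y]) for n, y in labels.items()]
    return sum(losses) / len(losses)

def link_loss(emb, pos_edges, neg_edges):
    """Mean binary cross-entropy; the dot-product score is an assumption."""
    def prob(u, v):
        s = sum(a * b for a, b in zip(emb[u], emb[v]))
        return 1.0 / (1.0 + math.exp(-s))
    losses = [-math.log(prob(u, v)) for u, v in pos_edges]
    losses += [-math.log(1.0 - prob(u, v)) for u, v in neg_edges]
    return sum(losses) / len(losses)

def multitask_loss(emb, logits_by_node, labels, edges_by_relation, weights):
    """Node-classification loss plus a weighted link loss per relation."""
    total = node_loss(logits_by_node, labels)
    for rel, (pos, neg) in edges_by_relation.items():
        total += weights[rel] * link_loss(emb, pos, neg)
    return total

# Toy data (hypothetical): three users, only user 0 has an ideology label.
emb = {0: [1.0, 0.0], 1: [0.9, 0.1], 2: [-1.0, 0.2]}
logits = {0: [2.0, -1.0], 1: [1.5, -0.5], 2: [-1.0, 1.0]}
labels = {0: 0}
edges = {
    "follow": ([(0, 1)], [(0, 2)]),   # (observed edges, sampled non-edges)
    "retweet": ([(0, 1)], [(1, 2)]),
}
weights = {"follow": 1.0, "retweet": 0.5}
loss = multitask_loss(emb, logits, labels, edges, weights)
```

Even with a single labeled node, every observed link adds a training signal to the shared embeddings, which is the intuition behind using links to offset label sparsity.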
Methods
  • The authors explored many possible baseline models. Some methods mentioned in Section 2, namely HetGNN [43], GATNE [3], and GTN [42], generally converge ≈ 10 to 100 times slower than the authors' model on any task.
  • Other well-designed models such as GIN [40] differ from the authors' approach at a very fundamental level and are therefore not considered as baselines.
  • Some other methods, such as GEM [22] and SHINE [38], should be capable of handling a dataset at this scale, but their code has not been released to the public, and the authors cannot guarantee a faithful reproduction.
  • Thanks to their robustness, the TIMME models did not require intensive hyper-parameter tuning.
Results
  • Less than 1%. The authors assume that if training on relation r_i yields good performance on relation r_j, then relation r_i probably leads to r_j.
Conclusion
  • The proposed TIMME models handle multiple relations with a multi-relational encoder and a multi-task decoder.
  • The authors sidestep the silent-majority problem by relying mostly on relations instead of text information.
  • The models accept incomplete input features, and the authors show that links do well at generating the ideology embedding without additional text information.
  • The model could be extended to other social network embedding problems, for example on other datasets such as Facebook as long as the dataset is legally available, and it also applies to predicting other tendencies, such as preferring Superman or Batman.
  • The authors believe that the dataset will be beneficial to the community.
Tables
  • Table 1: Descriptive statistics of the three selected subsets of our dataset
  • Table 2: Node classification measured by F1-score/accuracy
  • Table 3: Link prediction measured by ROC-AUC/PR-AUC
Related Work
  • 2.1 Ideology Detection

    Ideology detection in general can be naturally divided into two directions, based on the targets to predict: politicians [7, 24, 28] and ordinary citizens [1, 2, 5, 8, 13, 15, 16, 17, 20, 23, 29]. Work on ordinary citizens can be further categorized into two types according to the source of the data: intentionally collected via strategies such as surveys [1, 20], and directly collected, for example from news articles [2] or from social networks [13, 15, 17]. Some studies take advantage of both, asking for self-reported responses from a group of users selected from social networks [29], and some researchers have acknowledged the limitations of survey experiments [23]. Emerging from social science, probabilistic models have been widely used for this kind of analysis since the early 1980s [2, 13, 28]. On social network datasets, on the other hand, it is quite intuitive to extract information from text data for ideology detection [5, 8, 15, 16, 17], and only a few works have paid attention to links [9, 13]. Our work differs from all of them: (1) unlike probabilistic models, we use GNN approaches to solve this problem, so that we take advantage of highly efficient computational resources and obtain embeddings for further analysis; (2) we focus on relations among users, and show how telling those relations are.
Funding
  • This work is partially supported by NSF III-1705169, NSF CAREER Award 1741634, NSF #1937599, DARPA HR00112090027, Okawa Foundation Grant, and Amazon Research Award. Weiping Song is supported by the National Key Research and Development Program of China (Grant No. 2018AAA0101900/2018AAA0101902) as well as the National Natural Science Foundation of China (NSFC Grant No. 61772039 and No. 91646202). At the early stage of this work, Haoran Wang contributed a lot to a nicely implemented first version of the model, benefiting the rest of our work.
References
  • Christopher H Achen. 1975. Mass political attitudes and the survey response. American Political Science Review 69, 4 (1975), 1218–1231.
  • Ramy Baly, Georgi Karadzhov, Abdelrhman Saleh, James Glass, and Preslav Nakov. 2019. Multi-task ordinal regression for jointly predicting the trustworthiness and the leading political ideology of news media. arXiv preprint arXiv:1904.00542 (2019).
  • Yukuo Cen, Xu Zou, Jianwei Zhang, Hongxia Yang, Jingren Zhou, and Jie Tang. 2019. Representation learning for attributed multiplex heterogeneous network. In Proceedings of the 25th ACM SIGKDD International Conference on Knowledge Discovery & Data Mining. 1358–1368.
  • Jie Chen, Tengfei Ma, and Cao Xiao. 2018. Fastgcn: fast learning with graph convolutional networks via importance sampling. arXiv preprint arXiv:1801.10247 (2018).
  • Wei Chen, Xiao Zhang, Tengjiao Wang, Bishan Yang, and Yi Li. 2017. Opinion-aware Knowledge Graph for Political Ideology Detection. In IJCAI. 3647–3653.
  • Wei-Lin Chiang, Xuanqing Liu, Si Si, Yang Li, Samy Bengio, and Cho-Jui Hsieh. 2019. Cluster-gcn: An efficient algorithm for training deep and large graph convolutional networks. In Proceedings of the 25th ACM SIGKDD International Conference on Knowledge Discovery & Data Mining. 257–266.
  • Joshua Clinton, Simon Jackman, and Douglas Rivers. 2004. The statistical analysis of roll call data. American Political Science Review 98, 2 (2004), 355–370.
  • Michael D Conover, Bruno Gonçalves, Jacob Ratkiewicz, Alessandro Flammini, and Filippo Menczer. 2011. Predicting the political alignment of twitter users. In 2011 IEEE third international conference on privacy, security, risk and trust and 2011 IEEE third international conference on social computing. IEEE, 192–199.
  • Michael D Conover, Jacob Ratkiewicz, Matthew Francisco, Bruno Gonçalves, Filippo Menczer, and Alessandro Flammini. 2011. Political polarization on twitter. In Fifth international AAAI conference on weblogs and social media.
  • Aron Culotta, Nirmal Ravi Kumar, and Jennifer Cutler. 2015. Predicting the Demographics of Twitter Users from Website Traffic Data. In AAAI, Vol. 15.
  • Michaël Defferrard, Xavier Bresson, and Pierre Vandergheynst. 2016. Convolutional neural networks on graphs with fast localized spectral filtering. In Advances in neural information processing systems. 3844–3852.
  • Aditya Grover and Jure Leskovec. 2016. node2vec: Scalable feature learning for networks. In Proceedings of the 22nd ACM SIGKDD international conference on Knowledge discovery and data mining. 855–864.
  • Yupeng Gu, Ting Chen, Yizhou Sun, and Bingyu Wang. 2016. Ideology detection for twitter users with heterogeneous types of links. arXiv preprint arXiv:1612.08207 (2016).
  • Will Hamilton, Zhitao Ying, and Jure Leskovec. 2017. Inductive representation learning on large graphs. In Advances in neural information processing systems. 1024–1034.
  • Mohit Iyyer, Peter Enns, Jordan Boyd-Graber, and Philip Resnik. 2014. Political ideology detection using recursive neural networks. In Proceedings of the 52nd Annual Meeting of the Association for Computational Linguistics (Volume 1: Long Papers). 1113–1122.
  • Kristen Johnson and Dan Goldwasser. 2016. Identifying stance by analyzing political discourse on twitter. In Proceedings of the First Workshop on NLP and Computational Social Science. 66–75.
  • Sandeepa Kannangara. 2018. Mining twitter for fine-grained political opinion polarity classification, ideology detection and sarcasm detection. In Proceedings of the Eleventh ACM International Conference on Web Search and Data Mining. 751–752.
  • Alex Kendall, Yarin Gal, and Roberto Cipolla. 2018. Multi-task learning using uncertainty to weigh losses for scene geometry and semantics. In Proceedings of the IEEE conference on computer vision and pattern recognition. 7482–7491.
  • Thomas N Kipf and Max Welling. 2016. Semi-supervised classification with graph convolutional networks. arXiv preprint arXiv:1609.02907 (2016).
  • Theresa Kuhn and Aaron Kamm. 2019. The national boundaries of solidarity: a survey experiment on solidarity with unemployed people in the European Union. European Political Science Review 11, 2 (2019), 179–195.
  • Qimai Li, Zhichao Han, and Xiao-Ming Wu. 2018. Deeper insights into graph convolutional networks for semi-supervised learning. In Thirty-Second AAAI Conference on Artificial Intelligence.
  • Ziqi Liu, Chaochao Chen, Xinxing Yang, Jun Zhou, Xiaolong Li, and Le Song. 2018. Heterogeneous graph neural networks for malicious account detection. In Proceedings of the 27th ACM International Conference on Information and Knowledge Management. 2077–2085.
  • Sergio Martini and Mariano Torcal. 2019. Trust across political conflicts: Evidence from a survey experiment in divided societies. Party Politics 25, 2 (2019), 126–139.
  • Viet-An Nguyen, Jordan Boyd-Graber, Philip Resnik, and Kristina Miler. 2015. Tea party in the house: A hierarchical ideal point topic model and its application to republican legislators in the 112th congress. In Proceedings of the 53rd Annual Meeting of the Association for Computational Linguistics and the 7th International Joint Conference on Natural Language Processing (Volume 1: Long Papers). 1438–1448.
  • Chang Sup Park. 2013. Does Twitter motivate involvement in politics? Tweeting, opinion leadership, and political engagement. Computers in Human Behavior 29, 4 (2013), 1641–1648.
  • Jeffrey Pennington, Richard Socher, and Christopher D Manning. 2014. Glove: Global vectors for word representation. In Proceedings of the 2014 conference on empirical methods in natural language processing (EMNLP). 1532–1543.
  • Gary Pollock, Tom Brock, and Mark Ellison. 2015. Populism, ideology and contradiction: mapping young people’s political views. The Sociological Review 63 (2015), 141–166.
  • Keith T Poole and Howard Rosenthal. 1985. A spatial model for legislative roll call analysis. American Journal of Political Science (1985), 357–384.
  • Daniel Preoţiuc-Pietro, Ye Liu, Daniel Hopkins, and Lyle Ungar. 2017. Beyond binary labels: political ideology prediction of twitter users. In Proceedings of the 55th Annual Meeting of the Association for Computational Linguistics (Volume 1: Long Papers). 729–740.
  • Nils Reimers and Iryna Gurevych. 2019. Sentence-bert: Sentence embeddings using siamese bert-networks. arXiv preprint arXiv:1908.10084 (2019).
  • Sebastian Ruder. 2017. An overview of multi-task learning in deep neural networks. arXiv preprint arXiv:1706.05098 (2017).
  • Michael Schlichtkrull, Thomas N Kipf, Peter Bloem, Rianne Van Den Berg, Ivan Titov, and Max Welling. 2018. Modeling relational data with graph convolutional networks. In European Semantic Web Conference. Springer, 593–607.
  • Richard Socher, Danqi Chen, Christopher D Manning, and Andrew Ng. 2013. Reasoning with neural tensor networks for knowledge base completion. In Advances in neural information processing systems. 926–934.
  • Yizhou Sun and Jiawei Han. 2012. Mining heterogeneous information networks: principles and methodologies. Synthesis Lectures on Data Mining and Knowledge Discovery 3, 2 (2012), 1–159.
  • Jian Tang, Meng Qu, Mingzhe Wang, Ming Zhang, Jun Yan, and Qiaozhu Mei. 2015. Line: Large-scale information network embedding. In Proceedings of the 24th international conference on world wide web. 1067–1077.
  • Petar Veličković, Guillem Cucurull, Arantxa Casanova, Adriana Romero, Pietro Lio, and Yoshua Bengio. 2017. Graph attention networks. arXiv preprint arXiv:1710.10903 (2017).
  • Prashanth Vijayaraghavan, Soroush Vosoughi, and Deb Roy. 2017. Twitter demographic classification using deep multi-modal multi-task learning. In Proceedings of the 55th Annual Meeting of the Association for Computational Linguistics (Volume 2: Short Papers). 478–483.
  • Hongwei Wang, Fuzheng Zhang, Min Hou, Xing Xie, Minyi Guo, and Qi Liu. 2018. Shine: Signed heterogeneous information network embedding for sentiment link prediction. In Proceedings of the Eleventh ACM International Conference on Web Search and Data Mining. 592–600.
  • Xiao Wang, Houye Ji, Chuan Shi, Bai Wang, Yanfang Ye, Peng Cui, and Philip S Yu. 2019. Heterogeneous graph attention network. In The World Wide Web Conference. 2022–2032.
  • Keyulu Xu, Weihua Hu, Jure Leskovec, and Stefanie Jegelka. 2018. How powerful are graph neural networks? arXiv preprint arXiv:1810.00826 (2018).
  • Bishan Yang, Wen-tau Yih, Xiaodong He, Jianfeng Gao, and Li Deng. 2014. Embedding entities and relations for learning and inference in knowledge bases. arXiv preprint arXiv:1412.6575 (2014).
  • Seongjun Yun, Minbyul Jeong, Raehyun Kim, Jaewoo Kang, and Hyunwoo J Kim. 2019. Graph Transformer Networks. In Advances in Neural Information Processing Systems. 11960–11970.
  • Chuxu Zhang, Dongjin Song, Chao Huang, Ananthram Swami, and Nitesh V Chawla. 2019. Heterogeneous graph neural network. In Proceedings of the 25th ACM SIGKDD International Conference on Knowledge Discovery & Data Mining. 793–803.
Data Collection and Features
  • To legally and reliably crawl Twitter data, we first applied for the Developer API from Twitter (footnote 11), and then used Tweepy (footnote 12) for crawling. We set very strict rate limits for our crawlers so as not to harm any server. Our dataset is released at https://github.com/PatriciaXiao/TIMME. Raw data was collected by April 2019.
  • Footnote 10: https://scrapy.org/
  • Footnote 11: https://developer.twitter.com/
  • Footnote 12: https://www.tweepy.org/
  • Footnote 13: Congress members’ name list with party information is publicly available at https://www.congress.gov/members.
  • Footnote 14: Obama's and Trump’s cabinets are publicly available at https://obamawhitehouse.archives.gov/administration/cabinet and https://www.whitehouse.gov/the-trumpadministration/the-cabinet/ respectively.
  • The candidate set is C = C_raw − P. For every v_i ∈ C, we apply the same window size s = 5,000 and crawl their most recent s followers and s followees. All follower-followee pairs are stored in a database for the convenience of the following steps.
  • We derive features from text, using the tweets a user has posted to generate her/his feature. Although there have been recent advances in NLP with transformer-based structures such as BERT and XLNet, Sentence-BERT [30] found that BERT/XLNet embeddings generally perform worse than the GloVe [26] average on sentence-level tasks, not to mention the computational cost of transformers. We therefore use the GloVe average of the words as features, with the pre-trained Wikipedia 2014 + Gigaword 5 (300d) version. When we apply the average-GloVe embedding at the tweet level to tell the ideology behind the tweets, we can easily achieve ≈ 72.84% accuracy using a 2-layer MLP after only 200 epochs of training.
  • Footnote 15: https://developer.twitter.com/en/docs/basics/rate-limiting
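The GloVe-average feature described above can be sketched as follows. This is a minimal illustration under stated assumptions, not the authors' preprocessing code: the whitespace tokenizer, the pooling of all of a user's tweets into one token list, the zero vector for users with no known words, and the 3-dimensional toy vectors (standing in for the 300d pre-trained GloVe table) are all hypothetical choices.

```python
def average_embedding(tokens, vectors, dim):
    """Average the vectors of in-vocabulary tokens; zeros if none are known."""
    known = [vectors[t] for t in tokens if t in vectors]
    if not known:
        return [0.0] * dim
    # zip(*known) iterates over embedding dimensions (columns).
    return [sum(col) / len(known) for col in zip(*known)]

def user_feature(tweets, vectors, dim):
    """Pool all tokens from a user's tweets and average their word vectors."""
    tokens = [tok for tweet in tweets for tok in tweet.lower().split()]
    return average_embedding(tokens, vectors, dim)

# Toy stand-in for a pre-trained embedding table (hypothetical values).
toy_vectors = {
    "tax": [1.0, 0.0, 2.0],
    "cuts": [0.0, 1.0, 0.0],
    "now": [2.0, 2.0, 1.0],
}
feature = user_feature(["Tax cuts now", "tax reform"], toy_vectors, dim=3)
silent = user_feature([], toy_vectors, dim=3)
```

Out-of-vocabulary words such as "reform" are skipped here, and a user with no usable tweets receives an all-zero feature, which is one simple way to represent the incomplete input features that the conclusion mentions.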