MALCOM: Generating Malicious Comments to Attack Neural Fake News Detection Models

ICDM, pp. 282-291, 2020.


Abstract:

In recent years, the proliferation of so-called "fake news" has caused much disruption in society and weakened the news ecosystem. Therefore, to mitigate such problems, researchers have developed state-of-the-art models to auto-detect fake news on social media using sophisticated data science and machine learning techniques. In this work...

Introduction
  • Circulation of fake news, i.e., false or misleading pieces of information, on social media is detrimental to individuals’ knowledge and erodes trust in society.
  • Due to the high stakes of fake news detection in practice, tremendous efforts have been made to develop fake news detection models that can auto-detect fake news with high accuracy [1], [4], [25], [27]. Figure 1 shows an example of a typical news article posted on social media channels such as Twitter and Facebook.
  • Recent research has shown that users’ engagement on the public news channels on which these articles are shared becomes a critical signal for flagging questionable articles.
Highlights
  • We are faced with two challenges: (i) how to generate adversarial comments that can fool various cutting-edge fake news detectors into predicting a target class; and (ii) how to simultaneously generate adversarial comments that are realistic and relevant to the article’s content. To solve these challenges, we propose MALCOM, a novel framework that generates realistic and relevant comments in an end-to-end fashion to attack fake news detection models, and that works for both black box and white box attacks (a minimal sketch of such an attack loop follows this list).
  • AQ3 Attack Robust Fake News Detectors: How effective are generated comments in attacking fake news detectors safeguarded by a robust comment-filtering feature?
  • We examine whether malicious comments generated by MALCOM can be flagged by humans, i.e., the Turing Test.
  • We recruit only users with a 95% approval rate, randomly swap the choices, and discard responses taking less than 30 seconds.
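
The end-to-end attack referenced in the first highlight can be pictured as a simple loop. The sketch below is illustrative only, assuming a binary real/fake detector; `generate_comments` and `query_detector` are hypothetical stand-ins for illustration, not MALCOM's released API.

```python
# Minimal sketch of a black box comment attack, assuming a binary detector
# queried as an opaque score oracle. generate_comments and query_detector
# are hypothetical interfaces used only for illustration.

TARGET_CLASS = 0  # e.g., 0 = "real": the attacker wants fake news labeled real


def black_box_attack(article, real_comments, generate_comments, query_detector,
                     n_candidates=20, budget=5):
    """Append up to `budget` generated comments that push the detector's
    prediction toward TARGET_CLASS."""
    comments = list(real_comments)
    for _ in range(budget):
        # Condition generation on the article so the comments stay relevant.
        candidates = generate_comments(article, n=n_candidates)
        # Keep the candidate that most increases the target-class score.
        best = max(candidates,
                   key=lambda c: query_detector(article, comments + [c])[TARGET_CLASS])
        comments.append(best)
        scores = query_detector(article, comments)
        if max(range(len(scores)), key=scores.__getitem__) == TARGET_CLASS:
            break  # the detector now predicts the target class; stop early
    return comments
```

In the white box setting, where the attacker also has the detector's gradients, the same goal can be pursued more directly by backpropagating a target-class loss into the generator rather than scoring candidates one by one.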
Results
  • The authors evaluate the effectiveness of MALCOM and try to answer the following analytical questions (AQs): AQ1 Quality, Diversity, and Coherency: How realistic are the generated comments in terms of their writing style, and how coherent are they with the original articles’ contents?

    AQ2 Attack Performance: How effective are generated comments in attacking white box and black box detectors? (The Atk% metric used to answer this is sketched after this list.)

    AQ3 Attack Robust Fake News Detectors: How effective are generated comments in attacking fake news detectors safeguarded by a robust comment-filtering feature?

    AQ4 Robustness: How many malicious comments do the authors need, and how early can they effectively attack the detectors?

    The authors plan to release all datasets, code, and parameters used in the experiments.

  • GOSSIPCOP is a dataset of fake and real news collected from a fact-checking website, GossipCop, whereas PHEME is a dataset of rumors and non-rumors relating to nine different breaking events.
  • These datasets are selected because they include both veracity labels and the relevant social media discourse on Twitter.
  • All the experiments are done only on the test set, i.e., the authors evaluate the quality and attack performance of generated comments on unseen articles and their ground-truth comments.
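
As a rough illustration of the attack success metric (Atk%) referenced in AQ2 and reported in the tables, the sketch below computes the share of attacked test articles that the detector assigns to the attacker's target class; `predict` is an assumed interface, not the paper's released code.

```python
# Illustrative Atk% computation: the percentage of attacked articles the
# detector classifies as the attacker's target class. `predict` is a
# hypothetical interface mapping (article, comments) to a predicted label.

def attack_success_rate(test_set, predict, target_class):
    successes = 0
    for article, comments, malicious_comments in test_set:
        label = predict(article, comments + malicious_comments)
        if label == target_class:
            successes += 1
    return 100.0 * successes / len(test_set)
```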
Conclusion
  • The authors examine whether malicious comments generated by MALCOM can be flagged by humans, i.e., the Turing Test.
  • 1) H1: Given a comment, the users can correctly detect if the comment is machine-generated.
  • 2) H2: Given a comment, the users can correctly detect if the comment is written by a human.
  • 3) H3: Given a machine-generated and a human-written comment, the users can correctly identify the machine-generated one.
  • The authors would need to equip human workers with intensive training to better identify malicious comments.
  • This can be labor-intensive and costly given the large volume of comments published every day.
Tables
  • Table 1: Dataset Statistics and Details of Target Classifiers and
  • Table 2: Black Box Attack Performance on Different Attack Strategies and Target Classifier Architectures (Atk%)
  • Table 3: Comparison among Attack Methods
  • Table 4: Quality, Diversity, Coherency and White Box Attack
  • Table 5: Results of User-Study on Generation Quality
  • Table 6: Examples of Generated Malicious Comments. Spans in purple italics are retrieved from the train set and carefully crafted. Spans in blue are generated in an end-to-end fashion
  • Table 7: Ablation Test
Related work
  • A. Fake News Detection Models

    In terms of computation, the majority of works focus on developing machine learning (ML) based solutions to automatically detect fake news. Feature-wise, most models use an article’s title, news content, its social responses (e.g., user comments or replies) [27], relationships between subjects and publishers [37], or any combination of them [25]. In particular, social responses have been widely adopted and proven to be strong predictive features for the accurate detection of fake news [25], [27]. Architecture-wise, most detectors use a recurrent neural network (RNN) [25], [27] or a convolutional neural network (CNN) [24] to encode either the news content (i.e., the article’s content or micro-blog posts) or the sequential dependency among social comments and replies. Other, more complex architectures include the use of co-attention layers [30] to model the interactions between an article’s content and its social comments (e.g., dEFEND [27]) and the adoption of a variational auto-encoder to generate synthetic social responses to support early fake news detection (e.g., TCNN-URG [24]).
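
To make the comment-aware detector family above concrete, here is a minimal PyTorch-style sketch that encodes the article and its comments with separate RNNs and classifies their concatenation. It is a generic illustration in the spirit of RNN-based detectors such as CSI [25], not the actual code of any cited model.

```python
import torch
import torch.nn as nn


class CommentAwareDetector(nn.Module):
    """Generic sketch: encode article tokens and comment tokens with GRUs,
    then classify the concatenated representations as real or fake."""

    def __init__(self, vocab_size, emb_dim=128, hid_dim=64, n_classes=2):
        super().__init__()
        self.embed = nn.Embedding(vocab_size, emb_dim)
        self.article_rnn = nn.GRU(emb_dim, hid_dim, batch_first=True)
        self.comment_rnn = nn.GRU(emb_dim, hid_dim, batch_first=True)
        self.classifier = nn.Linear(2 * hid_dim, n_classes)

    def forward(self, article_ids, comment_ids):
        # article_ids: (batch, article_len); comment_ids: (batch, comment_len).
        # For simplicity, all comments of an article are concatenated into one
        # sequence here; models like dEFEND attend over comments individually.
        _, h_article = self.article_rnn(self.embed(article_ids))
        _, h_comment = self.comment_rnn(self.embed(comment_ids))
        joint = torch.cat([h_article[-1], h_comment[-1]], dim=-1)
        return self.classifier(joint)  # logits over {real, fake}
```

Because such detectors read the comment channel, an attacker who can post comments controls part of the model’s input, which is precisely the surface MALCOM exploits.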
Funding
  • Even if humans are employed to filter out suspicious or auto-generated comments with a mean accuracy of 60% (H1), MALCOM can still effectively achieve over 80% Atk% on average with the remaining 40% of malicious comments (see Sec.
Study subjects and analysis
users: 100
We examine whether malicious comments generated by MALCOM can be easily flagged by humans, i.e., the Turing Test. We use Amazon Mechanical Turk (AMT) to recruit over 100 users to distinguish between comments generated by MALCOM (machine-generated) and comments written by humans. We examine the following alternative hypotheses using one-tailed statistical testing.

unseen and unique articles: 187
For quality assurance, we recruit only users with a 95% approval rate, randomly swap the choices, and discard responses taking less than 30 seconds. We test on comments generated for 187 unseen and unique articles in the PHEME dataset’s test set. Table V shows that we fail to reject the null hypotheses of H1, H2, and H3 (p-value > 0.05).
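
The one-tailed tests above can be reproduced in spirit with a binomial test of whether annotators beat chance at spotting machine-generated comments. The counts below are placeholders for illustration, not the study's raw data.

```python
from scipy.stats import binomtest

# Hypothetical counts (not the study's raw data): out of n judgments,
# k correctly identified the machine-generated comment.
n, k = 50, 30  # 60% observed accuracy, echoing the mean accuracy cited for H1

# One-tailed test against chance (p = 0.5). A p-value above 0.05 means we
# fail to reject the null hypothesis that users are merely guessing, which
# mirrors the paper's Turing-test finding.
result = binomtest(k, n, p=0.5, alternative="greater")
print(f"accuracy = {k / n:.2f}, one-tailed p-value = {result.pvalue:.3f}")
```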

References
  • [1] M. Aldwairi and A. Alwahedi, “Detecting fake news in social media networks,” Procedia Computer Science, vol. 141, pp. 215–222, 2018.
  • [2] J. Allen, B. Howland, M. Mobius, D. Rothschild, and D. J. Watts, “Evaluating the fake news problem at the scale of the information ecosystem,” Science Advances, vol. 6, no. 14, p. eaay3539, 2020.
  • [3] D. Cer, Y. Yang, S.-y. Kong, N. Hua, N. Limtiaco, R. S. John, N. Constant, M. Guajardo-Cespedes, S. Yuan, C. Tar et al., “Universal sentence encoder,” arXiv preprint arXiv:1803.11175, 2018.
  • [4] L. Cui, S. Wang, and D. Lee, “SAME: Sentiment-Aware Multi-Modal Embedding for Detecting Fake News,” in IEEE/ACM ASONAM’19.
  • [5] J. Ebrahimi, A. Rao, D. Lowd, and D. Dou, “HotFlip: White-box adversarial examples for text classification,” arXiv preprint arXiv:1712.06751, 2017.
  • [6] M. Gabielkov, A. Ramachandran, A. Chaintreau, and A. Legout, “Social Clicks: What and Who Gets Read on Twitter?” Jun. 2016. [Online]. Available: https://hal.inria.fr/hal-01281190
  • [7] I. Goodfellow, Y. Bengio, and A. Courville, Deep Learning. MIT Press, 2016.
  • [8] I. Goodfellow, J. Pouget-Abadie, M. Mirza, B. Xu, D. Warde-Farley, S. Ozair, A. Courville, and Y. Bengio, “Generative adversarial nets,” in NIPS’14.
  • [9] I. J. Goodfellow, J. Shlens, and C. Szegedy, “Explaining and harnessing adversarial examples,” arXiv preprint arXiv:1412.6572, 2014.
  • [10] B. Guo, H. Wang, Y. Ding, S. Hao, Y. Sun, and Z. Yu, “c-TextGen: Conditional Text Generation for Harmonious Human-Machine Interaction,” arXiv preprint arXiv:1909.03409.
  • [11] M. Hindman and V. Barash, “Disinformation and influence campaigns on Twitter,” Knight Foundation: George Washington University, 2018.
  • [12] B. D. Horne, J. Nørregaard, and S. Adali, “Robust fake news detection over time and attack,” ACM TIST’19.
  • [13] Z. Hu, Z. Yang, X. Liang, R. Salakhutdinov, and E. P. Xing, “Toward controlled generation of text,” in ICML’17.
  • [14] E. Jang, S. Gu, and B. Poole, “Categorical reparameterization with Gumbel-Softmax,” arXiv preprint arXiv:1611.01144, 2016.
  • [15] A. Jolicoeur-Martineau, “The relativistic discriminator: a key element missing from standard GAN,” arXiv preprint arXiv:1807.00734, 2018.
  • [16] Y. Kim, “Convolutional neural networks for sentence classification,” arXiv preprint arXiv:1408.5882, 2014.
  • [17] E. Kochkina, M. Liakata, and A. Zubiaga, “All-in-one: Multi-task learning for rumour verification,” in ACL’18.
  • [18] A. M. Lamb, A. G. A. P. Goyal, Y. Zhang, S. Zhang, A. C. Courville, and Y. Bengio, “Professor forcing: A new algorithm for training recurrent networks,” in NIPS’16.
  • [19] T. Le, K. Shu, M. D. Molina, D. Lee, S. S. Sundar, and H. Liu, “5 sources of clickbaits you should know! Using synthetic clickbaits to improve prediction and distinguish between bot-generated and human-written headlines,” in IEEE/ACM ASONAM’19.
  • [20] J. Li, S. Ji, T. Du, B. Li, and T. Wang, “TextBugger: Generating Adversarial Text Against Real-world Applications,” in NDSS’19.
  • [21] N. Papernot, P. McDaniel, S. Jha, M. Fredrikson, Z. B. Celik, and A. Swami, “The limitations of deep learning in adversarial settings,” in IEEE EuroS&P’16.
  • [22] J. W. Pennebaker, R. L. Boyd, K. Jordan, and K. Blackburn, “The development and psychometric properties of LIWC2015,” 2015. [Online]. Available: https://eprints.lancs.ac.uk/id/eprint/134191
  • [23] D. Pruthi, B. Dhingra, and Z. C. Lipton, “Combating adversarial misspellings with robust word recognition,” arXiv preprint arXiv:1905.11268, 2019.
  • [24] F. Qian, C. Gong, K. Sharma, and Y. Liu, “Neural user response generator: Fake news detection with collective user intelligence,” in IJCAI’18.
  • [25] N. Ruchansky, S. Seo, and Y. Liu, “CSI: A hybrid deep model for fake news detection,” in CIKM’17.
  • [26] A. Santoro, R. Faulkner, D. Raposo, J. Rae, M. Chrzanowski, T. Weber, D. Wierstra, O. Vinyals, R. Pascanu, and T. Lillicrap, “Relational recurrent neural networks,” in NIPS’18.
  • [27] K. Shu, L. Cui, S. Wang, D. Lee, and H. Liu, “dEFEND: Explainable Fake News Detection,” in KDD’19.
  • [28] K. Shu, D. Mahudeswaran, S. Wang, D. Lee, and H. Liu, “FakeNewsNet: A Data Repository with News Content, Social Context and Dynamic Information for Studying Fake News on Social Media,” arXiv preprint arXiv:1809.01286, 2018.
  • [29] K. Shu, S. Wang, T. Le, D. Lee, and H. Liu, “Deep headline generation for clickbait detection,” in ICDM’18.
  • [30] A. Vaswani, N. Shazeer, N. Parmar, J. Uszkoreit, L. Jones, A. N. Gomez, Ł. Kaiser, and I. Polosukhin, “Attention is all you need,” in NIPS’17.
  • [31] E. Wallace, S. Feng, N. Kandpal, M. Gardner, and S. Singh, “Universal adversarial triggers for NLP,” arXiv preprint arXiv:1908.07125, 2019.
  • [32] Z. Wang, H. Liu, J. Tang, S. Yang, G. Y. Huang, and Z. Liu, “Learning multi-level dependencies for robust word recognition,” arXiv preprint arXiv:1911.09789, 2019.
  • [33] W. Nie, N. Narodytska, and A. Patel, “RelGAN: Relational Generative Adversarial Networks for Text Generation,” in ICLR’19.
  • [34] L. Yu, W. Zhang, J. Wang, and Y. Yu, “SeqGAN: Sequence Generative Adversarial Nets with Policy Gradient,” in AAAI’17.
  • [35] D. Yuan, Y. Miao, N. Z. Gong, Z. Yang, Q. Li, D. Song, Q. Wang, and X. Liang, “Detecting fake accounts in online social networks at the time of registrations,” in CCS’19, pp. 1423–1438.
  • [36] R. Zellers, A. Holtzman, H. Rashkin, Y. Bisk, A. Farhadi, F. Roesner, and Y. Choi, “Defending against neural fake news,” arXiv preprint arXiv:1905.12616, 2019.
  • [37] J. Zhang, B. Dong, and P. Yu, “FAKEDETECTOR: Effective fake news detection with deep diffusive neural network,” in ICDE’20.
  • [38] Z. Zhou, H. Guan, M. M. Bhat, and J. Hsu, “Fake news detection via NLP is vulnerable to adversarial attacks,” in ICAART’19.