We present a transformer approach targeting SemEval 2018 shared task on Semantic Extraction from CybersecUrity REports using Natural Language Processing

Devising Malware Characterstics using Transformers

International Journal of Engineering Trends and Technology, no. 5 (2020): 33-37


Abstract

With the increasing number of cybersecurity threats, it becomes more difficult for researchers to skim through security reports for malware analysis. There is a need to extract highly relevant sentences without having to read entire malware reports. In this paper, we are finding relevant malware behavior menti...

Introduction
  • The digital landscape is unique and constantly changing, creating room for cyber-attacks.
  • By 2019, there was a 13% rise in pre-installed malware and adware on Android devices and, more strikingly, Macs, which are known for their strong security, had more threats detected than Windows.
  • These attacks put users' data, money and privacy at risk.
  • This pushes large companies and their customers to understand how well they alert on, block and detect threats.
Highlights
  • The digital landscape is unique and constantly changing, creating room for cyber-attacks
  • We present a transformer approach targeting SemEval 2018 shared task on Semantic Extraction from CybersecUrity REports using Natural Language Processing (SecureNLP)
  • Our end users can quickly skim through large reports and improve their enterprises’ evasion and prevention strategies against adversaries
  • We plan to use the large BERT model, which has more attention layers and is expected to perform better than the base model used in our approach
  • We plan to explore the other SubTasks, which revolve around entity extraction and linking, and to use these transformers to build an end-to-end system
Methods
  • The authors describe in detail their approach to SubTask 1.
  • The authors also removed texts that were not in English or that contained only numbers.
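
The filtering step above can be sketched as follows. This is a minimal illustration, not the authors' code: the helper names are hypothetical, and the ASCII-letter ratio is only a crude stand-in for whatever language check the paper actually used.

```python
import re

# Hypothetical helpers; the paper's exact filtering rules are not given on this page.
_NUMERIC_ONLY = re.compile(r"^[\d\s.,:/%\-]+$")


def is_numeric_only(text: str) -> bool:
    """True when the text contains only digits and common separators."""
    return bool(_NUMERIC_ONLY.match(text.strip()))


def looks_english(text: str, min_ascii_ratio: float = 0.9) -> bool:
    """Assumed heuristic: share of ASCII letters among all letters in the text."""
    letters = [ch for ch in text if ch.isalpha()]
    if not letters:
        return False
    ascii_letters = sum(1 for ch in letters if ch.isascii())
    return ascii_letters / len(letters) >= min_ascii_ratio


def filter_sentences(sentences):
    """Keep sentences that look English and are not purely numeric."""
    return [s for s in sentences if not is_numeric_only(s) and looks_english(s)]


if __name__ == "__main__":
    sample = ["The malware drops a payload.", "12345 678", "192.168.0.1"]
    print(filter_sentences(sample))
```

A real pipeline would likely swap `looks_english` for a proper language-identification library; the threshold of 0.9 is an arbitrary choice for the sketch.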
Results
  • This section gives a detailed overview of the dataset introduced for SubTask 1 of SemEval-2018 Task 8, the SecureNLP challenge.
  • It concludes with the hyperparameters used, the evaluation metric chosen and the results obtained.
  • The total statistics of the dataset are shown in Table I.
  • The training data for this shared task contains 9,424 sentences, the validation data contains 1,213 sentences, and the test data contains 618 sentences.
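
The summary above does not name the evaluation metric, but the Figure 2 description on this page refers to an F1 score. Assuming a binary F1 computed on the positive class (sentences that describe malware behaviour), a minimal sketch of the metric looks like this. The function name and the label convention (1 = relevant sentence) are assumptions for illustration.

```python
def f1_score(y_true, y_pred, positive=1):
    """Binary F1 for the positive class; returns 0.0 when there are no true positives."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == positive and p == positive)
    fp = sum(1 for t, p in pairs if t != positive and p == positive)
    fn = sum(1 for t, p in pairs if t == positive and p != positive)
    if tp == 0:
        return 0.0
    precision = tp / (tp + fp)
    recall = tp / (tp + fn)
    return 2 * precision * recall / (precision + recall)


if __name__ == "__main__":
    # Assumed toy labels: 1 = sentence mentions malware behaviour, 0 = it does not.
    gold = [1, 0, 1, 1, 0]
    predicted = [1, 1, 0, 1, 0]
    print(f1_score(gold, predicted))
```

In the toy example there are two true positives, one false positive and one false negative, so precision and recall are both 2/3 and so is F1.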
Conclusion
  • The authors present a transformer approach targeting the SemEval 2018 shared task on Semantic Extraction from CybersecUrity REports using Natural Language Processing (SecureNLP).
  • The approach's strong performance should benefit security analysts, the intended end users.
  • With this research, the relevant sentences in APT reports can be highlighted.
  • End users can quickly skim through large reports and improve their enterprises’ evasion and prevention strategies against adversaries.
  • The authors plan to use the large BERT model, which has more attention layers and is expected to perform better than the base model used in the approach.
  • The authors plan to explore the other SubTasks, which revolve around entity extraction and linking, and to use these transformers to build an end-to-end system.
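
To make the base-versus-large comparison concrete, the sketch below records the two model sizes as published in the BERT paper by Devlin et al. (cited in the reference list). The `BertSize` class and helper are hypothetical names introduced only for this illustration; nothing here reflects the authors' own configuration.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class BertSize:
    """Published architecture sizes; the class itself is illustrative only."""
    name: str
    layers: int
    hidden_size: int
    attention_heads: int
    approx_params_millions: int


BERT_BASE = BertSize("BERT-base", layers=12, hidden_size=768,
                     attention_heads=12, approx_params_millions=110)
BERT_LARGE = BertSize("BERT-large", layers=24, hidden_size=1024,
                      attention_heads=16, approx_params_millions=340)


def total_attention_heads(size: BertSize) -> int:
    """Heads per layer multiplied by the number of layers."""
    return size.layers * size.attention_heads


if __name__ == "__main__":
    for size in (BERT_BASE, BERT_LARGE):
        print(size.name, size.layers, total_attention_heads(size))
```

The large model doubles the number of transformer layers, which is the sense in which it has "more attention layers" than the base model.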
Tables
  • Table 1: Dataset
  • Table 2: Experimental Results on SubTask 1
  • Table 3: Hyperparameters of Best Performing Models
Related work
  • In 2018, SemEval organized a shared task called SecureNLP on semantic analysis for cybersecurity texts. Subtask 1 was a binary classification task: deciding whether sentences extracted from APT reports describe malware behaviour or not. In this section, we briefly describe the approaches taken by the competitors for this task.

    Using GloVe embeddings proposed by Pennington et al. [7], Villani [8] outperformed the rest of the competition in Subtask 1 only. They generated token representations from characters with a Long Short-Term Memory (LSTM) network, then trained a binary classifier with a Bi-directional Long Short-Term Memory (BiLSTM) network.
Funding
  • In Figure 2, we show the different sampling ratios used against the F1 Score predicted for both oversampling and undersampling models using BERT-cased
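
Oversampling and undersampling at a given ratio can be sketched as below. The definition of the sampling ratio used here (minority size divided by majority size after resampling) is an assumption, since this page does not say how the paper defines it, and the function names are hypothetical. Both functions expect a non-empty minority list and a ratio greater than zero.

```python
import random


def undersample(majority, minority, ratio, seed=0):
    """Randomly drop majority items until len(minority) / len(majority) is about `ratio`."""
    rng = random.Random(seed)
    target = min(len(majority), round(len(minority) / ratio))
    return rng.sample(majority, target), list(minority)


def oversample(majority, minority, ratio, seed=0):
    """Duplicate random minority items until len(minority) / len(majority) is about `ratio`."""
    rng = random.Random(seed)
    target = max(len(minority), round(len(majority) * ratio))
    extra = [rng.choice(minority) for _ in range(target - len(minority))]
    return list(majority), list(minority) + extra


if __name__ == "__main__":
    # Toy class sizes, not the dataset's real class balance.
    negatives = list(range(100))
    positives = list(range(100, 120))
    for resample in (undersample, oversample):
        neg, pos = resample(negatives, positives, ratio=0.5)
        print(resample.__name__, len(neg), len(pos))
```

Sweeping `ratio` over several values and scoring a model trained on each resampled set is one way to produce a ratio-versus-F1 plot of the kind the Figure 2 sentence describes.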
Reference
  • Lim, Swee Kiat, et al. “Malwaretextdb: A database for annotated malware articles.” Proceedings of the 55th Annual Meeting of the Association for Computational Linguistics (Volume 1: Long Papers). 2017.
  • Sharma, Tanu. Software Bug Localization Using Topic Models. Diss. 2016.
  • Tripathi, Ashish Kumar, Kapil Sharma, and Manju Bala. “Parallel Hybrid BBO Search Method for Twitter Sentiment Analysis of Large Scale Datasets Using MapReduce.” International Journal of Information Security and Privacy (IJISP) 13.3 (2019): 106-122.
  • T. Sharma, K. Sharma and T. Sharma, “Software bug localization using Pachinko Allocation Model,” 2016 3rd International Conference on Computing for Sustainable Global Development (INDIACom), New Delhi, 2016, pp. 3603-3608.
  • Jain, Deepakshi. Cryptocurrency Price Prediction Using Transformer: A Deep Learning Architecture. Diss. 2019.
  • Jatana, Nishtha, and Kapil Sharma. “Bayesian spam classification: Time-efficient radix encoded fragmented database approach.” 2014 International Conference on Computing for Sustainable Global Development (INDIACom). IEEE, 2014.
  • Pennington, Jeffrey, Richard Socher, and Christopher D. Manning. “Glove: Global vectors for word representation.” Proceedings of the 2014 conference on empirical methods in natural language processing (EMNLP). 2014.
  • Loyola, Pablo, et al. “Villani at SemEval-2018 Task 8: Semantic Extraction from Cybersecurity Reports using Representation Learning.” Proceedings of The 12th International Workshop on Semantic Evaluation. 2018.
  • Sikdar, Utpal Kumar, Biswanath Barik, and Bjorn Gamback. “Flytxt NTNU at SemEval-2018 Task 8: Identifying and Classifying Malware Text Using Conditional Random Fields and Naïve Bayes Classifiers.” Proceedings of The 12th International Workshop on Semantic Evaluation. 2018.
  • Ma, Chunping, et al. “DM NLP at SemEval-2018 Task 8: neural sequence labelling with linguistic features.” Proceedings of The 12th International Workshop on Semantic Evaluation. 2018.
  • Ma, Xuezhe, and Eduard Hovy. “End-to-end sequence labeling via bi-directional lstm-cnns-crf.” arXiv preprint arXiv:1603.01354 (2016).
  • Fu, Mingming, Xuemin Zhao, and Yonghong Yan. “HCCL at SemEval-2018 Task 8: An End-to-End System for Sequence Labeling from Cybersecurity Reports.” Proceedings of The 12th International Workshop on Semantic Evaluation. 2018.
  • Brew, Chris. “Digital Operatives at SemEval-2018 Task 8: Using dependency features for malware NLP.” Proceedings of The 12th International Workshop on Semantic Evaluation. 2018.
  • Crammer, Koby, et al. “Online passive-aggressive algorithms.” Journal of Machine Learning Research 7.Mar (2006): 551-585.
  • Manikandan, R., Krishna Madgula, and Snehanshu Saha. “TeamDL at SemEval-2018 task 8: Cybersecurity text analysis using convolutional neural network and conditional random fields.” Proceedings of The 12th International Workshop on Semantic Evaluation. 2018.
  • Padia, Ankur, et al. “UMBC at SemEval-2018 Task 8: Understanding text about malware.” Proceedings of International Workshop on Semantic Evaluation (SemEval2018). 2018.
  • Ravikiran, Manikandan, and Krishna Madgula. “Fusing Deep Quick Response Code Representations Improves Malware Text Classification.” Proceedings of the ACM Workshop on Crossmodal Learning and Application. 2019.
  • Howard, Jeremy, and Sebastian Ruder. “Universal language model fine-tuning for text classification.” arXiv preprint arXiv:1801.06146 (2018).
  • Mikolov, Tomas, et al. “Distributed representations of words and phrases and their compositionality.” Advances in neural information processing systems. 2013.
  • Devlin, Jacob, et al. “Bert: Pre-training of deep bidirectional transformers for language understanding.” arXiv preprint arXiv:1810.04805 (2018).
  • Yang, Zhilin, et al. “Xlnet: Generalized autoregressive pretraining for language understanding.” Advances in neural information processing systems. 2019.
Author
Shahid Simra
Singh Tanmay
Sharma Yash
Sharma Kapil