Malware Detection by Control-Flow Graph Level Representation Learning With Graph Isomorphism Network.

IEEE Access(2022)

Cited 3|Views15
No score
With society's increasing reliance on computer systems and network technology, the threat of malicious software grows more and more serious. In the field of information security, malware detection has been a key problem that academia and industry are committed to solving. Machine learning is an effective method for processing large-scale data, such as the Gradient Boosting Decision Tree (GBDT) and deep neural network technology. Although these types of detection methods can deal with cyber threats, most feature extraction methods are based on the statistical information features of portable executable (PE) files and thus lack the decompiled code and execution flow structure of the PE samples. Therefore, we propose a Control-Flow Graph (CFG)- and Graph Isomorphism Network (GIN)-based malware classification system. The feature vectors of CFG basic blocks are generated using the large-scale pre-trained language model MiniLM, which is beneficial for the GIN to further learn and compress the CFG-based representation, and classified with multi-layer perceptron. In addition, we evaluated the effectiveness of the representation under different dimensions and classifiers. To evaluate our method, we set up a CFG-based malware detection graph dataset from a PE file of the Blue Hexagon Open Dataset for Malware Analysis (BODMAS), which we call the Malware Geometric Binary Dataset (MGD-BINARY) and collected the experimental results of CFG representation in different dimensions and classifier settings. The evaluation results show that our proposal has proved an Accuracy metric of 0.99160 and achieved 0.99148 Area Under the Curve (AUC) results.
Translated text
Key words
Malware,Feature extraction,Codes,Data mining,Analytical models,Machine learning,Natural language processing,Graphics,Malware detection,machine learning,static analysis,graph classification
AI Read Science
Must-Reading Tree
Generate MRT to find the research sequence of this paper
Chat Paper
Summary is being generated by the instructions you defined