Based on Multi-features and Clustering Ensemble Method for Automatic Malware Categorization

2017 IEEE Trustcom/BigDataSE/ICESS(2017)

引用 23|浏览25
暂无评分
摘要
Automatic malware categorization plays an important role in combating the current large volume of malware and aiding the corresponding forensics. Generally, there are lot of sample information could be extracted with the static tools and dynamic sandbox for malware analysis. Combine these obtained features effectively for further analysis would provides us a better understanding. On the other hand, most current works on malware analysis are based on single category of machine learning algorithm to categorize samples. However, different clustering algorithms have their own strengths and weaknesses. And then, how to combine the merits of the multiple categories of features and algorithms to further improve the analysis result is very critical. In this paper, we propose a novel scalable malware analysis framework to exploit the complementary nature of different features and algorithms to optimally integrate their results. By using the concept of clustering ensemble, our system combines partitions from individual category of feature and algorithm to obtain better quality and robustness. Our system composed of the following three parts: (1) extract multiple categories of static and dynamic features; (2) use the k-means and hierarchical clustering algorithms to construct the base clustering; (3) proposed an efficient method based on mixture model clustering ensemble to conduct an effective clustering analysis. We have evaluated our method on two malware datasets, namely the Microsoft malware dataset and our own malware dataset which contained 10868 and 53760 samples respectively. Our experiment results show that our method could categorize malware with better quality and robustness. Also, our method has certain advantages in the system run time and memory consumption compared with the state-of-the art malware analysis works
更多
查看译文
关键词
multifeatures ensemble,clustering ensemble,automatic malware categorization,forensics,static tools,dynamic sandbox,Microsoft malware dataset
AI 理解论文
溯源树
样例
生成溯源树,研究论文发展脉络
Chat Paper
正在生成论文摘要