Multi-labeling of complex, multi-behavioral malware samples

Computers & Security (2022)

Abstract
Testing cyber security solutions usually requires malware samples, and the correct typology of those samples matters for properly estimating the performance of the tools under evaluation. Although several malware datasets are publicly available at present, most of them are unlabeled or assign only one class or tag to each malware sample. We argue that a single label is not enough to represent the complex behavior exhibited by most current malware. With this hypothesis in mind, and based on the varied classifications that automatic detection engines generally provide per sample, we introduce a simple multi-labeling approach to automatically tag the multiple behaviors of malware samples. In the paper, we first analyze the coherence between the behaviors exhibited by a number of well-known malware samples dissected in the literature and the multiple tags that our labeling proposal assigns to them. After that, the automatic multi-labeling scheme is run over four public Android malware datasets, and the results and statistics obtained regarding their composition and representativeness are discussed. We share the multi-labeling tool in a GitHub repository for public use.
Keywords
Android,Behavior,Dataset,Labeling,Malware
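
The abstract does not spell out how the multi-labeling is computed, so the following is only a minimal sketch of the general idea it describes: deriving several behavior tags for one sample from the differing verdicts of multiple detection engines. The keyword map, the `multi_label` function, the `min_votes` threshold, and the engine names and verdict strings are all hypothetical and are not taken from the paper or its GitHub tool.

```python
from collections import Counter

# Hypothetical keyword-to-behavior map (an assumption, not the paper's taxonomy).
BEHAVIOR_KEYWORDS = {
    "trojan": "trojan",
    "adware": "adware",
    "spy": "spyware",
    "ransom": "ransomware",
    "sms": "sms-fraud",
    "bank": "banker",
}


def multi_label(engine_verdicts, min_votes=2):
    """Return every behavior tag reported by at least `min_votes` engines.

    `engine_verdicts` maps an engine name to its verdict string,
    or to None when the engine did not flag the sample.
    """
    votes = Counter()
    for verdict in engine_verdicts.values():
        if not verdict:
            continue  # engine gave no detection for this sample
        lowered = verdict.lower()
        # Count each tag at most once per engine.
        tags = {tag for kw, tag in BEHAVIOR_KEYWORDS.items() if kw in lowered}
        votes.update(tags)
    return sorted(tag for tag, count in votes.items() if count >= min_votes)


# Made-up verdicts for a single sample (illustrative only).
example_verdicts = {
    "EngineA": "Android.Trojan.SmsSpy",
    "EngineB": "Trojan-Spy.AndroidOS.Agent",
    "EngineC": "Adware.Android.Generic",
    "EngineD": None,
    "EngineE": "Android/Spy.Banker",
}

if __name__ == "__main__":
    print(multi_label(example_verdicts))
```

Keeping every tag that clears the vote threshold, rather than only the most frequent one, is what separates this kind of multi-label output from the single-label datasets the abstract criticizes.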