What Can N-Grams Learn For Malware Detection?

Richard Zak, Edward Raff, Charles Nicholas

Proceedings of the 2017 12th International Conference on Malicious and Unwanted Software (MALWARE) (2017)

Abstract

Recent work has shown that byte n-grams learn mostly low-entropy features, such as function imports and strings, which has called into question whether byte n-grams can learn information corresponding to higher entropy levels, such as binary code. We investigate that hypothesis in this work by performing byte n-gram analysis on only specific sub-sections of the binary file, and compare the results to those obtained by n-gram analysis on assembly code generated from disassembled binaries. We do this by leveraging the change in model performance and ensembles to glean insights about the data. In doing so, we discover that byte n-grams can learn from the code regions, but do not necessarily learn any new information. We also discover that assembly n-grams may not be as effective as previously thought and that disambiguating instructions by their binary opcode, an approach not previously used for malware detection, is critical for model generalization.
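
For readers unfamiliar with the term, a byte n-gram is any run of n consecutive bytes in a file, and byte n-gram analysis starts by counting those runs. The following is a minimal illustrative sketch only, not the authors' pipeline: the function name `byte_ngrams`, the choice of n = 2, and the sample bytes are all assumptions made for the example, and the abstract does not state the n-gram size or feature selection used in the paper.

```python
from collections import Counter


def byte_ngrams(data: bytes, n: int) -> Counter:
    """Count every overlapping window of n consecutive bytes in data."""
    if n <= 0:
        raise ValueError("n must be positive")
    # Slicing a bytes object yields bytes, which is hashable and can be a Counter key.
    return Counter(data[i:i + n] for i in range(len(data) - n + 1))


# Hypothetical stand-in for bytes taken from one sub-section of a binary.
sample = bytes([0x55, 0x8B, 0xEC, 0x55, 0x8B, 0xEC, 0xC3])
counts = byte_ngrams(sample, 2)

for gram, freq in counts.most_common():
    print(gram.hex(), freq)
```

Restricting the analysis to a specific sub-section, as the abstract describes, would amount to passing only that region's bytes to such a counter rather than the whole file.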

Keywords

malware detection, byte n-grams, low entropy features, strings, binary code, byte n-gram analysis, binary file, assembly code, disassembled binaries, assembly n-grams, entropy levels
