Enhancing DNN-Based Binary Code Function Search With Low-Cost Equivalence Checking

IEEE Transactions on Software Engineering (2022)

Abstract
Binary code function search is the core basis of various security and software engineering applications, including malware clustering, code clone detection, and vulnerability auditing. However, recognizing logically similar assembly functions remains a challenge. De facto binary code search tools rely on program structure-level information, including control flow and data flow graphs, extracted via standard program analysis techniques or deep neural networks (DNNs). DNN-based approaches learn representations from lexical-, control structure-, or data flow-level information of binary code, which can be too coarse-grained to faithfully denote program functionality. They can also suffer from low robustness in challenging settings such as compiler optimizations and obfuscations. This paper proposes a general solution to enhance the top-k ranked candidates of DNN-based binary code function search. The key idea is to design a low-cost and comprehensive equivalence check that quickly exposes functionality deviations between the target function and its top-k matched functions. Functions that fail this equivalence check can be removed from the top-k list, whereas functions that pass the check can be deliberately moved up in the top-k ranking. We design a practical and efficient equivalence check, named BinUSE, using under-constrained symbolic execution (USE). USE, a variant of symbolic execution, improves scalability by launching symbolic execution directly from function entry points and relaxing constraints over function parameters. It alleviates the overhead incurred by path explosion and costly constraints. BinUSE is specifically designed to deliver a soundy function-level equivalence check, enhancing DNN-based binary code search by reducing its false alarms at low cost.
Our evaluation shows that BinUSE can enable a general and effective enhancement of three DNN-based binary code search tools against challenges introduced by different compilers, optimizations, obfuscations, and architectures.
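The re-ranking idea described in the abstract can be illustrated with a minimal Python sketch. It is not BinUSE's actual algorithm: the three-valued `Verdict`, the `rerank_top_k` function, and the rule that passing candidates move ahead of inconclusive ones are all assumptions made for illustration, and the equivalence check itself is stubbed out as a caller-supplied function rather than real under-constrained symbolic execution.

```python
from enum import Enum
from typing import Callable, List, Tuple


class Verdict(Enum):
    """Hypothetical outcomes of an equivalence check on one candidate."""
    PASS = "pass"                  # no functionality deviation found
    FAIL = "fail"                  # a deviation from the target was exposed
    INCONCLUSIVE = "inconclusive"  # the check could not decide


# A candidate is (function name, DNN similarity score).
Candidate = Tuple[str, float]


def rerank_top_k(
    candidates: List[Candidate],
    check: Callable[[str], Verdict],
) -> List[Candidate]:
    """Drop failing candidates and move passing ones ahead of the rest.

    Within each group the original DNN score order is preserved.
    """
    # Start from the DNN ranking, highest similarity first.
    by_score = sorted(candidates, key=lambda c: c[1], reverse=True)
    judged = [(c, check(c[0])) for c in by_score]
    # Candidates that fail the check are removed from the list.
    survivors = [(c, v) for c, v in judged if v is not Verdict.FAIL]
    # Python's sort is stable, so this only lifts PASS above the others.
    survivors.sort(key=lambda cv: cv[1] is not Verdict.PASS)
    return [c for c, _ in survivors]


# Illustrative data only; names and scores are made up.
top_k = [("f_a", 0.95), ("f_b", 0.90), ("f_c", 0.85), ("f_d", 0.80)]
verdicts = {
    "f_a": Verdict.FAIL,
    "f_b": Verdict.INCONCLUSIVE,
    "f_c": Verdict.PASS,
    "f_d": Verdict.INCONCLUSIVE,
}
reranked = rerank_top_k(top_k, lambda name: verdicts[name])
print(reranked)
```

In this sketch the check acts purely as a filter and a tie-breaker on top of the DNN ranking, so a search tool's own scores still determine the order among candidates with the same verdict.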
Keywords
Reverse engineering, symbolic execution, software similarity, deep learning