Android malware classification using optimum feature selection and ensemble machine learning

Rejwana Islam, Moinul Islam Sayed, Sajal Saha, Mohammad Jamal Hossain, Md Abdul Masud

Internet of Things and Cyber-Physical Systems (2023)

Abstract
The majority of smartphones on the market run the Android operating system. Security has been a core concern with this platform because it allows users to install apps from unknown sources. With thousands of apps being produced and launched daily, malware detection using Machine Learning (ML) has attracted significantly more attention than traditional detection techniques. Despite academic and commercial efforts, developing an efficient and reliable method for classifying malware remains challenging. As a result, several datasets for malware analysis have been generated and made available during the past ten years. These datasets may contain static features, such as API calls, intents, and permissions, or dynamic features, such as logcat errors, shared memory, and system calls. Dynamic analysis is more resilient to code obfuscation than static analysis. Recent studies have carried out both binary and multi-class classification; the latter provides more valuable insight into the nature of malware. Because each malware variant operates differently, identifying its category can help in preventing it. Using the well-known ensemble ML approach called weighted voting, this study performed dynamic feature analysis for multi-class classification. Random Forest, K-Nearest Neighbors, Multi-Layer Perceptron, Decision Tree, Support Vector Machine, and Logistic Regression classifiers are all studied in this ensemble model. We used a recent dataset named CCCS-CIC-AndMal-2020, which contains an extensive collection of Android applications and malware samples. A well-researched data preparation phase, followed by weighted voting based on the R² scores of the ML classifiers, achieves an accuracy of 95.0% even after excluding 60.2% of the features, outperforming all recent studies.
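The weighted voting idea in the abstract can be illustrated with a minimal, dependency-free sketch. The abstract does not say how R² scores are turned into weights, so this assumes each classifier's weight is its validation R² computed on integer-encoded class labels and clipped at zero. The classifier names (`clf_a`, `clf_b`, `clf_c`), the helper functions, and the toy labels are all hypothetical and are not taken from the paper or from CCCS-CIC-AndMal-2020.

```python
from collections import defaultdict


def r2_score(y_true, y_pred):
    """Coefficient of determination on integer-encoded class labels."""
    mean = sum(y_true) / len(y_true)
    ss_res = sum((t - p) ** 2 for t, p in zip(y_true, y_pred))
    ss_tot = sum((t - mean) ** 2 for t in y_true)
    return 1.0 - ss_res / ss_tot


def weighted_vote(predictions, weights):
    """Pick, per sample, the label with the largest summed classifier weight.

    predictions: one list of predicted labels per classifier.
    weights: one non-negative weight per classifier, in the same order.
    """
    n_samples = len(predictions[0])
    result = []
    for i in range(n_samples):
        tally = defaultdict(float)
        for preds, weight in zip(predictions, weights):
            tally[preds[i]] += weight
        result.append(max(tally, key=tally.get))
    return result


# Toy validation labels and per-classifier validation predictions (made up).
y_val = [0, 1, 2, 1, 0, 2]
val_preds = {
    "clf_a": [0, 1, 2, 1, 0, 2],  # all correct
    "clf_b": [0, 1, 2, 1, 0, 1],  # one mistake
    "clf_c": [2, 1, 0, 1, 2, 0],  # mostly wrong
}

# Assumption: weight = validation R^2, clipped at zero so that a classifier
# with a negative score contributes nothing to the vote.
weights = {name: max(r2_score(y_val, p), 0.0) for name, p in val_preds.items()}

# Predictions of the same three classifiers on two unseen samples (made up).
test_preds = [[0, 2], [1, 2], [1, 0]]
final = weighted_vote(test_preds, list(weights.values()))
print(weights)
print(final)
```

On the first unseen sample, two of the three toy classifiers predict label 1, yet the vote goes to label 0 because the single classifier predicting it carries the largest weight. This is the practical difference between weighted voting and a plain majority vote.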
Keywords
optimum feature selection, ensemble machine learning, feature selection, classification, machine learning