Exploration into Gray Area: Efficient Labeling for Malicious Domain Name Detection

2019 IEEE 43rd Annual Computer Software and Applications Conference (COMPSAC), 2019

Abstract
This paper presents a method for reducing the labeling cost of acquiring training data for a system that detects malicious domain names using supervised machine learning. The conventional system requires large quantities of both benign and malicious domain names as training data to obtain a classifier with high classification accuracy. Because malicious domain names are generally observed less frequently than benign ones, it is difficult to acquire a large number of them without a dedicated labeling method. We propose a method based on active learning that labels data around the classification decision boundary, i.e., in the gray area, and we show that classification accuracy can be improved while using only approximately 2.5% of the training data used by the conventional system. A further disadvantage of the conventional system is that a classifier trained on a small amount of training data cannot be guaranteed to generalize well. We propose a method based on ensemble learning that integrates multiple classifiers, and we show that it stabilizes and improves classification accuracy.
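The idea of labeling only the gray area can be illustrated with a minimal sketch. It assumes, hypothetically, that each classifier is a function returning an estimated probability that a domain name is malicious, that the ensemble combines classifiers by simple averaging, and that the gray area is the set of unlabeled samples whose averaged score lies closest to a 0.5 threshold (standard uncertainty sampling). The names `ensemble_score`, `select_gray_area`, `budget`, and `threshold`, and the toy classifiers and feature vectors, are illustrative only; the paper's actual features, classifiers, and selection criterion are not given in this abstract.

```python
from __future__ import annotations

from typing import Callable, Sequence

# Hypothetical model: a classifier maps a feature vector to an estimated
# probability that the domain name is malicious.
Classifier = Callable[[Sequence[float]], float]


def ensemble_score(classifiers: Sequence[Classifier], x: Sequence[float]) -> float:
    """Average the malicious-probability estimates of several classifiers.

    Simple averaging is an assumption made for illustration.
    """
    return sum(clf(x) for clf in classifiers) / len(classifiers)


def select_gray_area(
    classifiers: Sequence[Classifier],
    pool: Sequence[Sequence[float]],
    budget: int,
    threshold: float = 0.5,
) -> list[int]:
    """Return indices of the `budget` unlabeled samples nearest the boundary.

    Distance to the decision boundary is approximated by |score - threshold|;
    a smaller margin means a more ambiguous ("grayer") sample. Ties are
    broken by pool index because the (margin, index) tuples are sorted.
    """
    margins = [
        (abs(ensemble_score(classifiers, x) - threshold), i)
        for i, x in enumerate(pool)
    ]
    margins.sort()
    return [i for _, i in margins[:budget]]


# Toy stand-ins for trained classifiers (hypothetical): each simply reads
# one feature as its malicious probability.
def clf_a(x: Sequence[float]) -> float:
    return x[0]


def clf_b(x: Sequence[float]) -> float:
    return x[1]


# Hypothetical unlabeled pool of feature vectors.
pool = [[1.0, 0.75], [0.5, 0.625], [0.0, 0.25], [0.5, 0.5], [0.75, 0.5]]

if __name__ == "__main__":
    # Averaged scores: 0.875, 0.5625, 0.125, 0.5, 0.625
    print(select_gray_area([clf_a, clf_b], pool, budget=2))  # → [3, 1]
```

In a full active-learning loop, the selected samples would be sent to an analyst for labeling, added to the training set, and the classifiers retrained before the next selection round; clearly benign or clearly malicious samples far from the threshold are skipped, which is where the labeling savings come from.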
Keywords
malicious domain name, data labeling, active learning, ensemble learning