Learning to Detect Malicious URLs (2011)

Abstract
Malicious Web sites are a cornerstone of Internet criminal activity. They host a variety of unwanted content, ranging from spam-advertised products to phishing sites to dangerous "drive-by" exploits that infect a visitor's machine with malware. As a result, there has been broad interest in developing systems that prevent end users from visiting such sites. The most prominent existing approaches to the malicious URL problem are manually constructed blacklists and client-side systems that analyze the content or behavior of a Web site as it is visited.

The premise of this dissertation is that we should be able to construct a lightweight URL classification system that simultaneously overcomes the challenges facing blacklists (which rely on manual updates and can quickly become obsolete) and client-side systems (which are difficult to deploy at large scale because of their high overhead). To this end, our contribution is a highly effective system for malicious URL detection that, in its final form, leverages large numbers of features and online learning to construct an accurate classifier scalably and adaptively. Because our system exploits large amounts of training data and adapts to day-by-day variations, it classifies URLs with up to 99% accuracy.

In pursuing malicious URL detection, this dissertation also addresses issues that arise from using online learning for this application. Our further contributions therefore include advances in understanding the role of uncertainty in online learning, as well as the benefits of exploiting feature correlations in high-dimensional applications such as URL classification. Overall, the contributions of this dissertation significantly advance malicious URL detection and our understanding of the role online learning plays in this application.
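The abstract describes the approach only at a high level: many URL features fed to an online learner that is updated as newly labeled URLs arrive. As an illustrative sketch under stated assumptions (not the dissertation's implementation), the Python below hashes the host and path/query tokens of a URL into a sparse binary feature vector and trains a plain online logistic regression with one stochastic-gradient step per labeled URL. The tokenizer, the hashed dimension `DIM`, the learning rate, the names `url_features` and `OnlineLogisticRegression`, and the toy URLs are all hypothetical; the abstract does not say which features or update rule the final system uses.

```python
import math
import re
import zlib
from urllib.parse import urlsplit

DIM = 2 ** 18  # hypothetical size of the hashed feature space
TOKEN_SPLIT = re.compile(r"[/?.=\-_&:]+")  # hypothetical delimiter set


def url_features(url: str) -> dict[int, float]:
    """Hash host and path/query tokens of a URL into sparse binary features."""
    parts = urlsplit(url if "://" in url else "http://" + url)
    fields = (("host", parts.hostname or ""), ("path", parts.path + "/" + parts.query))
    feats: dict[int, float] = {}
    for prefix, text in fields:
        for token in TOKEN_SPLIT.split(text.lower()):
            if token:
                # The prefix keeps "host:login" and "path:login" as distinct features.
                feats[zlib.crc32(f"{prefix}:{token}".encode()) % DIM] = 1.0
    return feats


class OnlineLogisticRegression:
    """Hypothetical plain-SGD logistic regression: one update per labeled URL."""

    def __init__(self, lr: float = 0.1) -> None:
        self.w: dict[int, float] = {}  # sparse weights, created on first use
        self.b = 0.0
        self.lr = lr

    def score(self, x: dict[int, float]) -> float:
        """Estimated probability that the URL is malicious."""
        z = self.b + sum(self.w.get(i, 0.0) * v for i, v in x.items())
        z = max(-35.0, min(35.0, z))  # clamp to avoid math.exp overflow
        return 1.0 / (1.0 + math.exp(-z))

    def update(self, x: dict[int, float], y: int) -> None:
        """One gradient step on log loss; y is 1 for malicious, 0 for benign."""
        g = self.score(x) - y  # derivative of log loss w.r.t. the linear score
        for i, v in x.items():
            self.w[i] = self.w.get(i, 0.0) - self.lr * g * v
        self.b -= self.lr * g

    def predict(self, x: dict[int, float]) -> int:
        return int(self.score(x) >= 0.5)


# Toy labeled stream (hypothetical URLs), replayed as if it arrived over several days.
stream = [
    ("http://secure-login.example-verify.test/account/update.php", 1),
    ("http://free-prizes.example.test/win/claim.exe", 1),
    ("https://www.wikipedia.org/wiki/URL", 0),
    ("https://github.com/python/cpython", 0),
]
model = OnlineLogisticRegression(lr=0.5)
for _day in range(5):
    for url, label in stream:
        model.update(url_features(url), label)
```

Because each update touches only the features present in one URL, the model can absorb new labeled URLs as they arrive instead of being retrained from scratch, which is the property the abstract relies on for adapting to day-by-day variations. This sketch uses the same learning rate for every weight; it does not model the per-feature uncertainty or the feature correlations that the abstract lists as further contributions.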
Keywords
URL feature, online learning, malicious URLs, dissertation addresses issue, large amount, effective system, large scale, lightweight URL classification system, Internet criminal activity, malicious URL problem, large Web mail provider, client-side system, malicious Web sites, malicious URL detection, large number, online classifier, real-time feed, real-time system, URL classification