Dom2Vec - Detecting DGA Domains Through Word Embeddings and AI/ML-Driven Lexicographic Analysis

L. Torrealba Aravena,P. Casas, J. Bustos-Jiménez,G. Capdehourat, M. Findrik

2023 19th International Conference on Network and Service Management (CNSM)(2023)

引用 0|浏览0
暂无评分
摘要
The timely identification of DNS queries to Domain Generation Algorithm (DGA) domains plays a critical role in mitigating malware propagation and its potential impact, especially in thwarting coordinated botnet activity. We introduce Dom2Vec, an innovative approach for swiftly detecting DGA-generated domains by leveraging lexicographic features exclusively derived from the observed domain names in DNS queries. Dom2Vec leverages word embeddings to map tokens extracted from domain names into highly expressive representations. These representations are then combined with a reputation-based scoring system for domain names, which utilizes the co-occurrence frequency of n-grams in relation to a list of whitelisted domains. The fusion of domain embeddings, reputation scores, and other meaningful lexicographic features derived from domain names provides robust domain name representations for AI/ML-driven detection of DGAs. Through experimental evaluation on a dataset comprising 25 distinct families of DGA domains, we demonstrate that Dom2Vec significantly outperforms current state-of-the-art approaches for DGA detection and analysis, improving our previous detection system based on reputation scores by at least 30%, for a false-alarm rate below 1%.
更多
查看译文
关键词
DGA Detection,Word2Vec,TF-IDF,n-grams,Lexicographic Analysis,DNS,Machine Learning
AI 理解论文
溯源树
样例
生成溯源树,研究论文发展脉络
Chat Paper
正在生成论文摘要