Characterizing the Internet Host Population Using Deep Learning: A Universal and Lightweight Numerical Embedding

IMC (2018)

Abstract

In this paper, we present a framework that uses deep learning to characterize Internet hosts, turning Internet scan data into numerical, lightweight (low-dimensional) host representations. To do so, we first develop a novel method for extracting binary tags from structured text, the format in which the scan data is provided. We then use a variational autoencoder, an unsupervised neural network model, to construct low-dimensional embeddings from these high-dimensional binary representations. We show that the lightweight embeddings retain most of the information in the binary representations while drastically reducing the memory and computation required for large-scale analysis. The embeddings are also universal: they are generated without supervision and do not depend on any specific application. This universality makes them broadly applicable as input features for a variety of learning tasks. We present two such tasks, (1) detecting and predicting malicious hosts and (2) unmasking hidden host attributes, and compare the trained models in terms of performance, speed, robustness, and interpretability. We show that our embeddings can achieve high accuracy (>95%) on these tasks while being fast enough to enable host-level analysis at scale.
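
The abstract does not spell out how binary tags are extracted, so the following is a deliberately naive stand-in, not the paper's method. It assumes each host's scan record is a nested JSON-like dict (the field names `p80`, `p22`, and `tags`, and the helpers `extract_tags`, `build_vocabulary`, and `to_binary_vector`, are hypothetical), emits one `path=value` tag per leaf, and maps each host's tag set onto a shared vocabulary to get a high-dimensional 0/1 vector. Vectors of this kind are what a variational autoencoder would then compress into low-dimensional embeddings.

```python
from typing import Any, Dict, Iterable, List, Set


def extract_tags(record: Any, prefix: str = "") -> Set[str]:
    """Flatten a nested scan record into a set of "path=value" tags.

    Dict keys are joined with "."; list elements share their parent's
    path, so tags do not depend on element order. (Hypothetical scheme.)
    """
    tags: Set[str] = set()
    if isinstance(record, dict):
        for key, value in record.items():
            path = f"{prefix}.{key}" if prefix else str(key)
            tags |= extract_tags(value, path)
    elif isinstance(record, list):
        for item in record:
            tags |= extract_tags(item, prefix)
    else:
        tags.add(f"{prefix}={record}")
    return tags


def build_vocabulary(tag_sets: Iterable[Set[str]]) -> Dict[str, int]:
    """Assign a column index to every distinct tag seen across all hosts."""
    vocab: Dict[str, int] = {}
    for tags in tag_sets:
        for tag in sorted(tags):
            if tag not in vocab:
                vocab[tag] = len(vocab)
    return vocab


def to_binary_vector(tags: Set[str], vocab: Dict[str, int]) -> List[int]:
    """One host's binary representation; tags outside the vocabulary are ignored."""
    vector = [0] * len(vocab)
    for tag in tags:
        index = vocab.get(tag)
        if index is not None:
            vector[index] = 1
    return vector


if __name__ == "__main__":
    # Two toy host records with hypothetical field names.
    host_a = {"p80": {"http": {"server": "nginx"}},
              "p22": {"ssh": {"banner": "OpenSSH_7.4"}}}
    host_b = {"p80": {"http": {"server": "Apache"}},
              "tags": ["iot", "camera"]}

    tags_a, tags_b = extract_tags(host_a), extract_tags(host_b)
    vocab = build_vocabulary([tags_a, tags_b])
    print(to_binary_vector(tags_a, vocab))  # → [1, 1, 0, 0, 0]
    print(to_binary_vector(tags_b, vocab))  # → [0, 0, 1, 1, 1]
```

The vector length equals the vocabulary size, which grows with every distinct value seen (free-text banners alone can add thousands of columns). That growth is what motivates compressing these sparse binary vectors into a compact embedding before using them as features.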

Keywords

Host Embedding, Machine Learning, Network Measurement