Traceback: Learning to Identify Website’s Landing URLs via Noisy Web Traces Passively

2020 27th International Conference on Telecommunications (ICT) (2020)

Abstract
The current web has become a platform on which many different web resources are combined. These resources span different URLs and often involve malicious or sensitive content or advertisements (ads), and much of the content is dynamically generated. Determining which website hosts each of these complex HTTP URLs is therefore a daunting challenge. Although many tracing methods exist, they are typically designed for specific kinds of websites. There is currently no tool that reconstructs a comprehensive view for separating landing URLs, which users request explicitly, from the noisy URLs that browsers fire automatically. In this paper, we propose Traceback, a tracing framework that provides such a comprehensive view. We build per-user and multi-user chains from passively collected traffic. We then extract novel statistical features from graph structures, HTTP states, and semantics. We demonstrate that our methodology identifies landing URLs accurately, with recall of up to 95% and precision of over 94% in cross-validation experiments with Random Forest in a real local area environment.
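The chain-building step described in the abstract can be illustrated with a minimal sketch. It assumes each passively captured request is reduced to a (user, URL, Referer) record and links requests through the Referer header. The `Request` record, the function names, and the two toy features are hypothetical and chosen for illustration only; they are not the features or data structures used by Traceback.

```python
from collections import defaultdict
from typing import NamedTuple, Optional


class Request(NamedTuple):
    """One passively observed HTTP request (hypothetical record layout)."""
    user: str               # e.g. client IP address
    url: str                # requested URL
    referer: Optional[str]  # Referer header value, or None if absent


def build_user_chains(requests):
    """Group requests per user into a referrer graph: parent URL -> child URLs."""
    chains = defaultdict(lambda: defaultdict(list))
    for req in requests:
        if req.referer is not None:
            chains[req.user][req.referer].append(req.url)
        else:
            # Touch the node so referer-less requests still appear in the graph.
            chains[req.user][req.url]
    return chains


def graph_features(chains, user, url):
    """Toy structural features for one URL in one user's graph (illustrative only)."""
    graph = chains[user]
    children = graph.get(url, [])
    has_parent = any(url in kids for kids in graph.values())
    return {"out_degree": len(children), "has_parent": has_parent}


# Made-up trace: one page visit that triggers two embedded requests.
trace = [
    Request("10.0.0.5", "http://news.example/", None),
    Request("10.0.0.5", "http://cdn.example/app.js", "http://news.example/"),
    Request("10.0.0.5", "http://ads.example/banner.gif", "http://news.example/"),
]
chains = build_user_chains(trace)
page = graph_features(chains, "10.0.0.5", "http://news.example/")
banner = graph_features(chains, "10.0.0.5", "http://ads.example/banner.gif")
```

In this toy trace the page URL has child requests and no parent, while the banner URL has a parent and no children. Feature vectors of this kind are what a classifier such as Random Forest could be trained on, though the actual feature set in the paper also draws on HTTP states and semantics.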
Keywords
landing URLs, passive traffic, web trace, graph reduction, graph reconstruction