Tackling management problems in large-scale operational networks via statistical learning (2010)

Abstract
The Internet is becoming an indispensable part of daily life. New services such as social network applications, blogs, Twitter, and video sharing have changed the way people think and behave. As a consequence, staying connected is a primary demand for most people. To meet this demand, operational networks such as DSL and cellular networks have been established to provide customers with Internet access anywhere and at any time. While network service providers enjoy the business opportunity created by this growth in demand, they also face a challenge: human expert knowledge cannot grow fast enough to meet the management requirements of such rapidly expanding large-scale networks. An efficient way of managing such a network is crucial for minimizing operational cost, improving customers' experience, and hence reducing churn, i.e., customers quitting the service.

In this thesis, we conduct a systematic study of applying advanced statistical machine learning techniques to two representative network management problems, traffic classification and troubleshooting, in a large operational DSL network that consists of millions of users and tens of millions of basic devices. We present the design of three statistical machine learning based systems that solve these two problems while meeting the operational constraints and requirements of such a network.

In particular, we design FLOWCLASS, a light-weight flow-level traffic classification system. FLOWCLASS has a modular architecture: it combines a series of simple linear binary classifiers, each of which can be efficiently implemented and trained on vast amounts of flow data in parallel, and integrates them so that it attains the accuracy of more sophisticated classifiers.

To handle application scenarios in which FLOWCLASS is not applicable or not accurate, we propose TAGCLASS. TAGCLASS incorporates a novel set of features, the spatial distribution of traffic classes in the network, which is represented by colored traffic activity graphs (TAGs), and employs a two-step model. In the first step, bootstrapping, traffic is classified based solely on the associated traffic attributes. In the second step, calibration, the results of the bootstrapping step are corrected or reinforced based on the spatial relationships of different traffic classes in the colored TAGs.

For the troubleshooting problem, we propose NEVERMIND, a proactive solution for troubleshooting DSL customer problems. NEVERMIND contains two main components: a ticket predictor and a trouble locator. The ticket predictor detects potential problems that may lead to future customer tickets, and the trouble locator prioritizes potential problem locations to assist technicians in diagnosing problems. Innovative techniques such as top-N average precision based feature selection and combined hierarchical models are introduced to adapt existing statistical learning techniques so that they achieve good accuracy with only limited operational resources.

We evaluate these three systems using real measurement data collected from a large operational DSL network over a two-year period. Evaluation results on traffic classification show that FLOWCLASS achieves 97% accuracy for TCP flows and 99.6% accuracy for UDP flows, sustained over a few months and across different geolocations, and can scale up with the traffic rate on 10 Gbps links. TAGCLASS further reduces the errors of FLOWCLASS by 50% when only basic flow features are available, and by 15% when all features are accessible. Evaluation results on troubleshooting demonstrate that NEVERMIND can successfully predict 8,000 future tickets per week with high accuracy and significantly improve the speed at which technicians locate problems on DSL lines.

This thesis serves (1) to validate the potential of statistical machine learning based solutions to management problems in large-scale operational networks; (2) to pinpoint design principles for constructing such solutions; and (3) to propose various novel techniques that adapt existing learning techniques to the operational constraints and requirements of such networks.
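
The abstract mentions top-N average precision based feature selection but does not define it. As a rough illustration only, the sketch below assumes one common formulation: rank items by score, keep the top N, and average the precision at each rank where a true positive appears. The function names, the feature names, and the idea of scoring each feature by ranking on its raw value are all assumptions made for this example, not the thesis's actual method.

```python
from typing import Dict, List, Sequence


def top_n_average_precision(scores: Sequence[float],
                            labels: Sequence[int],
                            n: int) -> float:
    """Average precision restricted to the n highest-scored items.

    Assumed definition: mean of precision@k over the ranks k <= n
    that hold a positive label; 0.0 if the top n holds no positives.
    """
    if len(scores) != len(labels):
        raise ValueError("scores and labels must have the same length")
    # Sort by score, highest first, and keep only the top n entries.
    ranked = sorted(zip(scores, labels), key=lambda p: p[0], reverse=True)[:n]
    hits = 0
    precision_sum = 0.0
    for rank, (_, label) in enumerate(ranked, start=1):
        if label:
            hits += 1
            precision_sum += hits / rank  # precision at this rank
    return precision_sum / hits if hits else 0.0


def rank_features(columns: Dict[str, Sequence[float]],
                  labels: Sequence[int],
                  n: int) -> List[str]:
    """Order feature names by how well each feature alone ranks positives.

    Hypothetical selection rule: use the raw feature value as the score.
    """
    ap = {name: top_n_average_precision(values, labels, n)
          for name, values in columns.items()}
    return sorted(ap, key=ap.get, reverse=True)
```

A metric of this kind fits a setting where only the top few predictions can be acted on, for example when a limited number of lines can be inspected each week, because items ranked below N do not affect the score.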
Keywords
large operational DSL network, management problem, associated traffic attribute, trouble locator, evaluation result, bootstrapping step, statistical machine, light-weight flow-level traffic classification, traffic activity graph, ticket predictor, large-scale operational network, different traffic class, statistical learning