Noisy neighbor detection using Skydive.

SYSTOR '19: PROCEEDINGS OF THE 12TH ACM INTERNATIONAL SYSTEMS AND STORAGE CONFERENCE (2019)

Abstract
Cloud computing enables uniform access to shared pools of configurable system resources and higher-level services that can be rapidly provisioned with minimal management effort. It relies on sharing resources, through virtualization, to achieve coherence and economies of scale. The cloud network, in particular, is virtualized through multiple logical constructs and software layers, making cloud connectivity complex to configure, debug, and visualize. In this work, we show how to detect cloud network operational issues through monitoring and analytics, using and enhancing an open-source network analyzer, Skydive [2]. In particular, we focus on the Noisy Neighbor Effect, a situation in which a common resource is monopolized by a noisy tenant, resulting in performance degradation for other tenants. Skydive is an open-source network topology and protocol analyzer, capable of discovering and visualizing cloud network topology across its multiple layers, as well as capturing network traffic at programmable granularity, injecting network traffic, and more. A typical Skydive setup consists of multiple Skydive agents installed on various network components and one or more Skydive analyzers deployed on any compute resource in the cloud. Skydive agents discover and report network information to a Skydive analyzer, which stores it over time so that it can be consumed via the Web UI, command-line tools, and REST API for visualization, exploration, and analytics. In our work, we used Skydive to investigate and detect the Noisy Neighbor Effect in a Kubernetes (k8s) network. Our setup consisted of a commercial cloud platform, IBM Cloud Private (ICP) [1], running an HTTP server and two HTTP clients constantly sending requests to the server; all three are containerized Python applications, as shown in Figure 1. We installed Skydive agents on all the k8s worker nodes.
To achieve our goal of detecting anomalous client behavior and creating a visual indication of such an anomaly in the Skydive UI, we enhanced Skydive's capabilities and contributed our enhancements back to the project by extending the Python REST client library to support traffic injection and by fixing existing bugs in the Skydive system. We used those enhancements to measure Round Trip Time (RTT) between nodes in the cloud network, detect anomalies in the RTT measurements, and indicate them in the Skydive UI, such as the green indication in Figure 1. In this work, we have made the first step towards automatic detection of Noisy Neighbor with Skydive, using a simple threshold-based approach in an experimental setup. This work can be extended in multiple ways: supporting a more generic and realistic multi-tenant setup; employing deeper analyses, e.g. ML and DL, including on historical data; and exploring additional anomalous cases beyond the Noisy Neighbor Effect.
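The abstract does not specify the threshold rule, so the following is only a minimal sketch of what a threshold-based RTT check could look like, not the authors' implementation. The function name `detect_rtt_anomalies`, the baseline window, and the multiplier `factor` are all hypothetical choices made for illustration; the sketch also assumes RTT samples have already been collected (for example, via injected traffic) and makes no calls to the Skydive API.

```python
from statistics import median


def detect_rtt_anomalies(rtt_ms, baseline_window=5, factor=3.0):
    """Return indices of RTT samples that exceed factor x the baseline median.

    Assumption (not from the paper): the first `baseline_window` samples
    are taken while no tenant is noisy and serve as the baseline.
    """
    if len(rtt_ms) <= baseline_window:
        return []  # not enough data beyond the baseline to judge
    baseline = median(rtt_ms[:baseline_window])
    threshold = factor * baseline
    return [
        i
        for i, rtt in enumerate(rtt_ms)
        if i >= baseline_window and rtt > threshold
    ]


# Hypothetical RTT samples in milliseconds; samples 6 and 7 spike.
samples = [1.0, 1.2, 0.9, 1.1, 1.0, 1.1, 4.8, 5.2, 1.0]
anomalies = detect_rtt_anomalies(samples)
```

A median baseline is used here so that a single outlier in the calibration window does not shift the threshold; a fixed absolute threshold would be an equally valid reading of "simple threshold-based approach".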