Flow Event Telemetry on Programmable Data Plane

SIGCOMM '20: Annual Conference of the ACM Special Interest Group on Data Communication on the Applications, Technologies, Architectures, and Protocols for Computer Communication, Virtual Event, USA, August 2020

Abstract
Network performance anomalies (NPAs), such as long-tailed latency and bandwidth decline, are increasingly critical to cloud providers as applications become more sensitive to performance. The fundamental difficulty in mitigating NPAs quickly lies in the limitations of state-of-the-art network monitoring solutions: coarse-grained counters, active probing, and packet telemetry either cannot provide enough insight into flows or incur too much overhead. This paper presents NetSeer, a flow event telemetry (FET) monitor that aims to discover and record all performance-critical data plane events, e.g., packet drops, congestion, path changes, and packet pauses. NetSeer is efficiently realized on the programmable data plane. It achieves high coverage of flow events through novel intra- and inter-switch event detection algorithms running on the data plane; this coverage includes inter-switch packet drops and corruption, which are critical but for which the original flow information is challenging to retrieve. NetSeer also achieves high scalability and accuracy through innovative designs for event aggregation, information compression, and message batching that run mainly on the data plane, with the switch CPU as a complement. NetSeer has been implemented on commodity programmable switches and NICs. Through real case studies and extensive experiments, we show that NetSeer can reduce NPA mitigation time by 61%-99% with a monitoring-traffic overhead of only 0.01%.
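
To make the idea of event aggregation and message batching more concrete, the following is a minimal illustrative sketch in Python. It is not NetSeer's algorithm, which the abstract does not specify and which runs on switch hardware. The names `FlowEvent`, `EventAggregator`, and `batch_size` are hypothetical. The sketch only assumes that per-packet events from the same flow and of the same type can be collapsed into one counted record, and that records are reported in batches rather than one message per event.

```python
from collections import OrderedDict
from typing import NamedTuple, Optional


class FlowEvent(NamedTuple):
    """One per-packet event (hypothetical record layout)."""
    flow: tuple  # assumed flow key, e.g. a 5-tuple
    kind: str    # e.g. "drop", "congestion", "path_change", "pause"


class EventAggregator:
    """Collapse per-packet events into per-flow records and emit them in batches."""

    def __init__(self, batch_size: int = 4):
        # batch_size is an illustrative parameter: the number of distinct
        # (flow, kind) records to accumulate before reporting.
        self.batch_size = batch_size
        self.counts = OrderedDict()  # (flow, kind) -> number of packets seen

    def observe(self, event: FlowEvent) -> Optional[list]:
        """Record one event; return a batch once enough distinct records are pending."""
        key = (event.flow, event.kind)
        self.counts[key] = self.counts.get(key, 0) + 1
        if len(self.counts) >= self.batch_size:
            return self.flush()
        return None

    def flush(self) -> list:
        """Return all pending records as (flow, kind, packet_count) and reset state."""
        batch = [(flow, kind, n) for (flow, kind), n in self.counts.items()]
        self.counts.clear()
        return batch
```

In this sketch, three dropped packets from the same flow produce a single record with a count of 3 instead of three separate messages, which illustrates how aggregation and batching can keep monitoring traffic small.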
Keywords
Flow event telemetry, programmable data plane, monitoring