Lightweight, high-resolution monitoring for troubleshooting production systems

OSDI (2008)

Citations: 89 | Views: 95
Abstract
Production systems are commonly plagued by intermittent problems that are difficult to diagnose. This paper describes a new diagnostic tool, called Chopstix, that continuously collects profiles of low-level OS events (e.g., scheduling, L2 cache misses, CPU utilization, I/O operations, page allocation, locking) at the granularity of executables, procedures and instructions. Chopstix then reconstructs these events offline for analysis. We have used Chopstix to diagnose several elusive problems in a large-scale production system, thereby reducing these intermittent problems to reproducible bugs that can be debugged using standard techniques. The key to Chopstix is an approximate data collection strategy that incurs very low overhead. An evaluation shows that Chopstix requires under 1% of the CPU, under 256KB of RAM, and under 16MB of disk space per day to collect a rich set of system-wide data.
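The abstract does not say how the approximate collection strategy works. Purely as an illustration of the general idea of trading accuracy for bounded cost, the Python sketch below samples events and hashes each (event type, location) pair into a fixed-size counter table, so memory stays constant no matter how many events occur. The class name `ApproxEventTable`, the CRC32 hash, the table size, and the sampling probability are all hypothetical choices made for this example; they are not taken from Chopstix.

```python
import random
import zlib


class ApproxEventTable:
    """Hypothetical fixed-memory event profile (illustrative only, not Chopstix's code).

    Each event is recorded with probability `sample_prob` and hashed into one of
    `num_slots` counters. Memory is bounded by `num_slots`; hash collisions can
    inflate estimates, which is the accuracy traded for low overhead.
    """

    def __init__(self, num_slots=1024, sample_prob=0.1, seed=0):
        self.num_slots = num_slots
        self.sample_prob = sample_prob
        self.counts = [0] * num_slots          # fixed-size counter array
        self._rng = random.Random(seed)        # seeded for reproducibility

    def _slot(self, key):
        # Map an (event_type, location) key to a counter index.
        return zlib.crc32(repr(key).encode()) % self.num_slots

    def record(self, event_type, location):
        # Sampling: most events return immediately without touching the table.
        if self._rng.random() >= self.sample_prob:
            return
        self.counts[self._slot((event_type, location))] += 1

    def estimate(self, event_type, location):
        # Scale the sampled count back up by 1 / sample_prob.
        return self.counts[self._slot((event_type, location))] / self.sample_prob


if __name__ == "__main__":
    table = ApproxEventTable(sample_prob=0.1, seed=42)
    for _ in range(100_000):
        table.record("l2_miss", 0x400123)   # hypothetical instruction address
    print(round(table.estimate("l2_miss", 0x400123)))
```

With a 10% sampling rate, the printed estimate should land close to the true count of 100,000 while the table only ever holds 1,024 integers. Lowering `sample_prob` reduces per-event cost further at the price of noisier estimates for rare events.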
Keywords
system-wide data, approximate data collection strategy, elusive problem, large-scale production system, I/O operation, CPU utilization, disk space, troubleshooting production system, high-resolution monitoring, intermittent problem, L2 cache, production system, high resolution, data collection