Defcon: Preventing Overload with Graceful Feature Degradation.
Every day, billions of people depend on Internet services for communication, commerce, and entertainment. Yet planetary-scale data center infrastructures consisting of millions of servers experience unplanned capacity outages and unexpected demand for resources. How can such infrastructures remain reliable in the face of capacity and workload flux? In this paper, we introduce Defcon, a system for improving the availability of large-scale, globally distributed Internet services using graceful feature degradation. In response to overload conditions, Defcon enables site operators to gradually disable less-critical features in order to reduce resource demand. Defcon presents a common interface that product developers use to define feature knobs, each representing a degradation capability. Defcon automatically tests knobs to understand each knob's product- and infrastructure-level trade-offs. At Meta, we have used Defcon to improve global product availability in the face of both worldwide demand surges and large-scale infrastructure failures.
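To make the idea of a feature knob concrete, the following is a minimal sketch of graceful feature degradation, not Defcon's actual interface. The names `Knob`, `KnobRegistry`, `register`, `degrade_to`, and `is_enabled`, the integer `level` ordering, and the example feature names are all assumptions introduced here for illustration.

```python
from dataclasses import dataclass, field


@dataclass
class Knob:
    """Hypothetical feature knob: one less-critical feature that can be switched off."""
    name: str
    level: int  # assumed ordering: lower level = less critical, disabled first
    enabled: bool = True


@dataclass
class KnobRegistry:
    """Hypothetical registry through which product code declares its knobs."""
    knobs: dict = field(default_factory=dict)

    def register(self, name, level):
        # Product developers declare a degradable feature and its criticality.
        knob = Knob(name, level)
        self.knobs[name] = knob
        return knob

    def degrade_to(self, level):
        """Disable every knob at or below `level`, re-enable the rest.

        Returns the sorted names of the knobs that are now disabled.
        """
        disabled = []
        for knob in self.knobs.values():
            knob.enabled = knob.level > level
            if not knob.enabled:
                disabled.append(knob.name)
        return sorted(disabled)

    def is_enabled(self, name):
        # Product code checks this before doing the optional work.
        return self.knobs[name].enabled


# Illustrative usage with made-up feature names.
registry = KnobRegistry()
registry.register("autoplay_previews", level=1)
registry.register("comment_ranking", level=2)
registry.register("messaging", level=9)

mild = registry.degrade_to(1)    # sheds only the least critical feature
severe = registry.degrade_to(2)  # sheds the next tier as well
```

Raising the level step by step mirrors the gradual disabling described above: an operator sheds the least critical features first and escalates only if overload persists, while `degrade_to(0)` restores everything in this sketch.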