Friend or Foe: Strong Consistency vs. Overload in High-Availability Distributed Systems and SDN

2018 IEEE International Symposium on Software Reliability Engineering Workshops (ISSREW), 2018

Abstract
Distributed systems play an increasingly important role in leading-edge networks with high availability requirements, including software-defined networks (SDN), where replicating essential network state information is critical to ensure resilience under failures. Strong consistency algorithms based on distributed consensus, such as Raft, are often used to ensure that all components of the distributed system agree on their view of the replicated data, even when a minority of the components crash. Another critical requirement for highly available networks is to handle overload conditions gracefully, where the demands on the network exceed expected levels for a period of time, such as during natural or man-made disasters or popular sporting events. Hence, the strong consistency algorithms used in such networks must also behave gracefully under overload. We show that strong consistency algorithms such as Raft may not behave gracefully under overload conditions and can significantly degrade SDN control plane availability in these circumstances. We demonstrate that the open-source ONOS SDN controller, which uses the Java-based Atomix implementation of Raft, exhibits such behavior under intent overload, resulting in the loss of requests to the network and, eventually, the crash of the entire SDN network. We further demonstrate similar behavior in the Python-based pysyncobj implementation of Raft. We then propose DynRaft, a dynamic add-on to Raft implementations that continues to ensure the formally proven strong consistency properties of Raft, and demonstrate the effectiveness of DynRaft with the pysyncobj implementation under emulated overload conditions.
Keywords
software-defined networks, SDN, distributed systems, RAFT, strong consistency, distributed consensus, carrier grade, reliability, availability, performance