Mitigating Datacenter Incast Congestion Using RTO Randomization.

GLOBECOM (2015)

Abstract
TCP incast congestion occurs in many-to-one communication workflow patterns that frequently arise in large-scale datacenter applications such as web search, social networks, and cluster-based storage systems. Incast congestion can severely degrade application performance. This paper studies the effectiveness of randomizing the TCP retransmission timeout (RTO) in mitigating the impact of incast. Our design is based on the observation that, under incast, retransmitted packets also become synchronized because the senders use similar RTOs. Using analysis and experimental evaluation, we show that there is a tradeoff between the randomization interval (from which the RTO values are picked) and the number of senders involved in incast. Motivated by this insight, we propose three algorithms (TDA, MAA, and FSA) for dynamically adapting the randomization interval that rely on (a) successive timeouts, (b) explicit knowledge of the level of multiplexing, and (c) knowledge of flow sizes (i.e., a large interval for long flows and a small interval for short flows), respectively. Our results show that these algorithms improve goodput by 1.5x-11x for up to 64 senders and provide greater improvement for larger numbers of senders. The proposed algorithms can be readily deployed, as they do not require any changes to switches or applications.
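The synchronization effect described in the abstract can be illustrated with a small sketch. This is not the paper's TDA, MAA, or FSA algorithm, whose details the abstract does not give. It only assumes that each sender draws its RTO uniformly from an interval above a base value. The function names, the 200 ms base RTO, the 50 ms interval, and the 1 ms burst window are all hypothetical values chosen for illustration.

```python
import random


def randomized_rto(base_rto_ms, interval_ms, rng):
    """Draw an RTO uniformly from [base_rto_ms, base_rto_ms + interval_ms].

    Assumption: uniform sampling; the abstract only says RTO values are
    picked from a randomization interval.
    """
    return base_rto_ms + rng.uniform(0.0, interval_ms)


def retransmit_times(num_senders, base_rto_ms, interval_ms, seed=0):
    """Retransmission times for senders that all time out at t = 0."""
    rng = random.Random(seed)
    return [randomized_rto(base_rto_ms, interval_ms, rng)
            for _ in range(num_senders)]


def max_burst(times, window_ms):
    """Largest number of retransmissions that fall within any window_ms span."""
    ordered = sorted(times)
    best = 0
    left = 0
    for right, t in enumerate(ordered):
        # Shrink the window from the left until it spans at most window_ms.
        while t - ordered[left] > window_ms:
            left += 1
        best = max(best, right - left + 1)
    return best


# Hypothetical scenario: 32 senders, 200 ms base RTO.
fixed = retransmit_times(32, 200.0, 0.0)    # no randomization
spread = retransmit_times(32, 200.0, 50.0)  # 50 ms randomization interval

print("largest burst, fixed RTO:     ", max_burst(fixed, 1.0))
print("largest burst, randomized RTO:", max_burst(spread, 1.0))
```

With a zero-width interval every sender retransmits at the same instant, so the burst equals the number of senders. A wider interval spreads the retransmissions out, but it also delays some senders longer, which is one way to picture the tradeoff between interval size and sender count that the abstract mentions.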
Keywords
datacenter incast congestion,RTO randomization,TCP incast congestion,many-to-one communication workflow patterns,large-scale datacenter applications,TCP retransmission timeout