Chopping off the Tail: Bounded Non-Determinism for Real-Time Accelerators

IEEE Computer Architecture Letters (2021)

Abstract
Modern data centers run web-scale applications on tens of thousands of servers, generating tens of thousands of Remote Procedure Calls (RPCs) to backend services for each incoming user request. Tail latency, due to a small fraction of randomly slow RPCs, decreases the performance of these incoming requests, degrades users' quality of experience, and limits disaggregation (applications' ability to scale across a data center). We argue that current approaches to improving tail latency (especially those bounding computation time) are insufficient, even with (reconfigurable) hardware accelerators. Instead, to chop off the tail, datacenter services should dynamically trade correctness (or result quality) for timeliness, providing bounded latency with near-ideal accuracy. In this paper, we discuss how the increasing prevalence of machine learning (including search techniques like approximate nearest neighbor and PageRank), perceptual algorithms (like computational photography and image/video caching), and natural language processing lets modern hardware accelerators make these dynamic correctness tradeoffs while improving users' quality of experience.
Keywords
Data centers, hardware accelerators, tail latency, non-determinism, disaggregation, SLO