Chopping off the Tail: Bounded Non-Determinism for Real-Time Accelerators

Alexander Rucker, Muhammad Shahbaz, Kunle Olukotun

IEEE Computer Architecture Letters (2021)

Abstract

Modern data centers run web-scale applications on tens of thousands of servers, generating tens of thousands of Remote Procedure Calls (RPCs) to backend services for each incoming user request. Tail latency, due to a small fraction of randomly slow RPCs, decreases the performance of these incoming requests, degrades users' quality of experience, and limits disaggregation (applications' ability to scale across a data center). We argue that current approaches to improve tail latency (especially those that bound computation time) are insufficient, even with (reconfigurable-) hardware accelerators. Instead, to chop off the tail, data center services should dynamically trade correctness (or result quality) for timeliness, providing bounded latency with near-ideal accuracy. In this paper, we discuss how the increasing prevalence of machine learning (including search techniques like approximate nearest neighbor and PageRank), perceptual algorithms (like computational photography and image/video caching), and natural language processing lets modern hardware accelerators make these dynamic correctness tradeoffs while improving users' quality of experience.
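As a rough illustration of the two ideas above, the Python sketch below is a hypothetical example, not the authors' design. `fanout_slow_probability` shows why a small fraction of slow RPCs matters when one request fans out to many backends, assuming each RPC is independently slow with probability `p_slow`. `bounded_nearest_neighbors` shows the proposed trade of correctness for timeliness in an anytime style: it scans candidates until a deadline expires and returns the best answer found so far, plus a flag saying whether the scan finished. The brute-force linear scan stands in for a real approximate nearest neighbor index, and all names and parameters (`deadline_s`, the injectable `clock`) are assumptions made for illustration.

```python
import heapq
import math
import time
from typing import Callable, List, Sequence, Tuple

Vector = Sequence[float]


def fanout_slow_probability(p_slow: float, n_rpcs: int) -> float:
    """Probability that at least one of n_rpcs RPCs is slow.

    Assumes each RPC is slow independently with probability p_slow.
    """
    return 1.0 - (1.0 - p_slow) ** n_rpcs


def bounded_nearest_neighbors(
    query: Vector,
    candidates: Sequence[Vector],
    k: int,
    deadline_s: float,
    clock: Callable[[], float] = time.monotonic,
) -> Tuple[List[int], bool]:
    """Hypothetical anytime k-NN: scan until the deadline, return best-so-far.

    Returns the indices of the (up to) k closest candidates examined,
    nearest first, plus a flag that is True only if every candidate was seen.
    """
    if k <= 0:
        raise ValueError("k must be positive")
    start = clock()
    # Max-heap on distance, emulated by storing negated distances.
    heap: List[Tuple[float, int]] = []
    scanned = 0
    for idx, vec in enumerate(candidates):
        # Stop once the latency budget is spent, trading accuracy for time.
        if clock() - start >= deadline_s:
            break
        d = math.dist(query, vec)
        if len(heap) < k:
            heapq.heappush(heap, (-d, idx))
        elif d < -heap[0][0]:
            # Replace the farthest of the k neighbors kept so far.
            heapq.heapreplace(heap, (-d, idx))
        scanned += 1
    nearest_first = sorted((-neg_d, i) for neg_d, i in heap)
    return [i for _, i in nearest_first], scanned == len(candidates)
```

For example, with `p_slow = 0.01` and 100 RPCs per request, roughly 63% of requests hit at least one slow RPC. Passing a fake `clock` makes the deadline behavior deterministic for testing; in a real service, the caller would use the completeness flag to decide whether a partial result is good enough.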

Keywords

Data centers, hardware accelerators, tail latency, non-determinism, disaggregation, SLO