Robust and Probabilistic Failure-Aware Placement
ACM Transactions on Parallel Computing, 2018.
Motivated by the growing complexity and heterogeneity of modern data centers and the prevalence of commodity component failures, this article studies the failure-aware placement problem: placing the tasks of a parallel job on machines in the data center with the goal of increasing availability. We consider two models of failures: adversari...