Learning Augmented Energy Minimization via Speed Scaling
NeurIPS 2020 (2020)
As power management has become a primary concern in modern data centers, computing resources are being scaled dynamically to minimize energy consumption. We initiate the study of a variant of the classic online speed scaling problem, in which machine learning predictions about the future can be integrated naturally. Inspired by recent w...
- Online problems can be informally defined as problems where we are required to make irrevocable decisions without knowing the future.
- In the online setting, in which jobs are revealed only at their release time, Yao et al. designed two different algorithms, including the AVERAGE RATE heuristic (AVR), for which they proved a bound of 2^(α−1) α^α on the competitive ratio.
- The authors prove that no online algorithm can have a competitive ratio better than Ω((6/5)^α) even in the uniform case.
- Due to the success story of machine learning (ML), a recent line of work, first proposed by Lykouris and Vassilvitskii and by Medina and Vassilvitskii, suggests incorporating the predictions provided by ML algorithms into the design of online algorithms.
- An obvious caveat is that ML predictors often come with no worst-case guarantees and so we would like our algorithm to be robust to misleading predictions
- We focus on our main contribution, which is the design and analysis of a simple and efficient algorithm that incorporates any ML predictor as a black box, without making any further assumptions.
- Note that our algorithm shows a substantial improvement with respect to both AVERAGE RATE heuristic (AVR) and OPTIMAL AVAILABLE heuristic (OA), while maintaining a low competitive ratio even when the prediction error is high
- While our work considers a specific problem related to scheduling, we would like to emphasize that a considerable percentage of real-world systems already have the ability to dynamically scale their computing resources (e.g., CPU Dynamic Voltage and Frequency Scaling (DVFS) in modern processors, and autoscaling of cloud applications) to minimize their energy consumption.
- In the following, the authors denote by OPT the energy cost of the optimal offline schedule and by ε > 0 a robustness parameter of the algorithm; the smaller ε is, the more the prediction is trusted.
- It is easy to see that the algorithm is consistent: if the prediction of w_i^real is perfect, job i will be scheduled at speed c_i in the interval [a_i, b_i].
- For every 0 < δ ≤ 1, the cost of the schedule produced by the algorithm LAS-TRUST is bounded by (1 + δ)^α · OPT + (12/δ)^α · err.
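The bound above can be read as a consistency–robustness trade-off: a small δ tracks OPT closely when the prediction is accurate, but amplifies the error term. A minimal numeric sketch (the values of OPT and err are made up purely for illustration):

```python
# Evaluating the bound (1 + delta)^alpha * OPT + (12 / delta)^alpha * err
# from the theorem above, for a few values of delta. The OPT and err
# values are arbitrary, chosen only to show the trade-off.

def las_trust_bound(opt, err, alpha, delta):
    assert 0 < delta <= 1
    return (1 + delta) ** alpha * opt + (12 / delta) ** alpha * err

opt, alpha = 100.0, 3
for delta in (0.1, 0.5, 1.0):
    accurate = las_trust_bound(opt, err=0.0, alpha=alpha, delta=delta)
    noisy = las_trust_bound(opt, err=0.01, alpha=alpha, delta=delta)
    print(delta, accurate, noisy)
# Small delta: near-optimal when err = 0, but the error term explodes.
```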
- The authors will first relate the energy of the schedule s(t) to the optimal energy for the predicted instance, i.e., OPT(wpred).
- The authors describe a method ROBUSTIFY that takes any online algorithm which guarantees to complete each job in (1 − δ)D time, that is, with some slack before its deadline, and turns it into a robust algorithm without increasing the energy of the schedule produced.
- Input: T, D, and w^pred initially, and w^real in an online fashion. Output: a feasible schedule (s_i)_{i=0}^{T−D}. Let δ > 0 with ((1 + δ)/(1 − δ))^α = 1 + ε. Compute the optimal offline schedule for (w^pred, T, (1 − δ)D), in which the jobs w_i^pred are run at uniform speeds c_i in disjoint intervals [a_i, b_i], using the offline algorithm of Yao et al.
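The per-job planning step above can be sketched as follows, under the assumption that the offline optimum for the predicted instance has already been computed and returned disjoint intervals [a_i, b_i]; the helper name and job encoding are illustrative, and the interval computation itself is not shown.

```python
# Sketch of the planning step of LAS-TRUST (hypothetical helper name).
# We assume the optimal offline schedule for the predicted instance has
# already produced disjoint intervals [a_i, b_i]; each predicted job is
# then planned at a uniform speed inside its interval.

def plan_speeds(predicted_jobs):
    """predicted_jobs: list of (w_pred, a, b) with disjoint [a, b]."""
    plan = []
    for w_pred, a, b in predicted_jobs:
        c = w_pred / (b - a)  # uniform speed, as in the algorithm statement
        plan.append((a, b, c))
    return plan

# If the prediction of w_i^real is perfect, running job i at speed c
# throughout [a_i, b_i] completes exactly w_i^real units of work --
# the consistency property noted above.
print(plan_speeds([(4.0, 0.0, 2.0), (3.0, 2.0, 5.0)]))
# [(0.0, 2.0, 2.0), (2.0, 5.0, 1.0)]
```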
- The authors note that in the first case, where the predictor is relatively accurate but still noisy, LAS is consistently better than any classical online algorithm, achieving a competitive ratio close to 1 for small values of ε.
- The predictor tries to mislead the algorithm by creating a prediction which constitutes a symmetric (around (m + M )/2) random walk with respect to the true instance.
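A possible construction of such a misleading predictor is sketched below. The exact step rule and reflection used in the paper's experiments may differ, so the function name, step size, and clamping here are all assumptions for illustration.

```python
import random

# Sketch of a misleading predictor: the prediction follows a random walk
# and stays symmetric around (m + M) / 2 with respect to the true
# workloads. The precise construction in the paper may differ.

def misleading_prediction(true_workloads, m, M, step, seed=0):
    rng = random.Random(seed)
    mid = (m + M) / 2
    pred = []
    offset = 0.0
    for w in true_workloads:
        offset += rng.choice((-step, step))  # symmetric random walk
        # Reflect around the midpoint relative to the true value, then
        # clamp into the feasible workload range [m, M].
        p = 2 * mid - w + offset
        pred.append(min(M, max(m, p)))
    return pred

print(misleading_prediction([1.0, 2.0, 3.0], m=0.0, M=4.0, step=0.5))
```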
- Table1: Artificial dataset results
- Table2: Real dataset results with different α values
- On the one hand, the field of learning augmented algorithms is relatively new, with many exciting recent results (see for example Gollapudi and Panigrahi, Hsu et al., Kodialam, Lattanzi et al., Lee et al., Lykouris and Vassilvitskii, Medina and Vassilvitskii, Purohit et al., Xu and Xu). On the other hand, the speed scaling problem proposed by Yao et al. is well understood in both the offline and online setting. In its full generality, a set of tasks, each with its own arrival time, deadline, and workload, needs to be completed in time while the speed is scaled in order to minimize energy. In the offline setting, Yao et al. proved that the problem can be solved in polynomial time by a greedy algorithm. In the online setting, in which jobs are revealed only at their release time, Yao et al. designed two different algorithms: (1) the AVERAGE RATE heuristic (AVR), for which they proved a bound of 2^(α−1) α^α on the competitive ratio; this analysis was later shown to be asymptotically tight by Bansal et al. (2) The OPTIMAL AVAILABLE heuristic (OA), which was shown to be α^α-competitive by Bansal et al. In the same paper, Bansal et al. proposed a third online algorithm named BKP, for which they proved a competitive ratio asymptotically equivalent to e^α. While these competitive ratios, exponential in α, might not seem satisfying, Bansal et al. also proved that the exponential dependency cannot be better than e^α. A number of variants of the problem have also been considered in the offline setting (no preemption allowed, precedence constraints, nested jobs, and more, listed in a recent survey by Gerards et al.) and from a stochastic optimization point of view. It is important to note that, while in theory the problem is interesting in the general case, i.e. when α is an input parameter, in practice we usually focus on small values of α such as 2 or 3, since they model certain physical laws (see e.g. Bansal et al.).
Although the BKP algorithm provides the best asymptotic guarantee, OA or AVR often lead to better solutions for small α and therefore remain relevant.
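As an illustration of the classical heuristics discussed above, the AVR heuristic admits a very short sketch: each job contributes its average density w_j/(d_j − r_j) to the machine speed at every moment of its window, and the energy is the integral of s(t)^α. The discrete-time version below is illustrative only (the paper's model is continuous-time), and all names are ours.

```python
# Minimal discrete-time sketch of the AVERAGE RATE (AVR) heuristic.
# Each job (release, deadline, workload) contributes its average density
# workload / (deadline - release) to the machine speed at every unit
# time step in [release, deadline). Energy is the sum of s(t)^alpha.

def avr_speeds(jobs, horizon):
    """Speed profile chosen by AVR on integer time steps 0..horizon-1."""
    s = [0.0] * horizon
    for release, deadline, workload in jobs:
        density = workload / (deadline - release)
        for t in range(release, deadline):
            s[t] += density
    return s

def energy(speeds, alpha):
    """Energy of a schedule under the power function P(s) = s^alpha."""
    return sum(v ** alpha for v in speeds)

jobs = [(0, 4, 2.0), (2, 6, 4.0)]  # (release, deadline, workload)
s = avr_speeds(jobs, horizon=6)
# s = [0.5, 0.5, 1.5, 1.5, 1.0, 1.0]
print(energy(s, alpha=2))  # 7.0
```

Note how the two jobs' densities simply add where their windows overlap; OA instead recomputes the optimal schedule for the remaining work at each arrival.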
- Acknowledgments and Disclosure of Funding This research is supported by the Swiss National Science Foundation project 200021-184656 “Randomness in Problem Instances and Randomized Algorithms”
- Andreas Maggiori was supported by the Swiss National Science Fund (SNSF) grant no 200020_182517/1 “Spatial Coupling of Graphical Models in Communications, Signal Processing, Computer Science and Statistical Physics”
- Lachlan LH Andrew, Minghong Lin, and Adam Wierman. Optimality, fairness, and robustness in speed scaling designs. In Proceedings of the ACM SIGMETRICS international conference on Measurement and modeling of computer systems, pages 37–48, 2010.
- Nikhil Bansal, Tracy Kimbrel, and Kirk Pruhs. Speed scaling to manage energy and temperature. J. ACM, 54(1):3:1–3:39, 2007. doi: 10.1145/1206035.1206038. URL https://doi.org/10.1145/1206035.1206038.
- Nikhil Bansal, David P. Bunde, Ho-Leung Chan, and Kirk Pruhs. Average rate speed scaling. In LATIN 2008: Theoretical Informatics, 8th Latin American Symposium, Búzios, Brazil, April 7-11, 2008, Proceedings, pages 240–251, 2008. doi: 10.1007/978-3-540-78773-0\_21. URL https://doi.org/10.1007/978-3-540-78773-0_21.
- Jeff Barr. New – predictive scaling for ec2, powered by machine learning. AWS News Blog, November 2018. URL https://aws.amazon.com/blogs/aws/new-predictive-scaling-for-ec2-powered-by-machine-learning/.
- Eunjoon Cho, Seth A. Myers, and Jure Leskovec. Friendship and mobility: User movement in location-based social networks. In Proceedings of the 17th ACM SIGKDD International Conference on Knowledge Discovery and Data Mining, KDD ’11, page 1082–1090, New York, NY, USA, 2011. Association for Computing Machinery. ISBN 9781450308137. doi: 10.1145/2020408.2020579. URL https://doi.org/10.1145/2020408.2020579.
- Marco E. T. Gerards, Johann L. Hurink, and Philip K. F. Hölzenspies. A survey of offline algorithms for energy minimization under deadline constraints. J. Scheduling, 19(1):3–19, 2016. doi: 10.1007/s10951-015-0463-8. URL https://doi.org/10.1007/s10951-015-0463-8.
- Sreenivas Gollapudi and Debmalya Panigrahi. Online algorithms for rent-or-buy with expert advice. In Proceedings of the 36th International Conference on Machine Learning, ICML 2019, 9-15 June 2019, Long Beach, California, USA, pages 2319–2327, 2019. URL http://proceedings.mlr.press/v97/gollapudi19a.html.
- Chen-Yu Hsu, Piotr Indyk, Dina Katabi, and Ali Vakilian. Learning-based frequency estimation algorithms. In 7th International Conference on Learning Representations, ICLR 2019, New Orleans, LA, USA, May 6-9, 2019, 2019. URL https://openreview.net/forum?id=r1lohoCqY7.
- Craig Kitterman. Autoscaling windows azure applications. Microsoft Azure Blog, June 2013. URL https://azure.microsoft.com/de-de/blog/autoscaling-windows-azure-applications/.
- Rohan Kodialam. Optimal algorithms for ski rental with soft machine-learned predictions. CoRR, abs/1903.00092, 2019. URL http://arxiv.org/abs/1903.00092.
- Silvio Lattanzi, Thomas Lavastida, Benjamin Moseley, and Sergei Vassilvitskii. Online scheduling via learned weights. In Proceedings of the 2020 ACM-SIAM Symposium on Discrete Algorithms, SODA 2020, Salt Lake City, UT, USA, January 5-8, 2020, pages 1859–1877, 2020. doi: 10.1137/1.9781611975994.114. URL https://doi.org/10.1137/1.9781611975994.114.
- Russell Lee, Mohammad H. Hajiesmaili, and Jian Li. Learning-assisted competitive algorithms for peak-aware energy scheduling. CoRR, abs/1911.07972, 2019. URL http://arxiv.org/abs/1911.07972.
- Thodoris Lykouris and Sergei Vassilvitskii. Competitive caching with machine learned advice. In Proceedings of the 35th International Conference on Machine Learning, ICML 2018, Stockholmsmässan, Stockholm, Sweden, July 10-15, 2018, pages 3302–3311, 2018. URL http://proceedings.mlr.press/v80/lykouris18a.html.
- Andres Muñoz Medina and Sergei Vassilvitskii. Revenue optimization with approximate bid predictions. In Advances in Neural Information Processing Systems 30: Annual Conference on Neural Information Processing Systems 2017, 4-9 December 2017, Long Beach, CA, USA, pages 1858–1866, 2017. URL http://papers.nips.cc/paper/6782-revenue-optimization-with-approximate-bid-predictions.
- Manish Purohit, Zoya Svitkina, and Ravi Kumar. Improving online algorithms via ML predictions. In Advances in Neural Information Processing Systems 31: Annual Conference on Neural Information Processing Systems 2018, NeurIPS 2018, 3-8 December 2018, Montréal, Canada, pages 9684–9693, 2018. URL http://papers.nips.cc/paper/8174-improving-online-algorithms-via-ml-predictions.
- Yinfeng Xu and Weijun Xu. Competitive algorithms for online leasing problem in probabilistic environments. In Advances in Neural Networks - ISNN 2004, International Symposium on Neural Networks, Dalian, China, August 19-21, 2004, Proceedings, Part II, pages 725–730, 2004. doi: 10.1007/978-3-540-28648-6_116. URL https://doi.org/10.1007/978-3-540-28648-6_116.
- F. Frances Yao, Alan J. Demers, and Scott Shenker. A scheduling model for reduced CPU energy. In 36th Annual Symposium on Foundations of Computer Science, Milwaukee, Wisconsin, USA, 23-25 October 1995, pages 374–382, 1995. doi: 10.1109/SFCS.1995.492493. URL https://doi.org/10.1109/SFCS.1995.492493.
- 1. If the algorithm processes more than γ units of work on job 1 before time 1, then on instance J1 the energy cost is at least γ^α. Hence the competitive ratio is at least γ^α · 2^(α−1).
- 2. On the contrary, if the algorithm processes less than γ units of work before the release of the second job, then on instance J2 it has to complete at least 3 − γ units of work between times 1 and 3.
- 1. If the algorithm works more than 1/2, then the energy spent by the algorithm until time εD
- 2. However, if it works less than 1/2 then on instance J2, a total work of at least (1/ε + 1 − 1/2) = (1/2 + 1/ε) remains to be done in D time units. Hence the energy consumption on instance J2 is at least
- 1. For any k ≥ 0, the machine is never idle in interval I_k.
- 2. For any k ≥ 0, all jobs that are processed in I_k have a deadline d_j.