Optimizing Sorting with Machine Learning Algorithms

IPDPS(2007)

Abstract
The growing complexity of modern processors has made the development of highly efficient code increasingly difficult. Manually developing highly efficient code is usually expensive but necessary due to the limitations of today's compilers. A promising automatic code generation strategy, implemented by library generators such as ATLAS, FFTW, and SPIRAL, relies on empirical search to identify, for each target machine, the code characteristics, such as the tile size and instruction schedules, that deliver the best performance. This approach has mainly been applied to scientific codes, which can be optimized by identifying code characteristics that depend only on the target machine. In this paper, we study the generation of sorting routines whose performance also depends on the characteristics of the input data. We present two approaches to generate efficient sorting routines. First, we consider the problem of selecting the best "pure" sorting algorithm as a function of the characteristics of the input data. We show that the relative performance of "pure" sorting algorithms can be encoded as a function of the entropy of the input data set. We use machine learning algorithms to compute a function for each target machine that, at runtime, is used to select the best algorithm. Our second approach generalizes the first and can build new sorting algorithms from a few primitive operations. We use genetic algorithms and a classifier system to build hierarchically organized hybrid sorting algorithms capable of adapting to the input data. Our results show that the algorithms generated using this second approach are quite effective and perform significantly better than the many conventional sorting implementations we tested. In particular, the routines generated using the second approach perform better than the most popular libraries available today: IBM ESSL, Intel MKL, and the C++ STL. The best algorithm we have been able to generate is on average 26% and 62% faster than IBM ESSL on an IBM Power 3 and an IBM Power 4, respectively.
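The first approach, selecting a "pure" sorting algorithm at runtime from the entropy of the input, can be illustrated with a minimal sketch. Everything specific here is an assumption made for illustration and is not taken from the paper: the function names (`digit_entropy`, `radix_sort`, `select_and_sort`), the choice to measure entropy on the lowest 8-bit digit of each key, the two candidate algorithms, and the threshold value of 4.0 bits together with the direction of the comparison. In the approach described above, the selection function would be learned separately for each target machine rather than fixed by hand.

```python
import math
from collections import Counter


def digit_entropy(keys, digit_bits=8, digit_index=0):
    """Shannon entropy (in bits) of one radix digit across all keys.

    Assumption: keys are non-negative integers and a single digit is
    a reasonable summary of the input distribution.
    """
    if not keys:
        return 0.0
    mask = (1 << digit_bits) - 1
    shift = digit_bits * digit_index
    counts = Counter((k >> shift) & mask for k in keys)
    n = len(keys)
    return -sum((c / n) * math.log2(c / n) for c in counts.values())


def radix_sort(keys, digit_bits=8):
    """Plain LSD radix sort for non-negative integers."""
    out = list(keys)
    if not out:
        return out
    mask = (1 << digit_bits) - 1
    max_key = max(out)
    shift = 0
    # Process one digit per pass until the largest key has no digits left.
    while (max_key >> shift) > 0:
        buckets = [[] for _ in range(1 << digit_bits)]
        for k in out:
            buckets[(k >> shift) & mask].append(k)
        out = [k for bucket in buckets for k in bucket]
        shift += digit_bits
    return out


def select_and_sort(keys, entropy_threshold=4.0):
    """Pick a sorting routine from the input entropy, then run it.

    The threshold is a hypothetical placeholder standing in for a
    per-machine learned selection function.
    """
    if digit_entropy(keys) >= entropy_threshold:
        return "radix", radix_sort(keys)
    return "comparison", sorted(keys)
```

Calling `select_and_sort(data)` returns both the name of the chosen routine and the sorted list, which makes it easy to log which branch was taken while tuning the threshold on a given machine.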
Keywords
power generation, spirals, learning artificial intelligence, genetic algorithms, genetic algorithm, sorting algorithm, sorting, instruction scheduling, machine learning