Improving GPGPU Performance via Cache Locality Aware Thread Block Scheduling.
IEEE Computer Architecture Letters (2017)
Abstract
Modern GPGPUs support the concurrent execution of thousands of threads to provide an energy-efficient platform. However, the massive multi-threading of GPGPUs incurs serious cache contention, as the cache lines brought by one thread can easily be evicted by other threads in the small shared cache. In this paper, we propose a software-hardware cooperative approach that exploits the spatial locality...
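The contention problem the abstract describes can be illustrated with a toy simulation (not the paper's actual mechanism): thread blocks that are neighbors in the grid often touch overlapping cache lines, so co-scheduling neighboring blocks on one SM that shares a small LRU cache incurs fewer misses than co-scheduling far-apart blocks. The block sizes, overlap, and cache capacity below are hypothetical parameters chosen only for the demonstration.

```python
from collections import OrderedDict

def lru_misses(accesses, capacity):
    """Count misses of an LRU cache of `capacity` lines over an access trace."""
    cache, misses = OrderedDict(), 0
    for line in accesses:
        if line in cache:
            cache.move_to_end(line)          # refresh recency on a hit
        else:
            misses += 1
            cache[line] = None
            if len(cache) > capacity:
                cache.popitem(last=False)    # evict least recently used
    return misses

def block_lines(b, lines_per_block=8, overlap=4):
    # Hypothetical spatial-locality model: consecutive blocks overlap on
    # `overlap` cache lines (e.g. adjacent tiles of a 2D array).
    start = b * (lines_per_block - overlap)
    return list(range(start, start + lines_per_block))

def run_sm(block_ids, capacity=16):
    # Interleave the concurrent blocks' accesses round-robin on one SM
    # whose small LRU cache is shared among them.
    streams = [block_lines(b) for b in block_ids]
    accesses = [line for group in zip(*streams) for line in group]
    return lru_misses(accesses, capacity)

# Locality-aware: co-schedule neighboring blocks that share cache lines.
near = run_sm([0, 1, 2, 3])      # 20 misses (all compulsory)
# Locality-oblivious: co-schedule far-apart blocks with disjoint lines.
far = run_sm([0, 10, 20, 30])    # 32 misses
print(near, far)
```

With the locality-aware assignment, every reused line survives in the shared cache until its second access, so only compulsory misses remain; the oblivious assignment gets no reuse at all.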
Keywords
Instruction sets, Cache memory, Dispatching, Two dimensional displays, Graphics processing units