DTexL: Decoupled Raster Pipeline for Texture Locality

2022 55th IEEE/ACM International Symposium on Microarchitecture (MICRO)(2022)

引用 1|浏览13
暂无评分
摘要
Contemporary GPU architectures have multiple shader cores and a scheduler that distributes work (threads) among them, focusing on load balancing. These load balancing techniques favor thread distributions that are detrimental to texture memory locality for graphics applications in the L1 Texture Caches. Texture memory accesses make up the majority of the traffic to the memory hierarchy in typical low power graphics architectures. This paper focuses on improving the L1 Texture cache locality by focusing on a new workload scheduler by exploring various methods to group the threads, assign the groups to shader cores and also to reorder threads without violating the correctness of the pipeline. To overcome the resulting load imbalance, we also propose a minor modification in the GPU architecture that helps translate the improvement in cache locality to an improvement in the GPU’s performance. We propose DTexL that envelops these ideas and evaluate it over a benchmark suite of ten commercial games, to obtain a 46.8% decrease in L2 Accesses, a 19.3% increase in performance and a 6.3% decrease in total GPU energy. All this with a negligible overhead.
更多
查看译文
关键词
GPU,Caches,Graphics,Scheduling,Texture Locality,Low-power
AI 理解论文
溯源树
样例
生成溯源树,研究论文发展脉络
Chat Paper
正在生成论文摘要