Efficiently Supporting Dynamic Task Parallelism on Heterogeneous Cache-Coherent Systems

2020 ACM/IEEE 47th Annual International Symposium on Computer Architecture (ISCA)(2020)

引用 8|浏览15
暂无评分
摘要
Manycore processors, with tens to hundreds of tiny cores but no hardware-based cache coherence, can offer tremendous peak throughput on highly parallel programs while being complexity and energy efficient. Manycore processors can be combined with a few high-performance big cores for executing operating systems, legacy code, and serial regions. These systems use heterogeneous cache coherence (HCC) with hardware-based cache coherence between big cores and software-centric cache coherence between tiny cores. Unfortunately, programming these heterogeneous cache-coherent systems to enable collaborative execution is challenging, especially when considering dynamic task parallelism. This paper seeks to address this challenge using a combination of light-weight software and hardware techniques. We provide a detailed description of how to implement a work-stealing runtime to enable dynamic task parallelism on heterogeneous cache-coherent systems. We also propose direct task stealing (DTS), a new technique based on user-level interrupts to bypass the memory system and thus improve the performance and energy efficiency of work stealing. Our results demonstrate that executing dynamic task-parallel applications on a 64-core system (4 big, 60 tiny) with complexity-effective HCC and DTS can achieve: $7 \times$ speedup over a single big core; $1.4 \times$ speedup over an area-equivalent eight bigcore system with hardware-based cache coherence; and 21% better performance and similar energy efficiency compared to a 64-core system (4 big, 60 tiny) with full-system hardware-based cache coherence.
更多
查看译文
AI 理解论文
溯源树
样例
生成溯源树,研究论文发展脉络
Chat Paper
正在生成论文摘要