On the exploitation of loop-level parallelism in embedded applications

ACM Trans. Embedded Comput. Syst. (2009)

Abstract
Advances in silicon technology have enabled increasing support for hardware parallelism in embedded processors. Vector units, multiple processors/cores, multithreading, special-purpose accelerators such as DSPs or cryptographic engines, or a combination of these have appeared in a number of processors. They serve to address the increasing performance requirements of modern embedded applications. The extent to which the available hardware parallelism can be exploited depends directly on the amount of parallelism inherent in the given application and on the congruence between the granularity of hardware and application parallelism. This paper discusses how loop-level parallelism in embedded applications can be exploited in hardware and software. Specifically, it evaluates the efficacy of automatic loop parallelization and the performance potential of different types of parallelism, viz., true thread-level parallelism (TLP), speculative thread-level parallelism, and vector parallelism, when executing loops. Additionally, it discusses the interaction between parallelization and vectorization. Applications from the industry-standard EEMBC®¹ 1.1 and EEMBC 2.0 suites and from the academic MiBench embedded benchmark suite are analyzed using the Intel®² C compiler. The results show the performance that can be achieved today on real hardware using a production compiler, provide upper bounds on the performance potential of the different types of thread-level parallelism, and point out a number of issues that need to be addressed to improve performance. These issues include the parallelization of libraries such as libc and the design of parallel algorithms that allow maximal exploitation of parallelism. The results also point to the need for new benchmark suites better suited to parallel compilation and execution.

¹ Other names and brands may be claimed as the property of others.
² Intel is a trademark of Intel Corporation or its subsidiaries in the United States and other countries.
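
As a rough illustration of the loop-level distinction the abstract draws, the C sketch below contrasts two loops. It is not code from the paper or from the benchmark suites; the function names `scale_add` and `running_sum` are hypothetical, and the `restrict` qualifiers encode an assumption that the two arrays do not overlap. In the first loop every iteration is independent, so it is a candidate for both thread-level and vector parallelism. In the second, each iteration reads the value written by the previous one, so it cannot be run in parallel as written.

```c
#include <stddef.h>

/* Hypothetical example: iterations are independent of one another.
 * Assuming x and y do not alias (stated via restrict), a compiler may
 * split the iteration space across threads, vectorize it, or both. */
void scale_add(size_t n, float a, const float *restrict x, float *restrict y)
{
    for (size_t i = 0; i < n; i++)
        y[i] = a * x[i] + y[i];
}

/* Hypothetical example: loop-carried dependence. Iteration i reads
 * v[i - 1], which iteration i - 1 has just written, so the iterations
 * must execute in order unless the algorithm itself is redesigned. */
void running_sum(size_t n, float *v)
{
    for (size_t i = 1; i < n; i++)
        v[i] += v[i - 1];
}
```

Whether a given compiler actually parallelizes or vectorizes the first loop depends on its options and on what it can prove about aliasing and trip counts. The second loop is the kind of case where, in the abstract's terms, a parallel algorithm would have to be designed rather than derived automatically.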
Keywords
vectorization, libraries, speculative thread-level parallelism, performance potential, application parallelism, system-on-chip soc, true thread-level parallelism, multi-cores, loop-level parallelism, available hardware parallelism, vector parallelism, different type, parallel loops, programming models, hardware parallelism, multithreading, thread-level parallelism, thread-level speculation, embedded application, software specification, parallel algorithm, system on chip, embedded processor, programming model, upper bound, thread level parallelism, thread level speculation