A 280 mV-to-1.1 V 256b Reconfigurable SIMD Vector Permutation Engine With 2-Dimensional Shuffle in 22 nm Tri-Gate CMOS

IEEE Journal of Solid-State Circuits (2012)

Abstract
An ultra-low voltage reconfigurable 4-way to 32-way SIMD vector permutation engine is fabricated in 22 nm tri-gate bulk CMOS, consisting of a 32-entry × 256b 3-read/1-write ported register file with a 256b byte-wise any-to-any permute crossbar for 2-dimensional shuffle. The register file integrates a vertical shuffle across multiple entries into read/write operations, and includes clock-less static reads with shared P/N dual-ended transmission gate (DETG) writes, improving register file VMIN by 250 mV across PVT variations with a wide dynamic operating range of 280 mV-1.1 V. The permute crossbar implements an interleaved folded byte-wise multiplexer layout forming an any-to-any fully connected tree to perform a horizontal shuffle with permute accumulate circuits, and includes vector flip-flops, stacked min-delay buffers, shared gates, and ultra-low voltage split-output (ULVS) level shifters improving logic VMIN by 150 mV, while enabling peak energy efficiency of 585 GOPS/W measured at 260 mV, 50 °C. The permutation engine achieves: (i) nominal register file performance of 1.8 GHz, 106 mW measured at 0.9 V, 50 °C, (ii) robust register file functionality measured down to 280 mV with peak energy efficiency of 154 GOPS/W, (iii) scalable permute crossbar performance of 2.9 GHz, 69 mW measured at 1.1 V, 50 °C with sub-threshold operation at 240 mV, 10 MHz consuming 19 μW, and (iv) a 64b 4 × 4 matrix transpose algorithm and AoS to SoA conversion with 40%-53% energy savings and 25%-42% improved peak throughput measured at 1.8 GHz, 0.9 V.
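The 2-dimensional shuffle described above can be illustrated with a small software model. The following is a minimal sketch, not the paper's circuit or its published algorithm: it assumes a 4-way configuration (four 64b lanes in a 256b vector) and uses a common rotate, diagonal-gather, rotate scheme to transpose a 4 × 4 matrix of 64b elements. All function names (`permute_bytes`, `rotate_lanes`, `vertical_read`, `transpose_4x4`, `pack`, `unpack`) are hypothetical and chosen for illustration only.

```python
LANES = 4                        # assumed 4-way SIMD: four 64b elements per vector
LANE_BYTES = 8                   # 64b per element
VEC_BYTES = LANES * LANE_BYTES   # 32 bytes = 256b

def permute_bytes(src, idx):
    """Byte-wise any-to-any permute: output byte i is src[idx[i]]."""
    assert len(src) == VEC_BYTES and len(idx) == VEC_BYTES
    return bytes(src[i] for i in idx)

def rotate_lanes(src, shift):
    """Horizontal shuffle: move lane j to lane (j + shift) % LANES."""
    idx = []
    for out_lane in range(LANES):
        src_lane = (out_lane - shift) % LANES
        idx.extend(range(src_lane * LANE_BYTES, (src_lane + 1) * LANE_BYTES))
    return permute_bytes(src, idx)

def vertical_read(regfile, entry_for_lane):
    """Vertical shuffle: lane k of the result is lane k of entry entry_for_lane[k]."""
    out = bytearray()
    for k, entry in enumerate(entry_for_lane):
        out += regfile[entry][k * LANE_BYTES:(k + 1) * LANE_BYTES]
    return bytes(out)

def transpose_4x4(rows):
    """Transpose four 256b rows using horizontal and vertical shuffles only."""
    # Step 1 (horizontal): rotate row i right by i lanes.
    rot = [rotate_lanes(rows[i], i) for i in range(LANES)]
    # Step 2 (vertical): for result r, gather lane k from entry (k - r) % LANES.
    diag = [vertical_read(rot, [(k - r) % LANES for k in range(LANES)])
            for r in range(LANES)]
    # Step 3 (horizontal): rotate result r left by r lanes.
    return [rotate_lanes(diag[r], -r) for r in range(LANES)]

def pack(values):
    """Pack four integers into one 256b vector (little-endian lanes)."""
    return b"".join(v.to_bytes(LANE_BYTES, "little") for v in values)

def unpack(vec):
    """Unpack a 256b vector into four integers."""
    return [int.from_bytes(vec[i * LANE_BYTES:(i + 1) * LANE_BYTES], "little")
            for i in range(LANES)]

matrix = [[10 * i + j for j in range(LANES)] for i in range(LANES)]
transposed = [unpack(v) for v in transpose_4x4([pack(r) for r in matrix])]
```

In this model, an AoS to SoA conversion of four structures with four 64b fields each is the same operation: each input row is one structure, and each output row collects one field across all four structures.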
Keywords
CMOS integrated circuits, flip-flops, low-power electronics, parallel processing, 2-dimensional shuffle, DETG, P/N dual-ended transmission gate, ULVS level shifter, clock-less static reads, frequency 1.8 GHz, frequency 10 MHz, frequency 2.9 GHz, interleaved folded byte-wise multiplexer layout, peak energy efficiency, power 106 mW, power 19 μW, power 69 mW, register file, scalable permute crossbar, shared gates, size 22 nm, stacked min-delay buffer, temperature 50 °C, trigate bulk CMOS, ultra-low voltage reconfigurable SIMD vector permutation engine, ultra-low voltage split-output level shifter, vector flip-flops, voltage 0.9 V, voltage 240 mV, voltage 260 mV, voltage 280 mV to 1.1 V, $V_{\rm MIN}$, Single instruction multiple data (SIMD), crossbar, flip-flop, level shifter, near-threshold voltage (NTV), permutation, ultra-low voltage, vector processing