RNA-ViT: Reduced-Dimension Approximate Normalized Attention Vision Transformers for Latency Efficient Private Inference

2023 IEEE/ACM International Conference on Computer-Aided Design (ICCAD 2023)

Abstract
Concern over data and model privacy in machine learning inference as a service (MLaaS) has led to the development of private inference (PI) techniques. However, existing PI frameworks, especially those designed for large models such as vision transformers (ViTs), suffer from high computational and communication overheads caused by expensive multi-party computation (MPC) protocols. The encrypted attention module, which involves the softmax operation, contributes significantly to this overhead. In this work, we present a family of models dubbed RNA-ViT that leverages a novel attention module called reduced-dimension approximate normalized attention and a latency-efficient GeLU-alternative layer. In particular, RNA-ViT uses two novel techniques to improve PI efficiency in ViTs: a reduced-dimension normalized attention (RNA) architecture and a high-order polynomial (HOP) softmax approximation for latency-efficient normalization. We also propose a novel metric, the accuracy-to-latency ratio (A2L), to evaluate modules in terms of both their accuracy and PI latency. Based on this metric, we perform an analysis to identify a nonlinearity module with improved PI efficiency. Our extensive experiments show that RNA-ViT achieves, on average, 3.53x, 3.54x, and 1.66x lower PI latency with average accuracy improvements of 0.93%, 2.04%, and 2.73% compared to the state-of-the-art scheme MPCViT [1] on CIFAR-10, CIFAR-100, and Tiny-ImageNet, respectively.
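The abstract's key idea is replacing the MPC-expensive exponential in softmax with a polynomial, so only additions and multiplications remain under encryption. The paper's actual HOP coefficients and range-reduction scheme are not given here; the sketch below only illustrates the idea with a truncated Taylor polynomial of the exponential (the `order` parameter and clipping are assumptions for this illustration, not the authors' design):

```python
import numpy as np

def softmax(x):
    # Standard softmax for reference (numerically stabilized).
    e = np.exp(x - x.max(axis=-1, keepdims=True))
    return e / e.sum(axis=-1, keepdims=True)

def hop_softmax(x, order=6):
    # Illustrative polynomial softmax: approximate exp with a truncated
    # Taylor series, then normalize. MPC-friendly in spirit: only
    # additions and multiplications, no exponential. The paper's actual
    # polynomial is not specified here; this is a hypothetical sketch.
    x = x - x.max(axis=-1, keepdims=True)   # shift logits for stability
    p = np.zeros_like(x)
    term = np.ones_like(x)
    for k in range(order + 1):              # p = sum_{k=0..order} x^k / k!
        p += term
        term = term * x / (k + 1)
    p = np.clip(p, 1e-9, None)              # keep attention weights positive
    return p / p.sum(axis=-1, keepdims=True)

scores = np.array([[1.0, 2.0, 0.5, 1.5]])
# For moderate logit ranges the approximation stays close to true softmax.
print(np.abs(hop_softmax(scores) - softmax(scores)).max())
```

Each row of the output still sums to one, so it can be dropped into an attention layer in place of softmax; accuracy then depends on how well the polynomial tracks the exponential over the range of the attention scores.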
Keywords
Deep learning, Computer vision, Vision transformer, Private inference, Multi-party computation