BIRD: A Lightweight and Adaptive Compressor for Communication-Efficient Distributed Learning Using Tensor-wise Bi-Random Sampling

2023 IEEE 41st International Conference on Computer Design (ICCD)

Abstract
Top-K sparsification-based compression frameworks are widely employed to reduce communication costs in distributed learning. However, we have identified several issues with existing Top-K sparsification-based compression methods that severely impede their deployment on resource-constrained devices: (i) the limited compressibility of the indices of the Top-K parameters, which critically restricts the overall communication compression ratio; (ii) several time-consuming compression operations that significantly negate the benefits of communication compression; (iii) the high memory footprint of the error feedback techniques used to maintain model quality. To address these issues, we propose BIRD, a lightweight tensor-wise Bi-Random sampling strategy with an expectation invariance property, which achieves higher compression ratios at lower computational overhead while maintaining comparable model quality without additional memory costs. Specifically, BIRD applies a tensor-wise index sharing mechanism that substantially reduces the proportion of index data by allowing multiple tensor elements to share a single index, thus improving the overall compression ratio. Additionally, BIRD replaces time-consuming Top-K sorting with a faster Bi-Random sampling strategy built on this index sharing mechanism, thereby reducing the computational cost of compression. Moreover, BIRD builds an expectation invariance property into the Bi-Random sampling to ensure an unbiased representation of the L1-norm of the sampled tensors, effectively maintaining model quality without incurring extra memory costs. Experiments on multiple mainstream machine learning (ML) tasks demonstrate that, compared to state-of-the-art methods, BIRD achieves a 1.3x-31.1x higher compression ratio at lower time overhead with O(N) complexity while maintaining model quality without incurring extra memory costs.
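The abstract describes BIRD only at a high level; the sketch below is a minimal illustration of the two ideas it names, index sharing across groups of tensor elements and random sampling rescaled so the L1-norm is preserved in expectation. It is not the authors' BIRD implementation: the function names (sketch_compress, sketch_decompress), the parameters (group_size, sample_ratio), and the specific group-wise sampling scheme are assumptions made for illustration.

import numpy as np

def sketch_compress(tensor, group_size=8, sample_ratio=0.1, rng=None):
    # Hypothetical sketch of group-wise random sampling with an
    # expectation-invariance rescaling; NOT the paper's BIRD algorithm.
    rng = np.random.default_rng() if rng is None else rng
    flat = tensor.ravel()
    # Pad so the flattened tensor splits evenly into groups; every group
    # of group_size elements shares a single transmitted index.
    pad = (-flat.size) % group_size
    padded = np.concatenate([flat, np.zeros(pad, dtype=flat.dtype)])
    groups = padded.reshape(-1, group_size)          # (num_groups, group_size)
    num_groups = groups.shape[0]
    k = max(1, int(round(sample_ratio * num_groups)))
    # Randomly sample k group indices; no Top-K sorting is involved.
    idx = rng.choice(num_groups, size=k, replace=False)
    # Each group is kept with probability k / num_groups, so scaling the
    # kept values by num_groups / k makes the expected L1-norm of the
    # decompressed tensor equal to the L1-norm of the original.
    scale = num_groups / k
    values = groups[idx] * scale                     # (k, group_size)
    return idx, values, num_groups, group_size, flat.size

def sketch_decompress(idx, values, num_groups, group_size, orig_size):
    groups = np.zeros((num_groups, group_size), dtype=values.dtype)
    groups[idx] = values
    return groups.ravel()[:orig_size]

# Usage: compress a gradient-like tensor and check the L1-norm on average.
g = np.random.randn(10_000).astype(np.float32)
est = np.mean([np.abs(sketch_decompress(*sketch_compress(g, rng=np.random.default_rng(s)))).sum()
               for s in range(200)])
print(np.abs(g).sum(), est)  # the two values should be close

In this toy version, only k group indices are sent alongside k * group_size values, which is the index-sharing effect the abstract refers to; the rescaling step is what gives the unbiased (expectation-invariant) L1-norm.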
Key words
Distributed learning, Communication compression, Random sampling, Neural network