BIRD: A Lightweight and Adaptive Compressor for Communication-Efficient Distributed Learning Using Tensor-wise Bi-Random Sampling

2023 IEEE 41st International Conference on Computer Design (ICCD)

Abstract
Top-K sparsification-based compression frameworks are widely employed to reduce communication costs in distributed learning. However, we have identified several issues with existing Top-K sparsification-based compression methods that severely impede their deployment on resource-constrained devices: (i) the limited compressibility of the indices of the Top-K parameters, which critically restricts the overall communication compression ratio; (ii) several time-consuming compression operations that significantly negate the benefits of communication compression; (iii) the high memory footprint of the error feedback techniques used to maintain model quality. To address these issues, we propose BIRD, a lightweight tensor-wise Bi-Random sampling strategy with an expectation invariance property, which achieves higher compression ratios at lower computational overhead while maintaining comparable model quality without additional memory costs. Specifically, BIRD applies a tensor-wise index sharing mechanism that substantially reduces the proportion of index data by allowing multiple tensor elements to share a single index, thus improving the overall compression ratio. Additionally, BIRD replaces time-consuming Top-K sorting with a faster Bi-Random sampling strategy built on this index sharing mechanism, thereby reducing the computational cost of compression. Moreover, BIRD builds an expectation invariance property into the Bi-Random sampling to ensure an unbiased representation of the L1-norm of the sampled tensors, effectively maintaining model quality without incurring extra memory costs. Experiments on multiple mainstream machine learning (ML) tasks demonstrate that, compared to state-of-the-art methods, BIRD achieves a 1.3x-31.1x higher compression ratio at lower time overhead with O(N) complexity while maintaining model quality without incurring extra memory costs.
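The abstract describes BIRD only at a high level; the sketch below is a minimal illustration of the two ideas it names, index sharing across groups of tensor elements and random sampling rescaled so the L1-norm is preserved in expectation. It is not the authors' BIRD implementation: the function names (sketch_compress, sketch_decompress), the parameters (group_size, sample_ratio), and the specific group-wise sampling scheme are assumptions made for illustration.

import numpy as np

def sketch_compress(tensor, group_size=8, sample_ratio=0.1, rng=None):
    # Hypothetical sketch of group-wise random sampling with an
    # expectation-invariance rescaling; NOT the paper's BIRD algorithm.
    rng = np.random.default_rng() if rng is None else rng
    flat = tensor.ravel()
    # Pad so the flattened tensor splits evenly into groups; every group
    # of group_size elements shares a single transmitted index.
    pad = (-flat.size) % group_size
    padded = np.concatenate([flat, np.zeros(pad, dtype=flat.dtype)])
    groups = padded.reshape(-1, group_size)          # (num_groups, group_size)
    num_groups = groups.shape[0]
    k = max(1, int(round(sample_ratio * num_groups)))
    # Randomly sample k group indices; no Top-K sorting is involved.
    idx = rng.choice(num_groups, size=k, replace=False)
    # Each group is kept with probability k / num_groups, so scaling the
    # kept values by num_groups / k makes the expected L1-norm of the
    # decompressed tensor equal to the L1-norm of the original.
    scale = num_groups / k
    values = groups[idx] * scale                     # (k, group_size)
    return idx, values, num_groups, group_size, flat.size

def sketch_decompress(idx, values, num_groups, group_size, orig_size):
    groups = np.zeros((num_groups, group_size), dtype=values.dtype)
    groups[idx] = values
    return groups.ravel()[:orig_size]

# Usage: compress a gradient-like tensor and check the L1-norm on average.
g = np.random.randn(10_000).astype(np.float32)
est = np.mean([np.abs(sketch_decompress(*sketch_compress(g, rng=np.random.default_rng(s)))).sum()
               for s in range(200)])
print(np.abs(g).sum(), est)  # the two values should be close

In this toy version, only k group indices are sent alongside k * group_size values, which is the index-sharing effect the abstract refers to; the rescaling step is what gives the unbiased (expectation-invariant) L1-norm.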
Key words
Distributed learning, Communication compression, Random sampling, Neural network