# Gradient Statistics Aware Power Control for Over-the-Air Federated Learning in Fading Channels

IEEE ICC Workshops, pp. 1–6, 2020.

Abstract:

To enable communication-efficient federated learning, fast model aggregation can be designed using over-the-air computation (AirComp). In order to implement a reliable and high-performance AirComp over fading channels, power control at edge devices is crucial. Existing works focus on the traditional data aggregation which often assumes …
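The AirComp aggregation referred to in the abstract can be sketched with a toy simulation. Everything here is an illustrative assumption rather than the paper's exact system model: Rayleigh-distributed channel magnitudes, a common receive scaling `eta` chosen by full channel inversion on the weakest device, and additive Gaussian receiver noise.

```python
import numpy as np

rng = np.random.default_rng(0)
K, d = 10, 5                    # number of devices, model dimension
P = 1.0                         # per-device transmit power budget
sigma = 0.1                     # receiver noise standard deviation

g = rng.normal(size=(K, d))     # local gradient estimates
h = rng.rayleigh(size=K)        # channel gain magnitudes (block fading)

# Full channel inversion: choose the common receive scaling eta so the
# weakest device exactly meets its power budget, then invert each channel.
eta = np.sqrt(P) * h.min()
b = eta / h                     # transmit scalings, so b_k**2 <= P for all k

# The multiple-access channel superimposes the scaled gradients "over the air";
# the server only observes their noisy sum, never the individual gradients.
y = (h * b) @ g + sigma * rng.normal(size=d)
g_hat = y / (K * eta)           # estimate of the average gradient
```

With full inversion every device's effective weight `h_k * b_k` equals `eta`, so `g_hat` is an unbiased estimate of the average gradient; the paper's point is that this naive policy is fragile when the weakest channel forces `eta` to be small.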

Introduction

- The proliferation of mobile devices such as smartphones, tablets, and wearable devices has revolutionized people’s daily lives.
- It is increasingly desired to let edge devices engage in the learning process by keeping the collected data locally and performing training/inference either collaboratively or individually.
- This emerging technology is known as Edge Machine Learning [2] or Edge Intelligence [3].

Highlights

- Optimal power control in special cases: when the gradient squared multivariate coefficient of variation approaches infinity, which can happen as model training converges and/or when the dataset is highly non-identically distributed, there is an optimal threshold for the devices: those below it transmit at full power, while those above it transmit at the power that equalizes their gradient aggregation weights to one
- We provide experimental results to validate the performance of the proposed power control for AirComp-based federated learning over fading channels
- This work studied the power control optimization problem for over-the-air federated learning over fading channels, taking the gradient statistics into account
- It is shown that the optimal transmit power on each device decreases with gradient squared multivariate coefficient of variation and increases with noise variance
- We propose an adaptive power control algorithm that dynamically adjusts the transmit power in each iteration based on the estimation results
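The threshold structure described in the highlights can be sketched in a few lines. The helper `threshold_power_control`, the threshold value `eta / sqrt(P)`, and the weight convention below are illustrative assumptions, not the paper's exact closed-form policy:

```python
import numpy as np

def threshold_power_control(h, P, eta):
    """Hypothetical illustration of a threshold policy: devices whose channel
    gain is below eta / sqrt(P) transmit at full power P; stronger devices
    invert their channels so each contributes an aggregation weight of one."""
    h = np.asarray(h, dtype=float)
    return np.where(h < eta / np.sqrt(P), P, eta**2 / h**2)

h = np.array([0.2, 0.5, 1.0, 2.0])   # example channel gains
eta = 0.8                            # example receive scaling / threshold
p = threshold_power_control(h, P=1.0, eta=eta)
weights = np.sqrt(p) * h / eta       # per-device aggregation weights
```

In this toy example the two weak devices (h = 0.2, 0.5) transmit at full power and contribute weights below one, while the two strong devices invert their channels so their weights equal exactly one, matching the structure stated in the highlight above.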

Results

- The authors provide experimental results to validate the performance of the proposed power control for AirComp-based FL over fading channels.

Conclusion

- This work studied the power control optimization problem for over-the-air federated learning over fading channels, taking the gradient statistics into account.
- The optimal power control policy is derived in closed form when the first- and second-order gradient statistics are known.
- In the special cases where β approaches infinity and zero, the optimal transmit power reduces to threshold-based power control in [17] and full power transmission, respectively.
- The authors propose an adaptive power control algorithm that dynamically adjusts the transmit power in each iteration based on the estimation results.
- Experimental results show that the proposed adaptive power control scheme outperforms the existing schemes.
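The adaptive scheme above relies on per-iteration estimates of the gradient statistics. A minimal sketch of how the squared multivariate coefficient of variation β might be estimated from local gradients follows; the estimator (total per-coordinate variance over the squared norm of the mean gradient) and the example scenarios are illustrative assumptions, and the paper's exact definition may differ:

```python
import numpy as np

def squared_mcv(grads):
    """Illustrative estimator of the gradient squared multivariate coefficient
    of variation: total per-coordinate variance across devices divided by the
    squared norm of the mean gradient."""
    grads = np.asarray(grads, dtype=float)
    mean = grads.mean(axis=0)
    return grads.var(axis=0).sum() / (mean @ mean)

rng = np.random.default_rng(1)
mean_grad = rng.normal(size=20)

# Early training: local gradients agree closely -> small beta.
early = mean_grad + 0.05 * rng.normal(size=(8, 20))
# Near convergence or highly non-IID data: gradients scatter -> large beta.
late = 0.05 * mean_grad + rng.normal(size=(8, 20))

beta_early = squared_mcv(early)
beta_late = squared_mcv(late)
```

The contrast between `beta_early` and `beta_late` mirrors the paper's two regimes: small β (full-power transmission is optimal) versus β growing toward infinity (threshold-based power control is optimal).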


References

- N. Zhang and M. Tao, “Gradient statistics aware power control for over-the-air federated learning in fading channels,” in Proc. IEEE ICC Workshops, 2020, pp. 1–6.
- J. Park, S. Samarakoon, M. Bennis, and M. Debbah, “Wireless network intelligence at the edge,” Proceedings of the IEEE, vol. 107, no. 11, pp. 2204–2239, 2019.
- Z. Zhou, X. Chen, E. Li, L. Zeng, K. Luo, and J. Zhang, “Edge intelligence: Paving the last mile of artificial intelligence with edge computing,” Proceedings of the IEEE, vol. 107, no. 8, pp. 1738–1762, 2019.
- B. McMahan, E. Moore, D. Ramage, S. Hampson, and B. A. y Arcas, “Communication-efficient learning of deep networks from decentralized data,” in Artificial Intelligence and Statistics, 2017, pp. 1273–1282.
- K. Bonawitz, H. Eichner, W. Grieskamp, D. Huba, A. Ingerman, V. Ivanov, C. Kiddon, J. Konecny, S. Mazzocchi, H. B. McMahan et al., “Towards federated learning at scale: System design,” arXiv preprint arXiv:1902.01046, 2019.
- Q. Yang, Y. Liu, T. Chen, and Y. Tong, “Federated machine learning: Concept and applications,” ACM Transactions on Intelligent Systems and Technology (TIST), vol. 10, no. 2, p. 12, 2019.
- D. Alistarh, D. Grubic, J. Li, R. Tomioka, and M. Vojnovic, “QSGD: Communication-efficient SGD via randomized quantization and encoding,” Advances in Neural Information Processing Systems 30, vol. 3, pp. 1710–1721, 2018.
- F. Seide, H. Fu, J. Droppo, G. Li, and D. Yu, “1-bit stochastic gradient descent and its application to data-parallel distributed training of speech DNNs,” in Fifteenth Annual Conference of the International Speech Communication Association, 2014.
- A. F. Aji and K. Heafield, “Sparse communication for distributed gradient descent,” in Proceedings of the 2017 Conference on Empirical Methods in Natural Language Processing, 2017, pp. 440–445.
- Y. Tsuzuku, H. Imachi, and T. Akiba, “Variance-based gradient compression for efficient distributed deep learning,” arXiv preprint arXiv:1802.06058, 2018.
- S. Wang, T. Tuor, T. Salonidis, K. K. Leung, C. Makaya, T. He, and K. Chan, “Adaptive federated learning in resource constrained edge computing systems,” IEEE J. Sel. Areas Commun., vol. 37, no. 6, pp. 1205–1221, 2019.
- B. Nazer and M. Gastpar, “Computation over multiple-access channels,” IEEE Trans. Inf. Theory, vol. 53, no. 10, pp. 3498–3516, 2007.
- K. Yang, T. Jiang, Y. Shi, and Z. Ding, “Federated learning via over-the-air computation,” IEEE Trans. Wireless Commun., vol. 19, no. 3, pp. 2022–2035, March 2020.
- G. Zhu, Y. Wang, and K. Huang, “Broadband analog aggregation for low-latency federated edge learning,” IEEE Trans. Wireless Commun., vol. 19, no. 1, pp. 491–506, Jan. 2020.
- M. M. Amiri and D. Gunduz, “Machine learning at the wireless edge: Distributed stochastic gradient descent over-the-air,” in Proc. IEEE ISIT, July 2019, pp. 1432–1436.
- ——, “Federated learning over wireless fading channels,” IEEE Trans. Wireless Commun., pp. 1–1, 2020.
- X. Cao, G. Zhu, J. Xu, and K. Huang, “Optimal power control for over-the-air computation,” in Proc. IEEE GLOBECOM, Dec. 2019, pp. 1–6.
