# Learning to Optimize: Training Deep Neural Networks for Wireless Resource Management.

SPAWC, (2017): 1-6

Abstract

For decades, optimization has played a central role in addressing wireless resource management problems such as power control and beamformer design. However, these algorithms often require a considerable number of iterations for convergence, which poses challenges for real-time processing. In this work, we propose a new learning-based app...

Introduction

- Resource management tasks, such as transmit power control, transmit/receive beamformer design, and user admission control, are critical for future wireless networks.
- It remains unclear whether a multi-layer neural network can be used to approximate the behavior of a given iterative algorithm, like WMMSE, for solving the nonconvex optimization problem (1).
- Given a set of training data points {(z^(i), x_T^(i))}, the authors use a simple three-layer neural network to approximate the relationship z → x_T, which characterizes the behavior of GD.
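
As a rough illustration of how training pairs (z, x_T) for such an experiment could be produced, the sketch below runs a fixed number of gradient descent (GD) steps on a toy objective and records the final iterate. The objective f(x) = (x^2 - z)^2, the step size, the starting point, the range of z, and the function names `run_gd` and `make_training_pairs` are all assumptions chosen for illustration; the summary does not state which problem the authors used.

```python
import math
import random


def run_gd(z, x0=1.0, step=0.05, iters=100):
    """Run `iters` GD steps on the illustrative objective f(x) = (x^2 - z)^2.

    Returns the final iterate x_T. The objective and hyperparameters are
    assumptions for this sketch, not values taken from the paper.
    """
    x = x0
    for _ in range(iters):
        grad = 4.0 * x * (x * x - z)  # derivative of (x^2 - z)^2 with respect to x
        x -= step * grad
    return x


def make_training_pairs(n, seed=0):
    """Draw n problem parameters z and label each with the GD output x_T."""
    rng = random.Random(seed)
    pairs = []
    for _ in range(n):
        z = rng.uniform(0.5, 2.0)  # hypothetical parameter range
        pairs.append((z, run_gd(z)))
    return pairs


if __name__ == "__main__":
    for z, x_t in make_training_pairs(3):
        print(f"z = {z:.4f}, x_T = {x_t:.4f}, sqrt(z) = {math.sqrt(z):.4f}")
```

A network trained on such pairs learns the input-to-output map of the whole iterative procedure, so a single forward pass stands in for all the GD iterations at test time.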

Highlights

- Resource management tasks, such as transmit power control, transmit/receive beamformer design, and user admission control, are critical for future wireless networks
- We use extensive numerical simulations to demonstrate that deep neural networks (DNNs) can achieve orders-of-magnitude speedup in computational time compared to state-of-the-art optimization-based power allocation algorithms
- The proposed DNN approach is implemented in Python 3.6.0 with TensorFlow 1.0.0 on one computer node with two 8-core Intel Haswell processors, two Nvidia K20 Graphics Processing Units (GPUs), and 128 GB of memory
- Rayleigh fading is a reasonable channel model that has been widely used to simulate the performance of various resource allocation algorithms
- We evaluate the sum-rate performance of the DNN-based approach in the testing stage against the following schemes: 1) the WMMSE; 2) the random power allocation strategy, which generates the power allocation as pk ∼ Uniform(0, Pmax), ∀ k; 3) the maximum power allocation, pk = Pmax, ∀ k. The latter two schemes serve as heuristic baselines
- We study the scenario in which only K/2 users are present in the testing, while the DNN is trained with K users
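
The two heuristic baselines and the sum-rate metric mentioned above can be sketched as follows. This is a minimal sketch that assumes the standard unweighted sum-rate expression for a single-antenna Gaussian interference channel, with `gain[k][j]` standing for the squared channel magnitude from transmitter j to receiver k. The function names, the noise power default, and the 3-user example are illustrative and do not come from the paper.

```python
import math
import random


def sum_rate(gain, power, noise=1.0):
    """Sum of log2(1 + SINR_k) over all users.

    gain[k][j] is the (assumed) squared channel magnitude from transmitter j
    to receiver k; power[j] is the transmit power of user j.
    """
    num_users = len(power)
    total = 0.0
    for k in range(num_users):
        interference = sum(
            gain[k][j] * power[j] for j in range(num_users) if j != k
        )
        sinr = gain[k][k] * power[k] / (interference + noise)
        total += math.log2(1.0 + sinr)
    return total


def random_power(num_users, p_max, rng):
    """Baseline 2: p_k drawn from Uniform(0, Pmax) for every user."""
    return [rng.uniform(0.0, p_max) for _ in range(num_users)]


def max_power(num_users, p_max):
    """Baseline 3: every user transmits at Pmax."""
    return [p_max] * num_users


if __name__ == "__main__":
    rng = random.Random(0)
    # Hypothetical 3-user channel gains, for demonstration only.
    gain = [[rng.expovariate(1.0) for _ in range(3)] for _ in range(3)]
    print("random power:", sum_rate(gain, random_power(3, 1.0, rng)))
    print("max power:   ", sum_rate(gain, max_power(3, 1.0)))
```

The same `sum_rate` function can score any power vector, so the DNN output and the WMMSE output would be evaluated on equal terms with the baselines.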

Results

- Many resource allocation algorithms other than WMMSE can be approximated following analysis steps similar to those in Theorem 2, as long as they can be expressed as a composition of ‘basic’ operations such as multiplication, division, binary search, and thresholding.
- The authors compute the resulting sum-rate of the power allocation generated by DNN and compare it with that obtained by the WMMSE.
- The authors test the robustness of the learned models by generating channels following distributions that are different from the training stage, and evaluate the resulting performance.
- For each network scenario (i.e., IC/IMAC with a different number of BSs/users), the authors randomly generate one million channel realizations as the training data and ten thousand channel realizations as the validation and testing data.
- To find parameters for the training algorithm, the authors perform cross-validation for different channel models.
- The authors evaluate the sum-rate performance of the DNN-based approach in the testing stage against the following schemes: 1) the WMMSE; 2) the random power allocation strategy, which generates the power allocation as pk ∼ Uniform(0, Pmax), ∀ k; 3) the maximum power allocation, pk = Pmax, ∀ k. The latter two schemes serve as heuristic baselines.
- From Table I and Table II, the authors observe that the DNN performs better in the IMAC setting, achieving lower computational time and higher relative sum-rate.
- The results are shown in TABLE IV and TABLE V, from which the authors can conclude that the model generalizes relatively well when the testing network configurations are sufficiently close to those used in the training.

Conclusion

- Despite the fact that the total available data set is quite limited, the proposed DNN approach can still achieve relatively high sum-rate performance on the measured testing data set.
- The authors' theoretical results indicate that it is possible to learn a well-defined optimization algorithm very well by using finite-sized deep neural networks.
- The authors' empirical results show that, for the power control problems over either the IC or the IMAC channel, deep neural networks can be trained to closely approximate the behavior of the state-of-the-art algorithm WMMSE.

- Table 1: Sum-Rate and Computational Performance for Gaussian IC
- Table 2: Sum-Rate and Computational Performance for IMAC (With Different Inner Circle Radius r)
- Table 3: Sum-Rate and Computational Performance for IMAC
- Table 4: Sum-Rate and Computational Performance for IMAC (With Different Cell Radius R)
- Table 5: Sum-Rate and Computational Performance for Measured VDSL Data
- Table 6: Sum-Rate and Computational Performance for Gaussian IC (Half User Case)

Funding

- Hong are supported by NSF grants CMMI-1727757, CCF-1526078, and an AFOSR grant 15RT0767

Study subjects and analysis

test samples: 10000

In this subsection, we demonstrate the scalability of the proposed DNN approach when the size of the wireless network is increased. The average achieved sum-rate performance (averaged over 10,000 test samples) and the percentage of the WMMSE sum-rate achieved by the DNN are presented in TABLE I (for the IC) and TABLE II (for the IMAC). It can be seen that our proposed method scales well in both prediction accuracy and computational efficiency.
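
The percentage figure described above can be sketched as a ratio of averaged sum-rates. This is only one plausible reading: the summary does not say whether the paper takes a ratio of averages or an average of per-sample ratios, so the choice below is an assumption, as are the function name and the toy numbers.

```python
def relative_sum_rate(dnn_rates, wmmse_rates):
    """Percentage of the average WMMSE sum-rate achieved by the DNN.

    Assumes a ratio of averages over the test samples; an average of
    per-sample ratios would be an equally plausible definition.
    """
    mean_dnn = sum(dnn_rates) / len(dnn_rates)
    mean_wmmse = sum(wmmse_rates) / len(wmmse_rates)
    return 100.0 * mean_dnn / mean_wmmse


if __name__ == "__main__":
    # Toy per-sample sum-rates (bit/s/Hz), not results from the paper.
    print(relative_sum_rate([1.8, 2.0], [2.0, 2.0]))
```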

channel samples: 6955

We model the VDSL channel using a 28-user IC, and we use the magnitude of each channel coefficient to compute the power allocation (by using WMMSE). The 6955 channel samples are divided into a validation set of 5000 samples and a testing set of 1955 samples. The training data set is computer-generated following channel statistics learned from the validation set.

training samples: 50000

Then we randomly generate 28 direct channel coefficients independently using N(m_d, σ_d^2), and 28 × 27 interfering channels from N(m_i, σ_i^2), to form one training data sample. Repeating the above process, we generate 50,000 training samples for each measured length. Using the training, validation, and testing data sets described above, we perform the training and testing following the procedures outlined in Section IV.
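
The sampling procedure described above can be sketched as follows. This is a minimal sketch in which the statistics `m_d`, `sigma_d`, `m_i`, and `sigma_i` are placeholders; in the paper they are learned from the validation set, and their values are not given in this summary. Note that `random.gauss` takes a standard deviation, so a variance must be square-rooted before being passed in. The function name and matrix layout are illustrative.

```python
import random


def make_channel_sample(num_users, m_d, sigma_d, m_i, sigma_i, rng):
    """Build one num_users x num_users channel sample.

    Diagonal entries (direct channels) are drawn from N(m_d, sigma_d^2) and
    off-diagonal entries (interfering channels) from N(m_i, sigma_i^2).
    sigma_d and sigma_i are standard deviations.
    """
    sample = []
    for k in range(num_users):
        row = []
        for j in range(num_users):
            if j == k:
                row.append(rng.gauss(m_d, sigma_d))
            else:
                row.append(rng.gauss(m_i, sigma_i))
        sample.append(row)
    return sample


if __name__ == "__main__":
    rng = random.Random(0)
    # Placeholder statistics; the real ones would come from measured data.
    sample = make_channel_sample(28, m_d=1.0, sigma_d=0.1, m_i=0.05, sigma_i=0.01, rng=rng)
    off_diagonal = sum(len(row) for row in sample) - len(sample)
    print(len(sample), "direct channels,", off_diagonal, "interfering channels")
```

Calling the function 50,000 times per measured length would reproduce the size of the training set described in the text.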

users: 28

- The DNN structure used in this work: a fully connected neural network with one input layer, multiple hidden layers, and one output layer. The hidden layers use ReLU, max(·, 0), as the activation function, while the output layer uses min(max(·, 0), Pmax) to incorporate the power constraint.
- The network configuration of IMAC with 7 BSs and 28 users. Red triangles represent the BSs, black hexagons represent the boundaries of each cell, and the colored circles represent the user locations; a) shows a case in which users are located uniformly in the entire cell (with r = 0); b) shows a case in which users can only be located r = 50 m away from the cell center.
- Parameter selection for the Gaussian IC case, K = 30, where the MSEs are evaluated on the validation set. A larger batch size leads to slower convergence, while a smaller batch size incurs unstable convergence behavior. A larger learning rate leads to a higher validation error, while a lower learning rate leads to slower convergence.
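
The forward pass implied by the DNN structure described above can be sketched in plain Python. This is a minimal sketch: the layer sizes and weights are made-up toy values, the helper names are illustrative, and the authors' actual implementation uses TensorFlow rather than hand-written loops.

```python
def dense(weights, bias, v):
    """Affine map: one output per row of `weights`, plus the matching bias."""
    return [sum(w * x for w, x in zip(row, v)) + b for row, b in zip(weights, bias)]


def relu(v):
    """Hidden-layer activation max(., 0)."""
    return [max(x, 0.0) for x in v]


def clip_power(v, p_max):
    """Output activation min(max(., 0), Pmax), enforcing 0 <= p_k <= Pmax."""
    return [min(max(x, 0.0), p_max) for x in v]


def forward(layers, v, p_max):
    """Apply ReLU after every hidden layer and the clipped activation at the output.

    `layers` is a list of (weights, bias) pairs; the last pair is the output layer.
    """
    *hidden, (w_out, b_out) = layers
    for w, b in hidden:
        v = relu(dense(w, b, v))
    return clip_power(dense(w_out, b_out, v), p_max)


if __name__ == "__main__":
    # Toy 2-2-2 network with hand-picked weights, for illustration only.
    layers = [
        ([[1.0, -1.0], [-1.0, 1.0]], [0.0, 0.0]),
        ([[2.0, 0.0], [0.0, 2.0]], [0.0, -1.0]),
    ]
    print(forward(layers, [3.0, 1.0], p_max=1.0))
```

Because the output activation clips every entry to the interval [0, Pmax], any predicted power vector is feasible for the per-user power constraint without a separate projection step.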
