Collaborative Multi-Agent Multi-Armed Bandit Learning for Small-Cell Caching

IEEE Transactions on Wireless Communications, pp. 1-1, 2020.

DOI: https://doi.org/10.1109/twc.2020.2966599

Abstract:

This paper investigates learning-based caching in small-cell networks (SCNs) when user preference is unknown. The goal is to optimize the cache placement in each small base station (SBS) to minimize the long-term transmission delay of the system. We model this sequential multi-agent decision-making problem in a multi-agent multi-armed bandi...

Introduction
  • The proliferation of mobile devices has led to steep growth in mobile data traffic, which puts heavy pressure on the capacity-limited backhaul links of cellular networks.
  • Because a small number of files account for the majority of data traffic, caching popular files at small base stations (SBSs) has been widely adopted to reduce traffic congestion and alleviate backhaul pressure [2], [3].
  • Caching design without knowledge of file popularity or user preference is a necessary but challenging problem.
  • This work addresses that problem by investigating caching design when user preference is unknown.
Highlights
  • In recent decades, the proliferation of mobile devices has led to steep growth in mobile data traffic, which puts heavy pressure on the capacity-limited backhaul links of cellular networks.
  • We model this sequential multi-agent decision-making problem from a multi-agent multi-armed bandit (MAMAB) perspective and solve it by learning the cache strategy directly online in both stationary and non-stationary environments.
  • We proposed multiple efficient MAMAB algorithms, in both distributed and collaborative forms, to solve the problem.
  • For the agent-based collaborative MAMAB algorithm, we provided a strong performance guarantee by proving that its regret is bounded by O(log(T_total)).
  • To better balance SBS coordination against computational complexity, we proposed a coordination-graph, edge-based reward assignment method in the edge-based collaborative MAMAB algorithm.
  • For the non-stationary environment, we modified the MAMAB algorithms developed for the stationary environment by proposing a practical initialization method and designing new perturbed terms that adapt better to the dynamic environment.
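The bandit view of caching described in these highlights can be illustrated with a minimal single-SBS sketch. It is not the paper's agent-based or edge-based collaborative algorithm: it assumes a standard UCB1-style index, one request per round drawn from a single Zipf distribution, and a cache-hit reward standing in for transmission delay. All names (`zipf_probs`, `UcbCacheAgent`, `run`) and parameter values are hypothetical.

```python
import math
import random


def zipf_probs(num_files, exponent):
    """Zipf popularity: probability of the file at rank k is proportional to 1 / k**exponent."""
    weights = [1.0 / (rank ** exponent) for rank in range(1, num_files + 1)]
    total = sum(weights)
    return [w / total for w in weights]


class UcbCacheAgent:
    """One SBS that treats each file as an arm and caches the highest-index files.

    Assumption: a plain UCB1 index, not the perturbed terms used in the paper.
    """

    def __init__(self, num_files, cache_size):
        self.num_files = num_files
        self.cache_size = cache_size
        self.counts = [0] * num_files   # rounds in which each file was cached
        self.means = [0.0] * num_files  # empirical hit rate of each file while cached
        self.t = 0                      # number of rounds played so far

    def _index(self, f):
        if self.counts[f] == 0:
            return float("inf")  # cache every file at least once
        bonus = math.sqrt(2.0 * math.log(self.t) / self.counts[f])
        return self.means[f] + bonus

    def select_cache(self):
        """Return the cache_size files with the largest UCB index."""
        self.t += 1
        ranked = sorted(range(self.num_files), key=self._index, reverse=True)
        return ranked[: self.cache_size]

    def update(self, cached, requested_file):
        """Semi-bandit feedback: only the cached files are observed this round."""
        for f in cached:
            reward = 1.0 if f == requested_file else 0.0
            self.counts[f] += 1
            self.means[f] += (reward - self.means[f]) / self.counts[f]


def run(num_files=20, cache_size=4, rounds=5000, exponent=1.0, seed=0):
    """Simulate one SBS and return (empirical hit rate, trained agent)."""
    rng = random.Random(seed)
    probs = zipf_probs(num_files, exponent)
    agent = UcbCacheAgent(num_files, cache_size)
    hits = 0
    for _ in range(rounds):
        cached = agent.select_cache()
        requested = rng.choices(range(num_files), weights=probs, k=1)[0]
        if requested in cached:
            hits += 1
        agent.update(cached, requested)
    return hits / rounds, agent
```

Calling `run()` returns the empirical hit rate and the agent. With a cache of 4 out of 20 files, caching uniformly at random would serve about 20% of requests, so a noticeably higher rate suggests the index is concentrating on popular files.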
Results
  • The authors demonstrate the performance of the proposed algorithms in both stationary and non-stationary environments.
  • Simulations are performed in a square area of 100 × 100 m².
  • Both SBSs and users are uniformly distributed in this plane.
  • The authors assume that users request files according to independent Zipf distributions.
  • The simulation parameters are set as follows: the number of files
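The simulation setup listed above can be sketched as follows. Only the 100 × 100 m² area, the uniform placement, and the per-user Zipf requests come from the text; the parameter values are not given here, so every number in the usage lines is a placeholder. The function names and the choice to give each user an independently shuffled file ranking are assumptions made for illustration.

```python
import math
import random

AREA_SIDE_M = 100.0  # side of the square simulation area, from the text


def place_uniform(count, rng):
    """Drop `count` nodes (SBSs or users) uniformly at random in the square."""
    return [
        (rng.uniform(0.0, AREA_SIDE_M), rng.uniform(0.0, AREA_SIDE_M))
        for _ in range(count)
    ]


def zipf_pmf(num_files, exponent):
    """Zipf probabilities by rank: proportional to 1 / rank**exponent."""
    weights = [1.0 / (rank ** exponent) for rank in range(1, num_files + 1)]
    total = sum(weights)
    return [w / total for w in weights]


def user_preference(num_files, exponent, rng):
    """Per-user preference: Zipf over a privately shuffled file ranking (assumption)."""
    pmf = zipf_pmf(num_files, exponent)
    order = list(range(num_files))
    rng.shuffle(order)
    preference = [0.0] * num_files
    for rank, file_id in enumerate(order):
        preference[file_id] = pmf[rank]
    return preference


def covering_sbs(user_xy, sbs_positions, comm_distance):
    """Indices of the SBSs within communication distance of a user."""
    return [
        i for i, sbs_xy in enumerate(sbs_positions)
        if math.dist(user_xy, sbs_xy) <= comm_distance
    ]


def sample_request(preference, rng):
    """Draw one requested file index from a user's preference distribution."""
    return rng.choices(range(len(preference)), weights=preference, k=1)[0]
```

A hypothetical run might place nodes with `place_uniform(10, rng)` and `place_uniform(50, rng)`, build one `user_preference` per user, and then call `covering_sbs` to decide which caches can serve each sampled request.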
Conclusion
  • The authors investigated the collaborative caching optimization problem of minimizing the accumulated transmission delay without knowledge of user preference.
  • The authors modeled this sequential multi-agent decision-making problem from a multi-agent multi-armed bandit (MAMAB) perspective and solved it by learning the cache strategy directly online in both stationary and non-stationary environments.
  • The effects of the communication distance and cache size were discussed.
Tables
  • Table 1: Notation Table
Funding
  • This work is supported in part by the NSF of China under grant 61941106 and by the National Key R&D Project of China under grant 2019YFB1802702.
Reference
  • [1] X. Xu and M. Tao, “Collaborative multi-agent reinforcement learning of caching optimization in small-cell networks,” in IEEE Proc. Global Commun. Conf., Dec. 2018.
  • [2] X. Wang, M. Chen, T. Taleb, A. Ksentini, and V. C. Leung, “Cache in the air: Exploiting content caching and delivery techniques for 5G systems,” IEEE Commun. Mag., vol. 52, no. 2, pp. 131–139, Feb. 2014.
  • [3] E. Bastug, M. Bennis, and M. Debbah, “Living on the edge: The role of proactive caching in 5G wireless networks,” IEEE Commun. Mag., vol. 52, no. 8, pp. 82–89, Aug. 2014.
  • [4] X. Peng, J.-C. Shen, J. Zhang, and K. B. Letaief, “Backhaul-aware caching placement for wireless networks,” in IEEE Proc. Global Commun. Conf., Dec. 2015, pp. 1–6.
  • [5] K. Shanmugam, N. Golrezaei, A. G. Dimakis, A. F. Molisch, and G. Caire, “Femtocaching: Wireless content delivery through distributed caching helpers,” IEEE Trans. Inf. Theory, vol. 59, no. 12, pp. 8402–8413, Dec. 2013.
  • [6] E. Bastug, M. Bennis, M. Kountouris, and M. Debbah, “Cache-enabled small cell networks: Modeling and tradeoffs,” EURASIP J. on Wireless Commun. and Netw., vol. 2015, no. 1, pp. 1–11, Feb. 2015.
  • [7] B. Blaszczyszyn and A. Giovanidis, “Optimal geographic caching in cellular networks,” in IEEE Proc. Int. Conf. Commun., Jun. 2015, pp. 3358–3363.
  • [8] S. H. Chae and W. Choi, “Caching placement in stochastic wireless [...] Trans. Wireless Commun., vol. 15, no. 10, pp. 6626–6637, Oct. 2016.
  • [9] Y. Chen, M. Ding, J. Li, Z. Lin, G. Mao, and L. Hanzo, “Probabilistic small-cell caching: Performance analysis and optimization,” IEEE Trans. Veh. Technol., vol. 66, no. 5, pp. 4341–4354, May 2017.
  • [10] M. A. Maddah-Ali and U. Niesen, “Fundamental limits of caching,” IEEE Trans. Inf. Theory, vol. 60, no. 5, pp. 2856–2867, May 2014.
  • [11] X. Xu and M. Tao, “Modeling, analysis, and optimization of coded caching in small-cell networks,” IEEE Trans. Commun., vol. 65, no. 8, pp. 3415–3428, Aug. 2017.
  • [12] A. Tang, S. Roy, and X. Wang, “Coded caching for wireless backhaul networks with unequal link rates,” IEEE Trans. Commun., vol. 66, no. 1, pp. 1–13, Jan. 2018.
  • [13] M. Leconte, G. Paschos, L. Gkatzikis, M. Draief, S. Vassilaras, and S. Chouvardas, “Placing dynamic content in caches with small population,” in IEEE Proc. Int. Conf. Comput. Commun., Apr. 2016, pp. 1–9.
  • [25] S. O. Somuyiwa, D. Gündüz, and A. György, “Reinforcement learning for proactive caching of contents with different demand probabilities,” in Proc. Int. Symp. Wireless Commun. Syst., Aug. 2018, pp. 1–6.
  • [26] K. Guo, C. Yang, and T. Liu, “Caching in base station with recommendation via Q-learning,” in IEEE Proc. Wireless Commun. Netw. Conf., Mar. 2017, pp. 1–6.
  • [27] C. Zhong, M. C. Gursoy, and S. Velipasalar, “A deep reinforcement learning-based framework for content caching,” in Proc. Annual Conf. Inf. Sci. Syst. (CISS), Mar. 2018, pp. 1–6.
  • [28] C. Guestrin, D. Koller, and R. Parr, “Multiagent planning with factored MDPs,” in Proc. Adv. Neural Inf. Process. Syst. (NIPS), 2002, pp. 1523–1530.
  • [29] J. R. Kok and N. Vlassis, “Collaborative multiagent reinforcement learning by payoff propagation,” J. Mach. Learn. Res., vol. 7, pp. 1789–1828, Sep. 2006.
  • [30] W. Chen, Y. Wang, and Y. Yuan, “Combinatorial multi-armed bandit: General framework and applications,” in International Conference on Machine Learning, 2013, pp. 151–159.
  • [31] N. Vlassis, R. Elhorst, and J. R. Kok, “Anytime algorithms for multiagent decision making using coordination graphs,” in Proc. Int. Conf. Syst., Man Cybern. (ICSMC), vol. 1, Oct. 2004, pp. 953–957.
  • [32] C. Claus and C. Boutilier, “The dynamics of reinforcement learning in cooperative multiagent systems,” in Proc. AAAI/IAAI, 1998, pp. 746–752.
  • [33] D. Lee, J. Choi, J.-H. Kim, S. H. Noh, S. L. Min, Y. Cho, and C. S. [...] used and least frequently used policies,” IEEE Trans. Comput., vol. 50, no. 12, pp. 1352–1361, Dec. 2001.
  • [34] F. M. Harper and J. A. Konstan, “The MovieLens datasets: History and context,” ACM Trans. Interact. Intell. Syst., vol. 5, no. 4, pp. 1–19, Dec.