Commission Fee is not Enough: A Hierarchical Reinforced Framework for Portfolio Management
Abstract:
Portfolio management via reinforcement learning is at the forefront of fintech research, which explores how to optimally reallocate a fund into different financial assets over the long term by trial-and-error. Existing methods are impractical since they usually assume each reallocation can be finished immediately and thus ignore the price slippage as part of the trading cost.
Code:
Data:
Introduction
- The problem of portfolio management is widely studied in the area of algorithmic trading.
- Many existing RL methods get promising results by focusing on various techniques to extract richer representations, e.g., model-based learning (Tang 2018; Yu et al. 2019), adversarial learning (Liang et al. 2018), or state augmentation (Ye et al. 2020)
- These RL algorithms assume that portfolio weights can change immediately at the last price once an order is placed.
- Due to the need to balance long-term profit maximization and short-term trade execution, it is challenging for a single, flat RL algorithm to operate on tasks at these different temporal scales
Highlights
- The problem of portfolio management is widely studied in the area of algorithmic trading
- From the plots of the Dow Jones Industrial Average (DJIA) index, we can see that this period is generally a bull market, although the market edges down several times
- Our strategy gains more profit under the same risk. In terms of Maximum DrawDown (MDD) and Downside Deviation Ratio (DDR), the results show that HRPM bears the least risk, even lower than Uniform Constant Rebalanced Portfolios (UCRP); standard definitions of these two risk metrics are sketched after this list
- We focus on the problem of portfolio management with trading cost via deep reinforcement learning
- We propose a hierarchical reinforced stock trading system (HRPM)
- Extensive experimental results in the U.S. market and the China market demonstrate that HRPM achieves significant improvements over many state-of-the-art approaches
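For reference, the two risk metrics named above are typically defined as follows (these are the conventional formulations with assumed notation; the paper's exact definitions may differ slightly). MDD measures the largest peak-to-trough loss of the portfolio value $P_t$ over the horizon, and DDR is a Sortino-style ratio of average return to downside deviation with respect to a minimum acceptable return $r_f$:

```latex
\mathrm{MDD} = \max_{0 \le t_1 < t_2 \le T} \frac{P_{t_1} - P_{t_2}}{P_{t_1}},
\qquad
\mathrm{DDR} = \frac{\mathbb{E}[r_t]}{\sqrt{\mathbb{E}\left[\min(r_t - r_f,\, 0)^2\right]}}
```

A lower MDD and a higher DDR both indicate a less risky strategy.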
Results
- The authors' HRPM is the only strategy that outperforms the DJIA index, while DPM merely tracks the trend of the market.
- On ASR, most methods score higher than the DJIA index, and HRPM is the best.
- When the portfolio values of other methods decline, the HRPM strategy still hovers near its peak
- This demonstrates that HRPM is superior to all the baselines and is relatively robust under different market conditions
Conclusion
- The authors focus on the problem of portfolio management with trading cost via deep reinforcement learning.
- The authors propose a hierarchical reinforced stock trading system (HRPM).
- The authors build a hierarchy of portfolio management over trade execution and train the corresponding policies.
- The high-level policy gives portfolio weights and invokes the low-level policy to sell or buy the corresponding shares within a short time window (a rough sketch of this interaction follows this list).
- Extensive experimental results in the U.S. market and the China market demonstrate that HRPM achieves significant improvements over many state-of-the-art approaches
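To make the high-level/low-level interaction concrete, here is a minimal, self-contained sketch in Python. All class and function names (ToyMarketEnv, high_level_weights, low_level_trade, run_episode) are hypothetical stand-ins and the policies are untrained placeholders; this only illustrates the control flow of the hierarchy, not the actual HRPM implementation.

```python
import numpy as np

class ToyMarketEnv:
    """Toy market used only to make the sketch runnable: random-walk prices, cash, holdings."""
    def __init__(self, n_assets=4, seed=0):
        self.rng = np.random.default_rng(seed)
        self.n_assets = n_assets
        self.prices = np.ones(n_assets)
        self.holdings = np.zeros(n_assets)
        self.cash = 1.0

    def portfolio_value(self):
        return self.cash + float(self.holdings @ self.prices)

    def execute(self, asset, qty):
        """Trade `qty` shares of `asset` at the current price, then let prices drift."""
        self.cash -= qty * self.prices[asset]
        self.holdings[asset] += qty
        self.prices *= np.exp(self.rng.normal(0.0, 0.001, self.n_assets))

def high_level_weights(n_assets, rng):
    """Placeholder high-level policy: target weights on the simplex (stands in for a trained actor)."""
    logits = rng.normal(size=n_assets)
    w = np.exp(logits)
    return w / w.sum()

def low_level_trade(remaining_shares):
    """Placeholder low-level policy: trade half of the remaining order at each fine-grained step."""
    return remaining_shares * 0.5

def run_episode(env, horizon=10, window=4, seed=1):
    rng = np.random.default_rng(seed)
    for _ in range(horizon):
        # High level: decide the target allocation at a coarse time scale.
        weights = high_level_weights(env.n_assets, rng)
        target_shares = weights * env.portfolio_value() / env.prices
        orders = target_shares - env.holdings          # signed buy/sell quantities per asset
        # Low level: work each order over a short execution window.
        for asset in range(env.n_assets):
            remaining = orders[asset]
            for _ in range(window):
                qty = low_level_trade(remaining)
                env.execute(asset, qty)
                remaining -= qty
    return env.portfolio_value()

print(run_episode(ToyMarketEnv()))
```

The key design point the sketch tries to convey is the two time scales: the high-level policy acts only at rebalancing points, while the low-level policy splits each order into smaller trades within the execution window, which is where execution-related costs such as slippage are incurred and can be learned against.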
Summary
Introduction:
The problem of portfolio management is widely studied in the area of algorithmic trading.
- Many existing RL methods get promising results by focusing on various techniques to extract richer representations, e.g., model-based learning (Tang 2018; Yu et al. 2019), adversarial learning (Liang et al. 2018), or state augmentation (Ye et al. 2020)
- These RL algorithms assume that portfolio weights can change immediately at the last price once an order is placed.
- Due to the need to balance long-term profit maximization and short-term trade execution, it is challenging for a single, flat RL algorithm to operate on tasks at these different temporal scales
Objectives:
The authors' objective is to maximize the final portfolio value over a long time horizon while taking the trading cost into account.
- In order to encourage the high-level policy not to "put all the eggs in one basket", the authors aim to find a high-level policy that maximizes a maximum-entropy objective (a sketch of this objective follows below)
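Below is a sketch of the standard maximum-entropy RL objective that such a high-level policy would optimize (notation is assumed here rather than quoted from the paper): the expected discounted reward is augmented with the entropy of the policy, weighted by a temperature $\alpha$, so that the learned allocation does not collapse onto a single asset.

```latex
J(\pi) = \mathbb{E}_{\tau \sim \pi}\left[\sum_{t=0}^{T} \gamma^{t}\Big(r(s_t, a_t) + \alpha\,\mathcal{H}\big(\pi(\cdot \mid s_t)\big)\Big)\right]
```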
Results:
The authors' HRPM is the only strategy that outperforms the DJIA index, while DPM merely tracks the trend of the market.
- On ASR, most methods score higher than the DJIA index, and HRPM is the best.
- When the portfolio values of other methods decline, the HRPM strategy still hovers near its peak
- This demonstrates that HRPM is superior to all the baselines and is relatively robust under different market conditions
Conclusion:
The authors focus on the problem of portfolio management with trading cost via deep reinforcement learning.
- The authors propose a hierarchical reinforced stock trading system (HRPM).
- The authors build a hierarchy of portfolio management over trade execution and train the corresponding policies.
- The high-level policy gives portfolio weights and invokes the low-level policy to sell or buy the corresponding shares within a short time window.
- Extensive experimental results in the U.S. market and the China market demonstrate that HRPM achieves significant improvements over many state-of-the-art approaches
Tables
- Table 1: Period of stock data used in the experiments
- Table 2: Performance comparison in the U.S. market
- Table 3: Performance comparison in the China market
- Table 4: Ablation on the effect of entropy in the U.S. market
Funding
- This research is supported, in part, by the Joint NTU-WeBank Research Centre on Fintech (Award No: NWJ2019-008), Nanyang Technological University, Singapore
Reference
- Almahdi, S.; and Yang, S. Y. 2017. An adaptive portfolio trading system: A risk-return portfolio optimization using recurrent reinforcement learning with expected maximum drawdown. Expert Systems with Applications 87: 267–279.
- Borodin, A.; El-Yaniv, R.; and Gogan, V. 2004. Can we learn to beat the best stock. In Advances in Neural Information Processing Systems, 345–352.
- Cover, T. M. 2011. Universal portfolios. In The Kelly Capital Growth Investment Criterion: Theory and Practice, 181– 209. World Scientific.
- Gaivoronski, A. A.; and Stella, F. 2000. Stochastic nonstationary optimization for finding universal portfolios. Annals of Operations Research 100(1-4): 165–188.
- Gao, L.; and Zhang, W. 2013. Weighted moving average passive aggressive algorithm for online portfolio selection. In 2013 5th International Conference on Intelligent Human-Machine Systems and Cybernetics, volume 1, 327– 330. IEEE.
- Jiang, Z.; and Liang, J. 2017. Cryptocurrency portfolio management with deep reinforcement learning. In 2017 Intelligent Systems Conference (IntelliSys), 905–913. IEEE.
- Jiang, Z.; Xu, D.; and Liang, J. 2017. A deep reinforcement learning framework for the financial portfolio management problem. arXiv preprint arXiv:1706.10059.
- Li, B.; and Hoi, S. C. 2012. On-line portfolio selection with moving average reversion. In Proceedings of the 29th International Coference on International Conference on Machine Learning, 563–570.
- Liang, Z.; Chen, H.; Zhu, J.; Jiang, K.; and Li, Y. 2018. Adversarial deep reinforcement learning in portfolio management. arXiv preprint arXiv:1808.09940.
- Lillicrap, T. P.; Hunt, J. J.; Pritzel, A.; Heess, N.; Erez, T.; Tassa, Y.; Silver, D.; and Wierstra, D. 2015. Continuous control with deep reinforcement learning. arXiv preprint arXiv:1509.02971.
- Mnih, V.; Kavukcuoglu, K.; Silver, D.; Graves, A.; Antonoglou, I.; Wierstra, D.; and Riedmiller, M. 2013. Playing Atari with deep reinforcement learning. arXiv preprint arXiv:1312.5602.
- Mosavi, A.; Ghamisi, P.; Faghan, Y.; and Duan, P. 2020. Comprehensive Review of Deep Reinforcement Learning Methods and Applications in Economics. arXiv preprint arXiv:2004.01509.
- Nevmyvaka, Y.; Feng, Y.; and Kearns, M. 2006. Reinforcement learning for optimized trade execution. In Proceedings of the 23rd International Conference on Machine Learning, 673–680.
- Silver, D.; Huang, A.; Maddison, C. J.; Guez, A.; Sifre, L.; Van Den Driessche, G.; Schrittwieser, J.; Antonoglou, I.; Panneershelvam, V.; Lanctot, M.; et al. 2016. Mastering the game of Go with deep neural networks and tree search. Nature 529(7587): 484.
- Sutton, R. S.; McAllester, D. A.; Singh, S. P.; and Mansour, Y. 2000. Policy gradient methods for reinforcement learning with function approximation. In Advances in Neural Information Processing Systems, 1057–1063.
- Tang, L. 2018. An actor-critic-based portfolio investment method inspired by benefit-risk optimization. Journal of Algorithms & Computational Technology 12(4): 351–360.
- Tavakoli, A.; Pardo, F.; and Kormushev, P. 2018. Action branching architectures for deep reinforcement learning. In Thirty-Second AAAI Conference on Artificial Intelligence, 4131–4138.
- Ye, Y.; Pei, H.; Wang, B.; Chen, P.-Y.; Zhu, Y.; Xiao, J.; and Li, B. 2020. Reinforcement-learning based portfolio management with augmented asset movement prediction states. arXiv preprint arXiv:2002.05780.
- Yu, P.; Lee, J. S.; Kulyatin, I.; Shi, Z.; and Dasgupta, S. 2019. Model-based deep reinforcement learning for dynamic portfolio optimization. arXiv preprint arXiv:1901.08740.
- Zhang, J.; and Tao, D. 2020. Empowering Things with Intelligence: A Survey of the Progress, Challenges, and Opportunities in Artificial Intelligence of Things. IEEE Internet of Things Journal. doi:10.1109/JIOT.2020.3039359.