A feature set for streams and an application to high-frequency financial tick data

Proceedings of the 2014 International Conference on Big Data Science and Computing, (2014)


Abstract

We propose a set of features to study the effects of data streams on complex systems. This feature set is called the signature representation of a stream. It has its origin in pure mathematics and relies on a relationship between non-commutative polynomials and paths. This representation has already had significant impact on algebraic topo...

Introduction
  • The last decade has seen an enormous rise in the need for analysing and mining data streams.
  • H.2.8 [Database applications]: Data mining; G.3 [Probability strategy, etc
  • In all these examples it is clear that the information the authors have to extract from the stream is order information. Providing features for a stream that reliably capture such “non-commutative order information” in a quantitative way is exactly the goal of this paper.
Highlights
  • The last decade has seen an enormous rise in the need for analysing and mining data streams
  • We believe that the methods we have presented in the previous section can provide a robust way to study phenomena that appear in HFT data streams
  • We introduced a new set of features to describe a data stream and its effects
  • These features are the sequence of iterated integrals of the piecewise linear paths that connect the sample points in the data stream
  • The numerical results indicate that the signature can be a powerful nonparametric method; in the regression context of our example it outperformed standard explanatory variables
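The iterated-integral features described above can be illustrated with a minimal sketch. It assumes NumPy and truncates the signature at level 2; the function name `signature_level2` is hypothetical and not taken from the paper. For a piecewise linear path, each linear segment with increment `delta` has level-1 term `delta` and level-2 term `outer(delta, delta) / 2`, and segments are combined with Chen's identity.

```python
import numpy as np

def signature_level2(path):
    """Signature, truncated at level 2, of the piecewise linear path
    through the sample points in `path` (shape: n_points x d).

    Returns (level1, level2) with shapes (d,) and (d, d).
    Hypothetical helper for illustration only.
    """
    path = np.asarray(path, dtype=float)
    d = path.shape[1]
    level1 = np.zeros(d)
    level2 = np.zeros((d, d))
    for delta in np.diff(path, axis=0):
        # Chen's identity for appending one linear segment:
        # new level 2 = old level 2 + (old level 1) (x) delta + delta (x) delta / 2
        level2 += np.outer(level1, delta) + 0.5 * np.outer(delta, delta)
        # new level 1 = old level 1 + delta
        level1 += delta
    return level1, level2

# Two paths with the same endpoints but a different order of moves.
right_then_up = [[0, 0], [1, 0], [1, 1]]
up_then_right = [[0, 0], [0, 1], [1, 1]]
print(signature_level2(right_then_up))
print(signature_level2(up_then_right))
```

Both paths have the same level-1 term (the total increment), but their level-2 terms differ, which is how the signature records the order in which the moves occurred.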
Results
  • LEARNING A TRADING STRATEGY

    The family of trading strategies the authors have chosen is a classic but non-trivial family of strategies: the so-called constant proportion of wealth strategy, see [15].
  • At each new tick the investor rebalances the allocation of her wealth into the risky and riskless asset by following the principle that she always aims to have a constant proportion of her current overall wealth invested into the risky asset
  • Such trading strategies are self-financing and require only a non-zero initial investment; despite the simple investment rule, they give rise to many interesting questions [15, 4, 18].
  • That is, the data set consists of tuples $(p_i, r_i)_{i=1,\dots,N}$ where each $p_i$ is a stream of 400 subsequent stock prices and $r_i$ denotes the return of this investment
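The rebalancing rule described above can be sketched as follows. This is an illustration under stated assumptions, not the authors' code: the function name `constant_proportion_return`, the default proportion of 0.5, and the zero riskless rate are all hypothetical choices, and transaction costs are ignored. At every tick the investor holds the fraction `proportion` of current wealth in the risky asset, so wealth grows by the factor `1 + proportion * (risky return)` plus the riskless contribution.

```python
import numpy as np

def constant_proportion_return(prices, proportion=0.5, riskless_rate_per_tick=0.0):
    """Total return of a constant-proportion-of-wealth strategy over a price stream.

    prices: 1-D sequence of risky-asset prices, one per tick.
    proportion: fraction of wealth held in the risky asset after each rebalance
                (assumed value; the source does not state the one used).
    riskless_rate_per_tick: per-tick return of the riskless asset (assumed zero).
    """
    prices = np.asarray(prices, dtype=float)
    # Simple return of the risky asset between consecutive ticks.
    risky_returns = prices[1:] / prices[:-1] - 1.0
    # Per-tick wealth growth factor after rebalancing to the target proportion.
    growth = (1.0
              + proportion * risky_returns
              + (1.0 - proportion) * riskless_rate_per_tick)
    return float(np.prod(growth) - 1.0)

# A price that goes up 10% and then down 10%.
print(constant_proportion_return([100.0, 110.0, 99.0], proportion=0.5))
```

Because the wealth is a product over ticks, the return depends on the whole price stream and not only on the first and last price, which is what makes it a non-trivial regression target.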
Conclusion
  • The authors introduced a new set of features to describe a data stream and its effects.
  • Classic results from pure mathematics guarantee a one-to-one correspondence between this graded sequence of finite-dimensional tensors and the underlying path.
  • The authors applied this feature to describe a real-world data stream and its effects by OLS-regression against the truncated signature.
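As a rough illustration of OLS regression against a feature vector, the sketch below fits an intercept plus linear coefficients by least squares. It assumes NumPy; the helper names `fit_ols` and `predict_ols` are hypothetical, and the toy data stands in for rows of truncated signature terms, which the source does not list.

```python
import numpy as np

def fit_ols(features, targets):
    """Least-squares fit of targets ~ intercept + features @ beta.

    features: 2-D array-like, one row per stream (e.g. flattened signature terms).
    targets:  1-D array-like, one response per stream.
    Returns the coefficient vector with the intercept first.
    """
    features = np.asarray(features, dtype=float)
    design = np.column_stack([np.ones(len(features)), features])
    coef, *_ = np.linalg.lstsq(design, np.asarray(targets, dtype=float), rcond=None)
    return coef

def predict_ols(coef, features):
    """Predictions of a model fitted with fit_ols."""
    features = np.asarray(features, dtype=float)
    design = np.column_stack([np.ones(len(features)), features])
    return design @ coef

# Toy data generated from y = 1 + 2*x1 - 3*x2 (illustrative only).
X = [[0, 0], [1, 0], [0, 1], [1, 1]]
y = [1, 3, -2, 0]
coef = fit_ols(X, y)
print(coef, predict_ols(coef, X))
```

In the setting described above, each row of `features` would hold the truncated signature of one price stream and each target the corresponding return.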
Tables
  • Table 1: Three subsequent time ticks from the (reduced) electronic order book for IBM shares traded at the NYSE on 9th April 2014. Note the change from periods of slow trading to periods with lots of activity.
  • Table 2: OLS regression against different sets of explanatory variables
Funding
  • Note especially that the regression against the signature (15) and the regression against the price increments (14) have the same complexity (an intercept plus six explanatory variables: either six price increments or the first six elements of the signature). However, regression against the signature (15) significantly outperforms regression against increments (14): the standard deviation on the testing set for the signature is less than a third(!) of that of the increments, and on the learning set the $R^2$ of the signature regression is more than 12% bigger than that of the increments; see Table 2
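The two quantities compared above can be computed as in the following sketch. It assumes NumPy, and it assumes that "standard deviation" means the population standard deviation of the residuals, which the source does not spell out. The function names are hypothetical.

```python
import numpy as np

def r_squared(y_true, y_pred):
    """Coefficient of determination: 1 - SS_res / SS_tot."""
    y_true = np.asarray(y_true, dtype=float)
    y_pred = np.asarray(y_pred, dtype=float)
    ss_res = np.sum((y_true - y_pred) ** 2)
    ss_tot = np.sum((y_true - y_true.mean()) ** 2)
    return float(1.0 - ss_res / ss_tot)

def residual_std(y_true, y_pred):
    """Standard deviation of the residuals (assumed definition)."""
    residuals = np.asarray(y_true, dtype=float) - np.asarray(y_pred, dtype=float)
    return float(np.std(residuals))

# Illustrative values only, not data from the paper.
y_true = [1.0, 2.0, 3.0, 4.0]
y_pred = [2.0, 2.0, 4.0, 4.0]
print(r_squared(y_true, y_pred), residual_std(y_true, y_pred))
```

Computing `r_squared` on the learning set and `residual_std` on the testing set for each feature set gives a like-for-like comparison when both regressions use the same number of explanatory variables.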
References
  • Agrachev, A. A. Introduction to optimal control theory. In Mathematical control theory, Part 1, 2 (Trieste, 2001), ICTP Lect. Notes, VIII. Abdus Salam Int. Cent. Theoret. Phys., Trieste, 2002, pp. 453–513 (electronic).
  • Chen, K.-T. Integration of paths, geometric invariants and a generalized Baker-Hausdorff formula. Ann. of Math. (2) 65 (1957), 163–178.
  • Chen, K.-T. Integration of paths—a faithful representation of paths by non-commutative formal power series. Trans. Amer. Math. Soc. 89 (1958), 395–407.
  • Cover, T. M. Universal portfolios. Mathematical finance 1, 1 (1991), 1–29.
  • Delbaen, F., and Schachermayer, W. A general version of the fundamental theorem of asset pricing. Math. Ann. 300, 3 (1994), 463–520.
  • Flint, G., Hambly, B., and Lyons, T. Discretely sampled signals and the rough Hoff process. ArXiv e-prints (Oct. 2013).
  • Gaber, M. M., Zaslavsky, A., and Krishnaswamy, S. Mining data streams: a review. ACM Sigmod Record 34, 2 (2005), 18–26.
  • Gergely Gyurko, L., Lyons, T., Kontkowski, M., and Field, J. Extracting information from the signature of a financial data stream. ArXiv e-prints (July 2013).
  • [10] Hambly, B., and Lyons, T. Uniqueness for the signature of a path of bounded variation and the reduced path group. Ann. of Math. (2) 171, 1 (2010), 109–167.
  • [11] Levin, D., Lyons, T., and Ni, H. Learning from the past, predicting the statistics for the future, learning an evolving system. ArXiv e-prints (Sept. 2013).
  • [13] Lyons, T. J., Caruana, M., and Levy, T. Differential equations driven by rough paths, 2007. Lectures from the 34th Summer School on Probability Theory held in Saint-Flour, July 6–24, 2004, With an introduction concerning the Summer School by Jean Picard.
  • [15] Perold, A. F., and Sharpe, W. F. Dynamic strategies for asset allocation. Financial Analysts Journal (1988), 16–27.
  • [16] Rajaraman, A., and Ullman, J. D. Mining of massive datasets. Cambridge University Press, 2012.
  • [17] Reutenauer, C. Free Lie algebras. The Clarendon Press Oxford University Press, New York, 1993. Oxford Science Publications.
  • [18] Rotando, L. M., and Thorp, E. O. The Kelly criterion and the stock market. American Mathematical Monthly 99 (1992), 922–922.
Author
Harald Oberhauser