A feature set for streams and an application to high-frequency financial tick data
Proceedings of the 2014 International Conference on Big Data Science and Computing, (2014)
We propose a set of features to study the effects of data streams on complex systems. This feature set is called the signature representation of a stream. It has its origin in pure mathematics and relies on a relationship between non-commutative polynomials and paths. This representation has already had a significant impact on algebraic topo...
- The last decade has seen an enormous rise in the need for analysing and mining data streams.
- H.2.8 [Database applications]: Data mining; G.3 [Probability strategy, etc
- In all these examples it is clear what information has to be extracted from the stream. The challenge is to provide features for a stream that reliably capture such “non-commutative order information” in a quantitative way, and this is exactly the goal of this paper.
- We believe that the methods we have presented in the previous section can provide a robust way to study phenomena that appear in HFT data streams
- We introduced a new set of features to describe a data stream and its effects
- These features are the sequence of iterated integrals of the piecewise linear path that connects the sample points in the data stream
- The numerical results indicate that the signature can be a powerful nonparametric method; in the regression context of our example it outperformed standard explanatory variables
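As a minimal sketch of what "iterated integrals of a piecewise linear path" means in practice, the following computes the first two levels of the signature by adding one linear segment at a time (Chen's identity). The function name `signature_level2` and the truncation at level 2 are illustrative assumptions, not the authors' implementation.

```python
def signature_level2(points):
    """Levels 1 and 2 of the signature of the piecewise linear path through `points`.

    `points` is a sequence of d-dimensional samples. Hypothetical helper for
    illustration; truncation at level 2 is an assumption of this sketch.
    """
    d = len(points[0])
    s1 = [0.0] * d                       # level 1: total increment
    s2 = [[0.0] * d for _ in range(d)]   # level 2: double iterated integrals
    for prev, curr in zip(points, points[1:]):
        delta = [c - p for p, c in zip(prev, curr)]
        # Chen's identity for appending a linear segment: the cross term uses
        # the level-1 value accumulated *before* this segment, and the segment
        # itself contributes delta (x) delta / 2.
        for i in range(d):
            for j in range(d):
                s2[i][j] += s1[i] * delta[j] + 0.5 * delta[i] * delta[j]
        for i in range(d):
            s1[i] += delta[i]
    return s1, s2


# Two paths with the same endpoints but a different order of moves.
right_then_up = signature_level2([(0.0, 0.0), (1.0, 0.0), (1.0, 1.0)])
up_then_right = signature_level2([(0.0, 0.0), (0.0, 1.0), (1.0, 1.0)])
print(right_then_up)  # ([1.0, 1.0], [[0.5, 1.0], [0.0, 0.5]])
print(up_then_right)  # ([1.0, 1.0], [[0.5, 0.0], [1.0, 0.5]])
```

Both toy paths share the same level-1 term, while the off-diagonal level-2 terms differ. This is the kind of order information that plain increments summed over the stream do not retain.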
- LEARNING A TRADING STRATEGY
The family of trading strategies the authors have chosen is a classic but non-trivial family of strategies: the so-called constant proportion of wealth strategy, see .
- At each new tick the investor rebalances the allocation of her wealth into the risky and riskless asset by following the principle that she always aims to have a constant proportion of her current overall wealth invested into the risky asset
- Such trading strategies are self-financing and require only a non-zero initial investment; despite the simple investment rule, they give rise to many interesting questions [15, 4, 18].
- That is, the data set consists of tuples (p_i, r_i), i = 1, ..., N, where each p_i is a stream of 400 subsequent stock prices and r_i denotes the return of this investment
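The rebalancing rule described above can be sketched as follows. This is an illustration under stated assumptions, not the authors' code: the riskless asset is assumed to earn zero interest, transaction costs are ignored, and the name `constant_proportion_return` and its `proportion` parameter are hypothetical.

```python
def constant_proportion_return(prices, proportion):
    """Return of a constant-proportion-of-wealth strategy over a price stream.

    Assumptions of this sketch: zero riskless rate, no transaction costs,
    initial wealth normalised to 1.
    """
    wealth = 1.0
    for prev, curr in zip(prices, prices[1:]):
        risky_return = (curr - prev) / prev
        # After rebalancing, `proportion` of current wealth is in the risky
        # asset, so only that fraction is exposed to the price move.
        wealth *= 1.0 + proportion * risky_return
    return wealth - 1.0


# Toy price stream (made-up numbers, not taken from the paper's data).
toy_prices = [100.0, 110.0, 99.0]
half_invested = constant_proportion_return(toy_prices, 0.5)
fully_invested = constant_proportion_return(toy_prices, 1.0)
```

With `proportion` equal to 1 the strategy reduces to buy-and-hold, and with 0 the wealth never changes; intermediate values give one return r_i per price stream p_i, which is the quantity used as the regression target.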
- The authors introduced a new set of features to describe a data stream and its effects.
- Classic results from pure mathematics guarantee a one-to-one correspondence between this graded sequence of finite-dimensional tensors and the underlying path.
- The authors applied this feature set to describe a real-world data stream and its effects by OLS regression against the truncated signature.
- Table 1: Three subsequent time ticks from the (reduced) electronic order book for IBM shares traded at the NYSE on 9th April 2014. The change of periods of slow trading to periods with lots of activity
- Table 2: OLS regression against different sets of explanatory variables
- Note especially that the complexity of the regression against the signature (15) and the regression against the price increments (14) is the same (an intercept plus six explanatory variables: either six price increments or the first six elements of the signature). However, regression against the signature (15) significantly outperforms regression against the increments (14): the standard deviation on the testing set for the signature is less than a third(!) of that of the increments, and on the learning set the R² of the signature regression is more than 12% bigger than that of the increments; see Table 2
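A regression of the kind compared above (an intercept plus a fixed number of explanatory variables per stream) can be sketched in plain Python via the normal equations. The helper names `ols_fit` and `r_squared` and the toy numbers are assumptions made for illustration; in practice each row would hold either the price increments or the truncated signature terms of one stream.

```python
def ols_fit(rows, targets):
    """Ordinary least squares: returns [intercept, b1, ..., bk].

    Solves the normal equations by Gauss-Jordan elimination. Sketch only:
    it assumes the explanatory variables are not collinear.
    """
    X = [[1.0] + list(r) for r in rows]  # prepend the intercept column
    k = len(X[0])
    # Augmented system (X^T X | X^T y)
    A = [[sum(x[i] * x[j] for x in X) for j in range(k)]
         + [sum(x[i] * t for x, t in zip(X, targets))] for i in range(k)]
    for col in range(k):
        # Partial pivoting for numerical stability
        pivot = max(range(col, k), key=lambda r: abs(A[r][col]))
        A[col], A[pivot] = A[pivot], A[col]
        for r in range(k):
            if r != col:
                f = A[r][col] / A[col][col]
                A[r] = [a - f * b for a, b in zip(A[r], A[col])]
    return [A[i][k] / A[i][i] for i in range(k)]


def r_squared(rows, targets, coef):
    """Coefficient of determination of the fitted linear model."""
    preds = [coef[0] + sum(c * v for c, v in zip(coef[1:], r)) for r in rows]
    mean = sum(targets) / len(targets)
    ss_res = sum((t - p) ** 2 for t, p in zip(targets, preds))
    ss_tot = sum((t - mean) ** 2 for t in targets)
    return 1.0 - ss_res / ss_tot


# Toy data (made up): the target is exactly 1 + 2*a - 3*b.
features = [(0.0, 0.0), (1.0, 0.0), (0.0, 1.0), (1.0, 1.0), (2.0, 1.0)]
returns = [1.0, 3.0, -2.0, 0.0, 2.0]
coef = ols_fit(features, returns)
fit_quality = r_squared(features, returns, coef)
```

Fitting the same routine once on increment features and once on signature features, then comparing R² on the learning set and the residual spread on a held-out set, mirrors the comparison summarised in Table 2.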
- Agrachev, A. A. Introduction to optimal control theory. In Mathematical control theory, Part 1, 2 (Trieste, 2001), ICTP Lect. Notes, VIII. Abdus Salam Int. Cent. Theoret. Phys., Trieste, 2002, pp. 453–513 (electronic).
- Chen, K.-T. Integration of paths, geometric invariants and a generalized Baker-Hausdorff formula. Ann. of Math. (2) 65 (1957), 163–178.
- Chen, K.-T. Integration of paths—a faithful representation of paths by non-commutative formal power series. Trans. Amer. Math. Soc. 89 (1958), 395–407.
- Cover, T. M. Universal portfolios. Mathematical finance 1, 1 (1991), 1–29.
- Delbaen, F., and Schachermayer, W. A general version of the fundamental theorem of asset pricing. Math. Ann. 300, 3 (1994), 463–520.
- Flint, G., Hambly, B., and Lyons, T. Discretely sampled signals and the rough Hoff process. ArXiv e-prints (Oct. 2013).
- Gaber, M. M., Zaslavsky, A., and Krishnaswamy, S. Mining data streams: a review. ACM Sigmod Record 34, 2 (2005), 18–26.
- Gergely Gyurko, L., Lyons, T., Kontkowski, M., and Field, J. Extracting information from the signature of a financial data stream. ArXiv e-prints (July 2013).
- Hambly, B., and Lyons, T. Uniqueness for the signature of a path of bounded variation and the reduced path group. Ann. of Math. (2) 171, 1 (2010), 109–167.
- Levin, D., Lyons, T., and Ni, H. Learning from the past, predicting the statistics for the future, learning an evolving system. ArXiv e-prints (Sept. 2013).
- Lyons, T. J., Caruana, M., and Levy, T. Differential equations driven by rough paths, 2007. Lectures from the 34th Summer School on Probability Theory held in Saint-Flour, July 6–24, 2004, With an introduction concerning the Summer School by Jean Picard.
- Perold, A. F., and Sharpe, W. F. Dynamic strategies for asset allocation. Financial Analysts Journal (1988), 16–27.
- Rajaraman, A., and Ullman, J. D. Mining of massive datasets. Cambridge University Press, 2012.
- Reutenauer, C. Free Lie algebras. The Clarendon Press Oxford University Press, New York, 1993. Oxford Science Publications.
- Rotando, L. M., and Thorp, E. O. The Kelly criterion and the stock market. American Mathematical Monthly 99 (1992), 922–922.