Sentiment analysis in financial texts.

Decision Support Systems, no. C (2017): 53-64


Abstract

The growth of financial texts in the wake of big data has challenged most organizations and brought escalating demands for analysis tools. In general, text streams are more challenging to handle than numeric data streams. Text streams are unstructured by nature, but they represent collective expressions that are of value in any financial ...

Introduction
  • Dealing with the deluge of data being rendered through networks of people or devices has become increasingly important for business intelligence [13].
  • As a result of this trend, most of the data are increasingly unstructured, with a significant portion of the data stream in textual format.
  • The forms of such data range from e-mail communications and tweets to corporate reports and daily news announcements.
  • As this stream of data continues to expand rapidly, it grows increasingly important to develop techniques for skimming through countless pages of digitized texts and picking out the useful information that is hidden in plain sight.
Highlights
  • We address key questions related to the explosion of interest in how to extract insight from unstructured data and how to determine if such insight provides any hints concerning the trends of financial markets
  • Our second evaluation of the sentiment engine is based on the sentence polarity dataset v1.0 that was originally released by Pang et al. [39]
  • We explored whether there were any hidden patterns of sentiment that could be uncovered by comparing the results of our sentiment analysis engine (SAE) as applied to financial text streams with the data streams related to market indices
  • A parse tree resolution algorithm is proposed to resolve the possible ambiguous or incomplete structures when multiple derivations are propagated from the terminal level
  • Building on the phrase structures rendered from the parse trees, we apply a sentiment assessment heuristic to assign the polarity of phrases
Methods
  • Experiments using two Penn treebanks.
  • The proposed parsing model was trained and tested using the English and Chinese Penn Treebanks [35].
  • A treebank is a parsed text corpus that annotates syntactic sentence structures, and such treebanks are major test beds for most linguistic theories regarding sentences.
  • The development test results, including the training and test results of the chunker and phrase recognizer using the two treebanks, are shown in Table 8
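
To make the notion of a treebank concrete, the following is a minimal sketch of reading one sentence in the bracketed notation commonly used by the Penn Treebanks. The function names `parse_bracketed` and `leaves`, and the example sentence, are illustrative assumptions and are not taken from the paper.

```python
def parse_bracketed(text):
    """Parse a bracketed tree string into nested (label, children) tuples."""
    tokens = text.replace("(", " ( ").replace(")", " ) ").split()
    pos = 0

    def read_node():
        nonlocal pos
        assert tokens[pos] == "(", "a node must start with '('"
        pos += 1
        label = tokens[pos]          # syntactic class or POS tag
        pos += 1
        children = []
        while tokens[pos] != ")":
            if tokens[pos] == "(":
                children.append(read_node())   # nested constituent
            else:
                children.append(tokens[pos])   # terminal word
                pos += 1
        pos += 1                     # consume the closing ')'
        return (label, children)

    return read_node()


def leaves(node):
    """Collect the terminal words of a parsed tree, left to right."""
    _label, children = node
    words = []
    for child in children:
        if isinstance(child, tuple):
            words.extend(leaves(child))
        else:
            words.append(child)
    return words


# Hypothetical example sentence, not drawn from either treebank.
tree = parse_bracketed("(S (NP (DT the) (NN market)) (VP (VBD rallied)))")
words = leaves(tree)
```

Here `tree[0]` is the root label and `words` holds the sentence tokens in order; a real treebank reader would also need to handle traces and function tags, which this sketch ignores.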
Results
  • Evaluation using movie review dataset.
  • The authors' second evaluation of the sentiment engine is based on the sentence polarity dataset v1.0 that was originally released by Pang et al. [39].
  • The data mainly contains English movie reviews that are collected from the Rotten Tomatoes website.
  • It has been used as a de facto benchmark for evaluating sentiment applications.
  • Treebanks used: English Penn Treebank (v. 3.0) and Chinese Penn Treebank (v. 5.0).
Conclusion
  • The authors have provided a novel approach to developing a language parser for sentiment analysis.
  • Building on the phrase structures rendered from the parse trees, the authors apply a sentiment assessment heuristic to assign the polarity of phrases.
  • This heuristic demonstrates how the polarity of a phrase can radiate up to its parents to derive the sentence-level polarity.
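
The idea of phrase polarity radiating up to parent nodes can be sketched as a recursive walk over a parse tree. This is only an illustration under stated assumptions: the word list `LEXICON`, the negation cues `NEGATORS`, and the combination rule (sum the children, flip the sign under negation) are hypothetical and are not the heuristics of Tables 6 and 7.

```python
# Hypothetical toy word list and negation cues (assumptions, not from the paper).
LEXICON = {"gain": 1, "strong": 1, "loss": -1, "weak": -1}
NEGATORS = {"not", "no", "never"}


def polarity(node):
    """Return -1, 0, or +1 for a word (str) or a (label, children) tree node."""
    if isinstance(node, str):
        return LEXICON.get(node.lower(), 0)
    _label, children = node
    total = 0
    negated = False
    for child in children:
        if isinstance(child, str) and child.lower() in NEGATORS:
            negated = not negated      # a negator flips its sibling phrases
            continue
        total += polarity(child)       # child polarity radiates upward
    if negated:
        total = -total
    return (total > 0) - (total < 0)   # reduce to the sign


# Hypothetical parse of "earnings were not weak".
tree = ("S", [("NP", ["earnings"]),
              ("VP", ["were", ("ADJP", ["not", "weak"])])])
sentence_polarity = polarity(tree)
```

In this toy example the negative word "weak" is flipped inside its adjective phrase, and the resulting positive value is carried through the verb phrase to the sentence node.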
Tables
  • Table 1: Propagation of syntactic class (SC) from the POS level to the L4 level during the iterative process. All the shaded cells are in-between points since there are two consecutive POS or SC tags in the sequence. Cells marked with “+” and “/” are the merging and chunking points respectively
  • Table 2: Selected POS and SC tags, with description, of the English Penn Treebank
  • Table 3: Basic rationale of the chunker and phrase recognizer modules. Numbers in bold represent the steps of execution
  • Table 4: Measure of association in various adjacent SC chunks, where the in-between point v_n is between u_n and u_{n+1}. ζ denotes the association measures using pointwise mutual information (PMI) or the likelihood ratio (LR) of the chunks
  • Table 5: Chart parsing technique in resolving the optimal parse tree
  • Table 6: Sentiment assessment heuristics
  • Table 7: Additional rules to reckon the effects of sibling terms on the polarity of the head word w in different phrases
  • Table 8: Development test results in the two treebanks
  • Table 9: Parsing performance of the two parsers
  • Table 10: Experimental results, in terms of accuracy, of different models in our rule-based unsupervised sentiment classification using two different word lists
  • Table 11: Some basic figures on the collections of financial news used in the evaluation
  • Table 12: Summary statistics, including the Hurst exponent, of the three mood time series
  • Table 13: p-Values of Granger causality correlation between different mood indices and the daily differences in the closing price of the stock market. * and ** indicate the significance levels at 10% and 5%, respectively. Note that this data set includes 866,628 sentences with 12 million words that were issued during the experiment period
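
The Table 4 caption mentions pointwise mutual information (PMI) as one association measure between adjacent chunks. As a rough sketch, PMI for an adjacent pair (x, y) is log2 of p(x, y) / (p(x) p(y)). The code below estimates it from raw counts over tag sequences; the function name `pmi_table`, the use of plain tags in place of SC chunks, and the sample sequences are all assumptions made for illustration.

```python
import math
from collections import Counter


def pmi_table(sequences):
    """Estimate PMI for every adjacent tag pair observed in the sequences."""
    unigrams = Counter()
    bigrams = Counter()
    for seq in sequences:
        unigrams.update(seq)
        bigrams.update(zip(seq, seq[1:]))   # adjacent pairs only
    n_uni = sum(unigrams.values())
    n_bi = sum(bigrams.values())
    scores = {}
    for (left, right), count in bigrams.items():
        p_pair = count / n_bi
        p_left = unigrams[left] / n_uni
        p_right = unigrams[right] / n_uni
        scores[(left, right)] = math.log2(p_pair / (p_left * p_right))
    return scores


# Hypothetical tag sequences, not taken from either treebank.
sequences = [
    ["DT", "NN", "VBD"],
    ["DT", "JJ", "NN", "VBD"],
    ["NN", "VBD", "DT", "NN"],
]
scores = pmi_table(sequences)
```

A positive score means the pair co-occurs more often than independence would predict, which is the kind of signal a chunker could use when deciding whether to merge at an in-between point.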
Related work
  • Financial texts have become more readily available due to the proliferation of postings to the Internet and the ever-increasing demands for market transparency. These trends have given rise to financial text analysis. The idea of applying textual analysis to the financial markets is not completely new and the impact of sentiment analysis on financial markets is well established. A survey by Klein and Prestbo [28] shows how a pessimistic financial news report can affect the markets, and this study firmly supports the suggestion that news reports and markets influence each other. Ederington and Lee [17] conclude that financial texts, particularly press releases, can shed light upon intra-market volatility. Engle and Ng [19] propose the notion of a news impact curve, which provides a device for explaining market returns using news. Wuthrich et al. [58] analyze news articles from five popular financial websites and develop an online computational linguistics system for predicting stock prices. Melvin and Yin [36] also suggest that readers usually pay more attention to the financial news headlines on the fly. The impacts of the headlines on financial returns cannot be ignored. Poon and Granger [41] describe how a combination of stocks and options can be used to predict volatilities. They conclude that the best and most elaborate quantitative models fail to rival predictions based on implied volatilities. These authors note that the question of whether forecasting can be enhanced by using exogenous variables such as news reports is potentially important for future research. These ideas are extended by Chan [11], who studies the profitability of different types of portfolios. Portfolios with stocks featured in news releases outperform others over the same period, and these featured stocks have significantly high momentum returns. The exogenous information supplied by the news reports improves stock returns.
  • Loughran and McDonald [32] suggest that the percentages of uncertain, weak modal and negative words are powerful variables in explaining levels of underpricing in most initial public offerings (IPOs) in stock markets. Supported by computational linguistics, Antweiler and Frank [2] trace more than 1.5 million messages from Yahoo! Finance, and find that stock messages help predict market volatility. Tetlock [50] shows that the number of negative words (as defined by the Harvard IV4 Dictionary) in the “Abreast of the Market” column in the Wall Street Journal can help to predict a company's cash flow. The presence of pessimism in this column predicts negative returns (reversals) the next day, but this predictability disappears within a week. Stock prices usually under-react to the underlying negative information supplied by such news articles, and it takes roughly one day for negative news to affect the market. Baker and Wurgler [4] also present evidence that investor sentiment has significant effects on stock prices. Kothari et al. [29] conclude that adverse news about a firm is linked with its stock price volatility. Rather than using the traditional Harvard Dictionary to uncover negative information in texts, Loughran and McDonald [31] develop an alternative negative word list that better reflects the tone of financial texts. They also rely on a common term-weighting scheme to reduce the noise introduced by misclassification in financial texts. Shiller [47] argues that the news media play an important role in market movements. Investors tend to follow printed words, even though most financial writing is pure hype. This is one of the main reasons asset bubbles form. Garcia [21] revisits the suggestions made by Shiller [47] by studying financial market news from the New York Times over the 1905 to 2005 period, and concludes that the link between media content and stock market returns is indeed concentrated in times of hardship.
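
Several of the studies above rely on counting words from a fixed list. A minimal sketch of such a measure, the share of tokens in a text that appear in a negative word list, might look as follows. The list `NEGATIVE_WORDS` is a hypothetical stand-in and is neither the Harvard IV4 list nor the Loughran and McDonald list; the simple tokenizer is also an assumption.

```python
import re

# Hypothetical stand-in word list (assumption, not a published dictionary).
NEGATIVE_WORDS = {"loss", "decline", "weak", "risk"}


def negative_share(text, negative_words=NEGATIVE_WORDS):
    """Return the fraction of tokens in `text` that are in `negative_words`."""
    tokens = re.findall(r"[a-z']+", text.lower())   # crude lowercase tokenizer
    if not tokens:
        return 0.0
    hits = sum(1 for tok in tokens if tok in negative_words)
    return hits / len(tokens)


share = negative_share("Weak demand led to a loss this quarter")
```

A term-weighting scheme of the kind mentioned above would replace the unit count per hit with a weight per word; that refinement is left out here.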
Funding
  • The work described in this paper was partially supported by the grants from the Research Grants Council of the Hong Kong Special Administrative Region, China (Project Nos
References
  • D.E. Allen, M.J. McAleer, A.K. Singh, Machine news and volatility: the Dow Jones industrial average and the TRNA real-time high-frequency sentiment series, in: G.N. Gregoriou (Ed.), The Handbook of High Frequency Trading, Academic Press, 2015, pp. 327–344.
  • W. Antweiler, M.Z. Frank, Is all that talk just noise? The information content of internet stock message boards, J. Financ. 59 (2004) 1259–1294.
  • S. Baccianella, A. Esuli, F. Sebastiani, in: N. Calzolari, K. Choukri (Eds.), SentiWordNet 3.0: An Enhanced Lexical Resource for Sentiment Analysis and Opinion Mining, European Language Resources Association, LREC, 2010.
  • M. Baker, J. Wurgler, Investor sentiment and the cross-section of stock returns, J. Financ. 61 (2006) 1645–1680.
  • D.M. Bikel, On the Parameter Space of Generative Lexicalized Statistical Parsing Models (PhD thesis), University of Pennsylvania, 2004.
  • S. Billot, B. Lang, The structure of shared forests in ambiguous parsing, In: Proceedings of the 27th Annual Meeting of the Association for Computational Linguistics, 1989, pp. 143–151.
  • J. Bollen, H. Mao, X. Zeng, Twitter mood as a stock market predictor, Comput. Oper. Res. 44 (2011) 91–94.
  • M. Cecchini, H. Aytug, G.J. Koehler, P. Pathak, Making words work: using financial text as a predictor of financial events, Decis. Support. Syst. 50 (2010) 164–175.
  • S.W.K. Chan, M.W.C. Chong, Recursive part-of-speech tagging using word structures, In: Lecture Notes in Artificial Intelligence, vol. 8082, Springer-Verlag, 2013, pp. 419–425.
  • S.W.K. Chan, M.W.C. Chong, L.Y.L. Cheung, An analysis of tree topological features in classifier-based unlexicalized parsing, In: Lecture Notes in Computer Science, vol. 6608, Springer-Verlag, 2011, pp. 155–170.
  • W.S. Chan, Stock price reaction to news and no-news: drift and reversal after headlines, J. Financ. Econ. 70 (2003) 223–260.
  • C.-C. Chang, C.-J. Lin, LIBSVM: a library for support vector machines, ACM Trans. Intell. Syst. Technol. 2 (3) (2011), 27.
  • H. Chen, R.H.L. Chiang, V.C. Storey, Business intelligence and analytics: from big data to big impact, MIS Q. 36 (4) (2012) 1165–1188.
  • K.-T. Chen, H.-M. Lu, T.-J. Chen, S.-H. Li, J.-S. Lian, H. Chen, Giving context to accounting numbers: the role of news coverage, Decis. Support. Syst. 50 (2011) 673–679.
  • M. Collins, Head-driven Statistical Models for Natural Language Parsing (Ph.D. thesis), University of Pennsylvania, Philadelphia, 1999.
  • M. Couillard, M. Davison, A comment on measuring the Hurst exponent of financial time series, Phys. A 348 (2005) 404–418.
  • L.H. Ederington, J.H. Lee, How markets process information: news releases and volatility, J. Financ. 48 (1993) 1161–1191.
  • M. Eickhoff, J. Muntermann, Stock analysts vs. the crowd: mutual prediction and the drivers of crowd wisdom, Inf. Manag. (2016).
  • R.F. Engle, V.K. Ng, Measuring and testing the impact of news on volatility, J. Financ. 48 (1993) 1749–1778.
  • E. Fersini, E. Messina, F.A. Pozzi, Sentiment analysis: Bayesian ensemble learning, Decis. Support. Syst. 68 (2014) 26–38.
  • D. Garcia, Sentiment during recessions, J. Financ. 68 (3) (2013) 1267–1300.
  • C.W.J. Granger, Investigating causal relations by econometric models and cross-spectral methods, Econometrica 37 (3) (1969) 424–438.
  • S.S. Groth, J. Muntermann, An intraday market risk management approach based on textual analysis, Decis. Support. Syst. 50 (2011) 680–691.
  • M. Hu, B. Liu, Mining and summarizing customer reviews, In: Proceedings of the 10th ACM SIGKDD Conference on Knowledge Discovery and Data Mining, Seattle, 2004.
  • C.-L. Huang, C.K. Chung, N. Hui, Y.-C. Lin, Y.-T. Seih, W.-C. Chen, B. Lam, M. Bond, J.W. Pennebaker, The development of the Chinese linguistic inquiry and word count dictionary, Chin. J. Psychol. 54 (2) (2012) 185–201 (in Chinese).
  • H.E. Hurst, Long-term storage of reservoirs: an experimental study, Trans. Am. Soc. Civ. Eng. 116 (1951) 770–799.
  • T. Joachims, Learning to Classify Text Using Support Vector Machines, Kluwer, 2002.
  • F. Klein, J.A. Prestbo, News and the Markets, Henry Regnery, Chicago, 1974.
  • S. Kothari, X. Li, J. Short, The effect of disclosures by management, analysts, and business press on cost of capital, return volatility, and analyst forecasts: a study using content analysis, Account. Rev. 84 (2009) 1639–1670.
  • B. Liu, Sentiment Analysis and Opinion Mining, Morgan & Claypool, 2012.
  • T. Loughran, B. McDonald, When is a liability not a liability? Textual analysis, dictionaries, and 10-Ks, J. Financ. 66 (2011) 35–65.
  • T. Loughran, B. McDonald, IPO first-day returns, offer price revisions, volatility, and form S-1 language, J. Financ. Econ. 109 (2013) 307–326.
  • Y. Lu, C. Zhai, N. Sundaresan, Rated aspect summarization of short comments, In: Proceedings of 18th International World Wide Web Conference (WWW’09), Madrid, Spain, 2009.
  • B.B. Mandelbrot, The (Mis)Behavior of Markets: A Fractal View of Risk, Ruin and Reward, Basic Books, 2004.
  • M. Marcus, M. Santorini, M. Marcinkiewicz, Building a large annotated corpus of English: the Penn Treebank, Comput. Linguist. 19 (2) (1993) 313–330.
  • M. Melvin, X. Yin, Public information arrival, exchange rate volatility and quote frequency, Econ. J. 110 (2000) 644–661.
  • G. Mitra, L. Mitra, The Handbook of News Analytics in Finance, John Wiley, West Sussex, 2011.
  • M. Palmer, F.-D. Chiou, N. Xue, T.-K. Lee, Chinese Treebank 5.0 LDC2005T01. Web Download, Linguistic Data Consortium, Philadelphia, 2005.
  • B. Pang, L. Lee, S. Vaithyanathan, Thumbs up? Sentiment classification using machine learning techniques, In: Proceedings of Conference on Empirical Methods in Natural Language Processing (EMNLP-2002), 2002.
  • D.A. Pierce, Relationships and lack thereof between economic time series, with special reference to money and interest rates, J. Am. Stat. Assoc. 72 (March 1977) 11–22. Applications Sections.
  • S.-H. Poon, C.W.J. Granger, Practical issues in forecasting volatility, Financ. Anal. J. 61 (1) (2005) 45–56.
  • R. Quinlan, C4.5: Programs for Machine Learning, Morgan Kaufmann, 1994.
  • A. Rosenbach, Animacy versus weight as determinants of grammatical variation in English, Language 81 (3) (2005) 613–644.
  • R.P. Schumaker, H. Chen, Textual analysis of stock market prediction using breaking financial news: the AZFin text system, ACM Trans. Inf. Syst. 27 (2009) 2, 12.
  • R.P. Schumaker, Y. Zhang, C. Huang, H. Chen, Evaluating sentiment in financial news articles, Decis. Support. Syst. 53 (3) (2012) 458–464.
  • M.A.M. Shaikh, H. Prendinger, I. Mitsurs, Assessing sentiment of text by semantic dependency and contextual valence analysis, In: Proceedings of the International Conference on Affective Computing and Intelligent Interaction (ACII-07), 2007, pp. 191–202.
  • R.J. Shiller, Irrational Exuberance, Princeton University Press, 2000.
  • H. Sun, D. Jurafsky, Shallow semantic parsing of Chinese, In: Proceedings of NAACL-HLT, 2004.
  • J. Surowiecki, The Wisdom of Crowds, Anchor, 2005.
  • P.C. Tetlock, Giving content to investor sentiment: the role of media in the stock market, J. Financ. 62 (2007) 1139–1168.
  • L. Todorovski, S. Džeroski, Combining classifiers with meta decision trees, Mach. Learn. J. 50 (3) (2003) 223–249.
  • P.D. Turney, Thumbs up or thumbs down? Semantic orientation applied to unsupervised classification of reviews, In: Proceedings of Annual Meeting of the Association for Computational Linguistics (ACL-2002), 2002.
  • G. Wang, J. Sun, J. Ma, K. Xu, J. Gu, Sentiment classification: the contribution of ensemble learning, Decis. Support. Syst. 57 (2014) 77–93.
  • T. Wasow, Remarks on grammatical weight, Lang. Var. Chang. 9 (1997) 81–105.
  • W. Wei, J.A. Gulla, Sentiment learning on product reviews via sentiment ontology tree, In: Proceedings of the 48th Annual Meeting of the Association for Computational Linguistics, 2010, pp. 404–413.
  • J. Wiebe, R. Mihalcea, Word sense and subjectivity, In: Proceedings of ACL-06, 2006, pp. 1065–1072.
  • T. Wilson, J. Wiebe, P. Hoffmann, Recognizing contextual polarity in phrase-level sentiment analysis, In: Proceedings of HLT/EMNLP, 2005, pp. 347–354.
  • B. Wuthrich, D. Permunetilleke, S. Leung, W. Lam, V. Cho, J. Zhang, Daily prediction of major stock indices from textual WWW data, HKIE Trans. 5 (3) (1998) 151–156.
  • A. Zellner, Comments on time series and causal concepts in business cycle research, in: C.A. Sims (Ed.), New Methods in Business Cycle Research, Federal Reserve Bank of Minneapolis, 1977.
  • B. Ženko, L. Todorovski, S. Džeroski, A comparison of stacking with meta decision trees to other combining methods, In: Proceedings of the Fourth International Multi-Conference Information Society, vol. A, Jozef Stefan Institute, Ljubljana, 2001, pp. 144–147.