
Text Matching as Image Recognition.

Thirtieth AAAI Conference on Artificial Intelligence (2016): 2793–2799

Abstract

Matching two texts is a fundamental problem in many natural language processing tasks. An effective way is to extract meaningful matching patterns from words, phrases, and sentences to produce the matching score. Inspired by the success of convolutional neural network in image recognition, where neurons can capture many complicated patterns …

Introduction
  • Matching two texts is central to many natural language applications, such as machine translation (Brown et al 1993), question answering (Xue, Jeon, and Croft 2008), paraphrase identification (Socher et al 2011) and document retrieval (Li and Xu 2014).
  • Given two texts T1 = (w1, w2, ..., wm) and T2 = (v1, v2, ..., vn), the degree of matching is typically measured as a score produced by a scoring function on the representation of each text: match(T1, T2) = F(Φ(T1), Φ(T2)). (1)
  • Φ is a function to map each text to a vector, and F is the scoring function for modeling the interactions between them.
  • Taking the task of paraphrase identification for example, given the following two texts: T1 : Down the ages noodles and dumplings were famous Chinese food.
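The general framework of Eq. (1) can be sketched as follows. Here `phi` (a bag-of-words count vector) and `score` (cosine similarity) are deliberately simple stand-ins for Φ and F, chosen only for illustration; MatchPyramid itself replaces them with a word-level matching matrix and a CNN.

```python
# Sketch of the generic text-matching framework: match(T1, T2) = F(Φ(T1), Φ(T2)).
# `phi` and `score` are illustrative stand-ins, not the paper's actual choices.
from collections import Counter
import math

def phi(text):
    """Φ: map a text to a representation (here: a sparse word-count vector)."""
    return Counter(text.lower().split())

def score(v1, v2):
    """F: score the interaction between two representations (here: cosine)."""
    dot = sum(v1[w] * v2[w] for w in v1)  # Counter returns 0 for missing words
    n1 = math.sqrt(sum(c * c for c in v1.values()))
    n2 = math.sqrt(sum(c * c for c in v2.values()))
    return dot / (n1 * n2) if n1 and n2 else 0.0

def match(t1, t2):
    return score(phi(t1), phi(t2))

t1 = "Down the ages noodles and dumplings were famous Chinese food"
t2 = "Down the ages dumplings and noodles were popular in China"
print(round(match(t1, t2), 3))  # → 0.7
```

Any representation-based matcher fits this shape; the models below differ mainly in how expressive Φ and F are allowed to be.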
Highlights
  • Matching two texts is central to many natural language applications, such as machine translation (Brown et al 1993), question answering (Xue, Jeon, and Croft 2008), paraphrase identification (Socher et al 2011) and document retrieval (Li and Xu 2014).
  • Given two texts T1 = (w1, w2, ..., wm) and T2 = (v1, v2, ..., vn), the degree of matching is typically measured as a score produced by a scoring function on the representation of each text: match(T1, T2) = F(Φ(T1), Φ(T2)). (1)
  • Φ is a function to map each text to a vector, and F is the scoring function for modeling the interactions between them
  • We propose to view text matching as image recognition and use CNN to solve the above problem
  • Experimental results show that our model can outperform baselines, including some recently proposed deep matching algorithms
Methods
  • ALLPOSITIVE: All of the test data are predicted as positive.
  • TF-IDF: TF-IDF (Salton, Fox, and Wu 1983) is a widely used method in text mining.
  • In this method, each text is represented as a |V|-dimensional vector in which each element is the TF-IDF score of the corresponding word in the text, where |V| is the vocabulary size.
  • DSSM/CDSSM: Since DSSM (Huang et al 2013) and CDSSM (Gao et al 2014; Shen et al 2014) require large amounts of data for training, the authors directly apply the released models to the test data.
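The TF-IDF baseline above can be sketched as follows. The paper does not specify the exact weighting variant, so the log-scaled, smoothed idf used here is an assumption for illustration:

```python
# Sketch of the TF-IDF baseline: each text becomes a |V|-dimensional vector
# whose i-th element is tf(w_i, text) * idf(w_i). The smoothed idf variant
# log((1 + N) / (1 + df)) is an assumption; the paper does not state one.
import math
from collections import Counter

def tfidf_vectors(texts):
    docs = [t.lower().split() for t in texts]
    vocab = sorted({w for d in docs for w in d})           # vocabulary of size |V|
    df = Counter(w for d in docs for w in set(d))          # document frequency
    n = len(docs)
    vecs = []
    for d in docs:
        tf = Counter(d)
        vecs.append([tf[w] * math.log((1 + n) / (1 + df[w])) for w in vocab])
    return vocab, vecs

vocab, vecs = tfidf_vectors(["noodles are food", "dumplings are food"])
print(len(vocab), len(vecs[0]))  # each text maps to a |V|-dimensional vector
```

Words occurring in every document (here "are" and "food") receive zero weight under this idf, which is why TF-IDF discounts common words while a raw bag-of-words representation does not.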
Conclusion
  • The authors view text matching as image recognition, and propose a new deep architecture, namely MatchPyramid.
  • The authors' model can automatically capture important matching patterns such as unigram, n-gram and n-term at different levels.
  • Experimental results show that the model can outperform baselines, including some recently proposed deep matching algorithms
Tables
  • Table1: Results on MSRP
  • Table2: Results on the task of paper citation matching
  • Table3: The norm of learned word embeddings on the task of paper citation matching
Related Work
  • Most previous work on text matching tries to find a good representation for a single text, and usually uses a simple scoring function to obtain the matching result. Examples include Partial Least Square (Wu, Li, and Xu 2013), Canonical Correlation Analysis (Hardoon and Shawe-Taylor 2003), and some deep models such as DSSM (Huang et al 2013), CDSSM (Gao et al 2014; Shen et al 2014) and ARC-I (Hu et al 2014).

    Recently, a new line of work that focuses on modeling the interaction between two sentences has been proposed and has gained much attention; examples include DEEPMATCH (Lu and Li 2013), URAE (Socher et al 2011) and ARC-II (Hu et al 2014). Our model falls into this category, so we discuss in detail the differences between our model and these methods.

    DEEPMATCH uses a topic model to construct the interactions between two texts, and then builds different levels of abstraction through a hierarchical architecture based on the relationships between topics. Compared with our matching matrix, which is defined at the word level, DEEPMATCH uses topic information at a coarser granularity. Moreover, it relies heavily on the quality of the learned topic model, and its hierarchies are usually ambiguous since the relationships between topics are not absolute. In contrast, MatchPyramid explicitly models the interactions at different levels, from words to phrases to sentences.
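The word-level matching matrix at the core of MatchPyramid can be sketched as follows. The indicator similarity (1 if two words are identical, 0 otherwise) shown here is the simplest choice; similarities computed over word embeddings, such as cosine or dot product, are natural alternatives the model can use instead:

```python
# Sketch of a word-level matching matrix M for MatchPyramid, where M[i][j]
# measures the similarity between word w_i of T1 and word v_j of T2.
# The indicator function is used here for illustration; embedding-based
# similarities (cosine, dot product) are drop-in replacements.
def matching_matrix(t1, t2):
    ws, vs = t1.lower().split(), t2.lower().split()
    return [[1 if w == v else 0 for v in vs] for w in ws]

M = matching_matrix("famous Chinese food", "popular Chinese food")
for row in M:
    print(row)
```

Each row corresponds to a word of T1 and each column to a word of T2, so the matrix can be read like a grayscale image; a CNN then slides filters over it to pick up matching patterns such as identical n-grams along the diagonal.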
Funding
  • This work was funded by the 973 Program of China under Grants No. 2014CB340401 and 2012CB316303, the 863 Program of China under Grant No. 2014AA015204, the National Natural Science Foundation of China (NSFC) under Grants No. 61472401, 61433014, 61425016, and 61203298, the Key Research Program of the Chinese Academy of Sciences under Grant No. KGZD-EW-T03-2, and the Youth Innovation Promotion Association CAS under Grant No. 20144310.
References
  • Brown, P. F.; Pietra, V. J. D.; Pietra, S. A. D.; and Mercer, R. L. 1993. The mathematics of statistical machine translation: Parameter estimation. Computational linguistics 19(2):263–311.
  • Dahl, G. E.; Sainath, T. N.; and Hinton, G. E. 2013. Improving deep neural networks for lvcsr using rectified linear units and dropout. In Acoustics, Speech and Signal Processing (ICASSP), 2013 IEEE International Conference on, 8609–8613. IEEE.
  • Dolan, W. B., and Brockett, C. 2005. Automatically constructing a corpus of sentential paraphrases. In Proc. of IWP.
  • Duchi, J.; Hazan, E.; and Singer, Y. 2011. Adaptive subgradient methods for online learning and stochastic optimization. The Journal of Machine Learning Research 12:2121–2159.
  • Gao, J.; Pantel, P.; Gamon, M.; He, X.; Deng, L.; and Shen, Y. 2014. Modeling interestingness with deep neural networks. In Proceedings of the 2014 Conference on Empirical Methods in Natural Language Processing.
  • Caruana, R.; Lawrence, S.; and Giles, C. L. 2001. Overfitting in neural nets: Backpropagation, conjugate gradient, and early stopping. In Advances in Neural Information Processing Systems 13: Proceedings of the 2000 Conference, volume 13, 402.
  • Girshick, R.; Donahue, J.; Darrell, T.; and Malik, J. 2014. Rich feature hierarchies for accurate object detection and semantic segmentation. In Computer Vision and Pattern Recognition (CVPR), 2014 IEEE Conference on, 580–587. IEEE.
  • Hardoon, D. R., and Shawe-Taylor, J. 2003. Kcca for different level precision in content-based image retrieval. In Proceedings of Third International Workshop on Content-Based Multimedia Indexing, IRISA, Rennes, France.
  • Hinton, G. E.; Srivastava, N.; Krizhevsky, A.; Sutskever, I.; and Salakhutdinov, R. 2012. Improving neural networks by preventing co-adaptation of feature detectors. CoRR abs/1207.0580.
  • Hu, B.; Lu, Z.; Li, H.; and Chen, Q. 2014. Convolutional neural network architectures for matching natural language sentences. In Advances in Neural Information Processing Systems, 2042–2050.
  • Huang, P.-S.; He, X.; Gao, J.; Deng, L.; Acero, A.; and Heck, L. 2013. Learning deep structured semantic models for web search using clickthrough data. In Proceedings of the 22nd ACM international conference on Conference on Information and Knowledge Management, 2333–2338. ACM.
  • Jia, Y.; Shelhamer, E.; Donahue, J.; Karayev, S.; Long, J.; Girshick, R.; Guadarrama, S.; and Darrell, T. 2014. Caffe: Convolutional architecture for fast feature embedding. arXiv preprint arXiv:1408.5093.
  • Kalchbrenner, N.; Grefenstette, E.; and Blunsom, P. 2014. A convolutional neural network for modelling sentences. CoRR abs/1404.2188.
  • LeCun, Y.; Bottou, L.; Bengio, Y.; and Haffner, P. 1998. Gradient-based learning applied to document recognition. Proceedings of the IEEE 86(11):2278–2324.
  • Li, H., and Xu, J. 2014. Semantic matching in search. Foundations and Trends in Information Retrieval 7(5):343–469.
  • Lu, Z., and Li, H. 2013. A deep architecture for matching short texts. In Advances in Neural Information Processing Systems, 1367–1375.
  • Mikolov, T.; Chen, K.; Corrado, G.; and Dean, J. 2013. Efficient estimation of word representations in vector space. CoRR abs/1301.3781.
  • Salton, G.; Fox, E. A.; and Wu, H. 1983. Extended boolean information retrieval. Communications of the ACM 26(11):1022–1036.
  • Shen, Y.; He, X.; Gao, J.; Deng, L.; and Mesnil, G. 2014. A latent semantic model with convolutional-pooling structure for information retrieval. In Proceedings of the 23rd ACM International Conference on Conference on Information and Knowledge Management, 101–110. ACM.
  • Simard, P. Y.; Steinkraus, D.; and Platt, J. C. 2003. Best practices for convolutional neural networks applied to visual document analysis. In Seventh International Conference on Document Analysis and Recognition (ICDAR 2003), volume 2, 958. IEEE Computer Society.
  • Socher, R.; Huang, E. H.; Pennin, J.; Manning, C. D.; and Ng, A. Y. 2011. Dynamic pooling and unfolding recursive autoencoders for paraphrase detection. In Advances in Neural Information Processing Systems, 801–809.
  • Rumelhart, D. E.; Hinton, G. E.; and Williams, R. J. 1986. Learning representations by back-propagating errors. Nature 323:533–536.
  • Wu, W.; Li, H.; and Xu, J. 2013. Learning query and document similarities from click-through bipartite graph with metadata. In Proceedings of the sixth ACM international conference on WSDM, 687–696. ACM.
  • Xue, X.; Jeon, J.; and Croft, W. B. 2008. Retrieval models for question and answer archives. In Proceedings of the 31st annual international ACM SIGIR conference on Research and development in information retrieval, 475–482. ACM.
  • Zeiler, M. D., and Fergus, R. 2014. Visualizing and understanding convolutional networks. In Computer Vision–ECCV 2014. Springer. 818–833.