HC-Search: Learning Heuristics and Cost Functions for Structured Prediction

AAAI, 2013.


Abstract:

Structured prediction is the problem of learning a function from structured inputs to structured outputs. Inspired by the recent successes of search-based structured prediction, we introduce a new framework for structured prediction called HC-Search. Given a structured input, the framework uses a search procedure guided by a learned heuristic…

Introduction
  • The authors consider the problem of structured prediction, where the predictor must produce a structured output given a structured input.
  • A standard approach to structured prediction is to learn a cost function C(x, y) for scoring a potential structured output y given a structured input x.
  • Given such a cost function and a new input x, the output computation involves solving the so-called “Argmin” problem, which is to find the minimum cost output for a given input.
  • These learning algorithms generally assume exact inference, and their behavior in the context of heuristic inference is not well understood; a toy illustration of the Argmin problem is sketched below.
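
To make the "Argmin" problem concrete, here is a minimal sketch (ours, not the paper's) of the standard cost-function approach on a toy sequence-labeling task. It assumes a hypothetical linear cost C(x, y) = w · phi(x, y) with placeholder feature map phi and weight vector w; exact enumeration is feasible only because the toy output space is tiny, which is exactly why realistic problems fall back on heuristic inference.

    from itertools import product

    # Hypothetical linear cost function C(x, y) = w . phi(x, y); the
    # feature map `phi` and weight vector `w` are illustrative placeholders.
    def cost(w, phi, x, y):
        return sum(wi * fi for wi, fi in zip(w, phi(x, y)))

    def argmin_exact(w, phi, x, labels, length):
        # Exact "Argmin" inference by brute force: enumerate all
        # |labels|^length candidate outputs and return the cheapest one.
        # Intractable for realistic structured outputs; for intuition only.
        return min(product(labels, repeat=length),
                   key=lambda y: cost(w, phi, x, y))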
Highlights
  • We consider the problem of structured prediction, where the predictor must produce a structured output given a structured input.
  • We show that in practice HC-Search performs significantly better than search with a single cost function (C-Search) and other state-of-the-art approaches to structured prediction.
  • We evaluate our approach on four structured prediction problems, among them Handwriting Recognition (HW).
  • We introduced the HC-Search framework for structured prediction, whose principal feature is the separation of the cost function from the search heuristic (a schematic sketch follows this list).
  • We showed that our framework yields significantly superior performance to state-of-the-art results and allows informative error analysis and diagnostics.
  • Our investigation showed that the main source of error of existing output-space approaches, including our own (HC-Search), is the inability of the cost function to correctly rank the candidate outputs produced by the heuristic.
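
As a rough illustration of that separation, the following sketch (ours; `initial` and `successors` are hypothetical helpers, and greedy search stands in for whichever search strategy is actually used) shows HC-Search prediction: a search guided by the learned heuristic H collects candidate outputs, and the learned cost function C then picks the final prediction among them.

    def hc_search_predict(x, initial, successors, H, C, budget):
        # Schematic HC-Search inference: H guides candidate generation,
        # while C independently selects the final output.
        y = initial(x)               # starting output for input x
        candidates = [y]
        for _ in range(budget):      # fixed search budget
            neighbors = successors(x, y)
            if not neighbors:
                break
            y = min(neighbors, key=lambda n: H(x, n))   # greedy step on H
            candidates.append(y)
        return min(candidates, key=lambda c: C(x, c))   # final choice by C

The separation shows up in the last line: C never scores the full output space, only the candidates that H managed to uncover.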
Results
  • While there is room to improve the generation loss, the above results do not indicate whether reducing it via a better learned heuristic would lead to better overall results.
  • To help evaluate this, the authors ran an experiment in which they gave HC-Search the true loss function to use as a heuristic, i.e., H(x, y) = L(x, y, y∗), during both training of the cost function and testing (a sketch of the resulting error decomposition follows this list).
  • The authors note that they experimented with more sophisticated imitation learning algorithms (e.g., DAgger (Ross, Gordon, and Bagnell 2011)) but did not see significant improvements.
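
The diagnostic above amounts to swapping in an oracle heuristic. The following sketch (ours; helper names are hypothetical) shows how the two error sources can be measured on a single example: the generation error is the best true loss among the candidates the search produced, and the selection error is the extra loss incurred because the cost function mis-ranks them.

    def error_decomposition(x, y_true, candidates, C, L):
        # Generation error: best achievable loss among the candidates
        # produced by the heuristic-guided search.
        gen = min(L(x, y, y_true) for y in candidates)
        # Selection error: extra loss because C's top-ranked candidate
        # is not the best one that was generated.
        y_hat = min(candidates, key=lambda y: C(x, y))
        return gen, L(x, y_hat, y_true) - gen

Using the true loss as the heuristic, H(x, y) = L(x, y, y∗), drives the generation error toward its floor, so any remaining error is attributable to the cost function.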
Conclusion
  • Conclusions and Future Work: The authors introduced the HC-Search framework for structured prediction, whose principal feature is the separation of the cost function from the search heuristic.
  • The authors' investigation showed that the main source of error of existing output-space approaches, including their own (HC-Search), is the inability of the cost function to correctly rank the candidate outputs produced by the heuristic (see the decomposition after this list).
  • This analysis suggests that learning more powerful cost functions, e.g., regression trees (Mohan, Chen, and Weinberger 2011), with an eye toward anytime performance (Grubb and Bagnell 2012; Xu, Weinberger, and Chapelle 2012), would be productive.
  • Another direction to pursue is heuristic function learning to speed up the process of generating high-quality outputs (Fern 2010).
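
For reference, the decomposition behind this diagnosis can be written as follows (notation adapted from the paper's generation/selection split):

    \epsilon_{HC} \;=\; \underbrace{\mathbb{E}\!\left[L(x,\, y^{*}_{H},\, y^{*})\right]}_{\epsilon_{H}\ \text{(generation)}}
    \;+\; \underbrace{\mathbb{E}\!\left[L(x,\, \hat{y},\, y^{*}) - L(x,\, y^{*}_{H},\, y^{*})\right]}_{\epsilon_{C \mid H}\ \text{(selection)}}

where y∗_H is the lowest-loss output among those generated by the H-guided search and ŷ is the output ultimately selected by C.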
Tables
  • Table 1: Error rates of different structured prediction algorithms
  • Table 2: HC-Search vs. C-Search: error decomposition of heuristic and cost function
Funding
  • This work was supported in part by NSF grants IIS 1219258 and IIS 1018490, and in part by the Defense Advanced Research Projects Agency (DARPA) and the Air Force Research Laboratory (AFRL) under Contract No. FA8750-13-2-0033.
References
  • Agarwal, S., and Roth, D. 2005. Learnability of bipartite ranking functions. In COLT, 16–31.
  • Collins, M. 2000. Discriminative reranking for natural language parsing. In ICML, 175–182.
  • Crammer, K.; Dekel, O.; Keshet, J.; Shalev-Shwartz, S.; and Singer, Y. 2006. Online passive-aggressive algorithms. JMLR 7:551–585.
  • Daumé III, H.; Langford, J.; and Marcu, D. 2009. Search-based structured prediction. MLJ 75(3):297–325.
  • Dietterich, T. G.; Hild, H.; and Bakiri, G. 1995. A comparison of ID3 and backpropagation for English text-to-speech mapping. MLJ 18(1):51–80.
  • Doppa, J. R.; Fern, A.; and Tadepalli, P. 2012. Output space search for structured prediction. In ICML.
  • Felzenszwalb, P. F., and McAllester, D. A. 2007. The generalized A* architecture. JAIR 29:153–190.
  • Fern, A.; Yoon, S. W.; and Givan, R. 2006. Approximate policy iteration with a policy language bias: Solving relational Markov decision processes. JAIR 25:75–118.
  • Fern, A. 2010. Speedup learning. In Encyclopedia of Machine Learning, 907–911.
  • Grubb, A., and Bagnell, D. 2012. SpeedBoost: Anytime prediction with uniform near-optimality. JMLR Proceedings Track 22:458–466.
  • Höffgen, K.-U.; Simon, H.-U.; and Van Horn, K. S. 1995. Robust trainability of single neurons. Journal of Computer and System Sciences 50(1):114–125.
  • Jiang, J.; Teichert, A.; Daumé III, H.; and Eisner, J. 2012. Learned prioritization for trading off accuracy and speed. In NIPS, 1340–1348.
  • Khardon, R. 1999. Learning to take actions. Machine Learning 35(1):57–90.
  • Lafferty, J.; McCallum, A.; and Pereira, F. 2001. Conditional random fields: Probabilistic models for segmenting and labeling sequence data. In ICML, 441–448.
  • Mohan, A.; Chen, Z.; and Weinberger, K. Q. 2011. Web-search ranking with initialized gradient boosted regression trees. JMLR Proceedings Track 14:77–89.
  • Ross, S., and Bagnell, D. 2010. Efficient reductions for imitation learning. In AISTATS, 661–668.
  • Ross, S.; Gordon, G.; and Bagnell, D. 2011. A reduction of imitation learning and structured prediction to no-regret online learning. In AISTATS, 627–635.
  • Taskar, B.; Guestrin, C.; and Koller, D. 2003. Max-margin Markov networks. In NIPS.
  • Tsochantaridis, I.; Hofmann, T.; Joachims, T.; and Altun, Y. 2004. Support vector machine learning for interdependent and structured output spaces. In ICML.
  • Vogel, J., and Schiele, B. 2007. Semantic modeling of natural scenes for content-based image retrieval. IJCV 72(2):133–157.
  • Weiss, D., and Taskar, B. 2010. Structured prediction cascades. In AISTATS, 916–923.
  • Weiss, D.; Sapp, B.; and Taskar, B. 2010. Sidestepping intractable inference with structured ensemble cascades. In NIPS, 2415–2423.
  • Wick, M. L.; Rohanimanesh, K.; Singh, S.; and McCallum, A. 2009. Training factor graphs with reinforcement learning for efficient MAP inference. In NIPS, 2044–2052.
  • Wick, M. L.; Rohanimanesh, K.; Bellare, K.; Culotta, A.; and McCallum, A. 2011. SampleRank: Training factor graphs with atomic gradients. In ICML, 777–784.
  • Xu, Y.; Fern, A.; and Yoon, S. 2009. Learning linear ranking functions for beam search with application to planning. JMLR 10:1571–1610.
  • Xu, Z.; Weinberger, K.; and Chapelle, O. 2012. The greedy miser: Learning under test-time budgets. In ICML.
Best Paper of AAAI, 2013