# HC-Search: Learning Heuristics and Cost Functions for Structured Prediction

AAAI, 2013.

Abstract:

Structured prediction is the problem of learning a function from structured inputs to structured outputs. Inspired by the recent successes of search-based structured prediction, we introduce a new framework for structured prediction called HC-Search. Given a structured input, the framework uses a search procedure guided by a learned heuristic H to uncover high quality candidate outputs, and then uses a separate learned cost function C to select a final prediction among those outputs.

Introduction

- The authors consider the problem of structured prediction, where the predictor must produce a structured output given a structured input.
- A standard approach to structured prediction is to learn a cost function C(x, y) for scoring a potential structured output y given a structured input x.
- Given such a cost function and a new input x, the output computation involves solving the so-called “Argmin” problem, which is to find the minimum cost output for a given input (sketched below).
- These learning algorithms generally assume exact inference, and their behavior in the context of heuristic inference is not well understood.
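
A minimal Python sketch may help make the Argmin problem concrete. All names here are hypothetical (the paper does not prescribe this interface), and exact Argmin over an exponentially large output space is generally intractable, so `candidate_outputs` stands in for whatever subset inference can enumerate:

```python
def argmin_inference(x, candidate_outputs, cost_fn):
    """Solve a restricted 'Argmin' problem: return the candidate output y
    minimizing the learned cost C(x, y) over the enumerated candidates.

    In real structured prediction the output space is exponentially large,
    so inference typically searches rather than enumerates.
    """
    return min(candidate_outputs, key=lambda y: cost_fn(x, y))
```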

Highlights

- We consider the problem of structured prediction, where the predictor must produce a structured output given a structured input
- We show that in practice HC-Search performs significantly better than search with a single cost function (C-Search) and other state-of-the-art approaches to structured prediction
- We evaluate our approach on four structured prediction benchmarks, including Handwriting Recognition (HW)
- We introduced the HC-Search framework for structured prediction, whose principal feature is the separation of the cost function from the search heuristic (see the sketch after this list)
- We showed that our framework yields significantly superior performance to state-of-the-art results, and allows informative error analysis and diagnostics
- Our investigation showed that the main source of error of existing output-space approaches, including our own (HC-Search), is the inability of the cost function to correctly rank the candidate outputs produced by the heuristic
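
The separation of heuristic and cost function can be sketched as follows, assuming greedy search for concreteness (the framework admits other search procedures); `initial_output`, `successors`, `H`, and `C` are hypothetical stand-ins for the learned components and the search space:

```python
def hc_search_predict(x, initial_output, successors, H, C, num_steps):
    """Two-stage HC-Search prediction (greedy-search sketch).

    Stage 1: search the output space guided by the heuristic H, collecting
             every candidate output visited along the search trajectory.
    Stage 2: return the candidate with the lowest cost under C.
    """
    y = initial_output(x)               # e.g., output of a simple base predictor
    candidates = [y]
    for _ in range(num_steps):
        neighbors = successors(x, y)    # outputs reachable in one search step
        if not neighbors:
            break
        y = min(neighbors, key=lambda n: H(x, n))   # H guides exploration
        candidates.append(y)
    return min(candidates, key=lambda c: C(x, c))   # C selects the prediction
```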

Results

- While there is room to improve the generation loss, the above results do not indicate whether improving this loss via a better learned heuristic would lead to better results overall
- To help evaluate this, the authors ran an experiment that gave HC-Search the true loss function to use as a heuristic, i.e., H(x, y) = L(x, y, y∗), during both training of the cost function and testing (illustrated below)
- The authors note that they experimented with more sophisticated imitation learning algorithms (e.g., DAgger (Ross, Gordon, and Bagnell 2011)), but did not see significant improvements
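
This oracle experiment can be illustrated with a small wrapper: replacing the learned heuristic with the true loss isolates the cost function's contribution to the remaining error. The helper below is hypothetical, not from the paper:

```python
def make_oracle_heuristic(loss_fn, y_star):
    """Return a heuristic H(x, y) = L(x, y, y_star), i.e., the true loss.

    Plugging this into the search (e.g., the hc_search_predict sketch above)
    removes heuristic error, so any remaining loss reflects the cost
    function's inability to rank the generated candidates.
    """
    return lambda x, y: loss_fn(x, y, y_star)
```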

Conclusion

**Conclusions and Future Work**

- The authors introduced the HC-Search framework for structured prediction, whose principal feature is the separation of the cost function from the search heuristic
- The authors' investigation showed that the main source of error of existing output-space approaches, including their own (HC-Search), is the inability of the cost function to correctly rank the candidate outputs produced by the heuristic (a sketch of this decomposition follows the list)
- This analysis suggests that learning more powerful cost functions, e.g., regression trees (Mohan, Chen, and Weinberger 2011), with an eye towards anytime performance (Grubb and Bagnell 2012; Xu, Weinberger, and Chapelle 2012), would be productive.
- Another direction to pursue is heuristic function learning to speed up the process of generating high-quality outputs (Fern 2010).
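
The error decomposition behind this diagnosis (cf. Table 2) can be sketched as follows; the exact definitions in the paper may differ, and `loss_fn` and `C` are hypothetical stand-ins:

```python
def error_decomposition(x, y_star, candidates, C, loss_fn):
    """Split prediction error into a heuristic (generation) term and a cost
    function (selection) term, in the spirit of the paper's Table 2.

    generation_error: loss of the best output the heuristic uncovered.
    selection_error : extra loss because C did not pick that best output.
    """
    generation_error = min(loss_fn(x, y, y_star) for y in candidates)
    y_hat = min(candidates, key=lambda y: C(x, y))
    selection_error = loss_fn(x, y_hat, y_star) - generation_error
    return generation_error, selection_error
```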

Tables


- Table 1: Error rates of different structured prediction algorithms
- Table 2: HC-Search vs. C-Search: Error decomposition of heuristic and cost function

Funding

- This work was supported in part by NSF grants IIS 1219258 and IIS 1018490, and in part by the Defense Advanced Research Projects Agency (DARPA) and the Air Force Research Laboratory (AFRL) under Contract No. FA8750-13-2-0033.

Reference

- Agarwal, S., and Roth, D. 2005. Learnability of bipartite ranking functions. In COLT, 16–31.
- Collins, M. 2000. Discriminative reranking for natural language parsing. In ICML, 175–182.
- Crammer, K.; Dekel, O.; Keshet, J.; Shalev-Shwartz, S.; and Singer, Y. 2006. Online passive-aggressive algorithms. JMLR 7:551–585.
- Dietterich, T. G.; Hild, H.; and Bakiri, G. 1995. A comparison of ID3 and backpropagation for English text-to-speech mapping. MLJ 18(1):51–80.
- Doppa, J. R.; Fern, A.; and Tadepalli, P. 2012. Output space search for structured prediction. In ICML.
- Felzenszwalb, P. F., and McAllester, D. A. 2007. The generalized A* architecture. JAIR 29:153–190.
- Fern, A.; Yoon, S. W.; and Givan, R. 2006. Approximate policy iteration with a policy language bias: Solving relational Markov decision processes. JAIR 25:75–118.
- Fern, A. 2010. Speedup learning. In Encyclopedia of Machine Learning. 907–911.
- Grubb, A., and Bagnell, D. 2012. Speedboost: Anytime prediction with uniform near-optimality. JMLR Proceedings Track 22:458–466.
- Daumé III, H.; Langford, J.; and Marcu, D. 2009. Search-based structured prediction. MLJ 75(3):297–325.
- Höffgen, K.-U.; Simon, H.-U.; and Van Horn, K. S. 1995. Robust trainability of single neurons. Journal of Computer and System Sciences 50(1):114–125.
- Jiang, J.; Teichert, A.; Daumé III, H.; and Eisner, J. 2012. Learned prioritization for trading off accuracy and speed. In NIPS, 1340–1348.
- Khardon, R. 1999. Learning to take actions. Machine Learning 35(1):57–90.
- Lafferty, J.; McCallum, A.; and Pereira, F. 2001. Conditional random fields: Probabilistic models for segmenting and labeling sequence data. In ICML, 441–448.
- Mohan, A.; Chen, Z.; and Weinberger, K. Q. 2011. Websearch ranking with initialized gradient boosted regression trees. JMLR Proceedings Track 14:77–89.
- Ross, S., and Bagnell, D. 2010. Efficient reductions for imitation learning. In AISTATS, 661–668.
- Ross, S.; Gordon, G.; and Bagnell, D. 2011. A reduction of imitation learning and structured prediction to no-regret online learning. In AISTATS, 627–635.
- Taskar, B.; Guestrin, C.; and Koller, D. 2003. Max-margin Markov networks. In NIPS.
- Tsochantaridis, I.; Hofmann, T.; Joachims, T.; and Altun, Y. 2004. Support vector machine learning for interdependent and structured output spaces. In ICML.
- Vogel, J., and Schiele, B. 2007. Semantic modeling of natural scenes for content-based image retrieval. IJCV 72(2):133–157.
- Weiss, D., and Taskar, B. 2010. Structured prediction cascades. In AISTATS, 916–923.
- Weiss, D.; Sapp, B.; and Taskar, B. 2010. Sidestepping intractable inference with structured ensemble cascades. In NIPS, 2415–2423.
- Wick, M. L.; Rohanimanesh, K.; Singh, S.; and McCallum, A. 2009. Training factor graphs with reinforcement learning for efficient map inference. In NIPS, 2044–2052.
- Wick, M. L.; Rohanimanesh, K.; Bellare, K.; Culotta, A.; and McCallum, A. 2011. Samplerank: Training factor graphs with atomic gradients. In ICML, 777–784.
- Xu, Y.; Fern, A.; and Yoon, S. 2009. Learning linear ranking functions for beam search with application to planning. JMLR 10:1571–1610.
- Xu, Z.; Weinberger, K.; and Chapelle, O. 2012. The greedy miser: Learning under test-time budgets. In ICML.

Best Paper

Best Paper of AAAI, 2013
