Semi-supervised learning of semantic classes for query understanding: from the web and for the web

CIKM, pp. 37-46, 2009.

DOI: https://doi.org/10.1145/1645953.1645961

Abstract:

Understanding intents from search queries can improve a user's search experience and boost a site's advertising profits. Query tagging via statistical sequential labeling models has been shown to perform well, but annotating the training set for supervised learning requires substantial human effort. Domain-specific knowledge, such as semantic class lexicons, …

Introduction
  • One can greatly improve a user’s search experience and boost a site’s advertising profits by better understanding the user’s intent.
  • By understanding what users are looking for, one can provide the exact information they need
  • Much of this information may be found in the contents of relational databases, which are indexed by most search engines.
  • An improved understanding of query intent enables better decisions in the selection of contextual ads.
  • Relevant local ads may be selected when query tagging identifies a location in the query "hotels in white swan washington", as illustrated in the sketch below.
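As a concrete illustration of the tagging task, the sketch below shows the kind of token-level output a query tagger might produce for the example query above. It is a minimal sketch: the label names (City, State, Other), the BIO scheme, and the segmentation are assumptions made for this local-ads example, not the paper's label set for product queries (that set is listed in Table 1).

```python
# Illustrative only: the labels and segmentation below are hypothetical,
# not the paper's label set for product queries.
query = "hotels in white swan washington"
tokens = query.split()

# A query tagger assigns one semantic-class label per token; BIO prefixes mark
# where a multi-word mention begins. One plausible reading: "white swan" is the
# city and "washington" is the state.
tags = ["Other", "Other", "B-City", "I-City", "B-State"]

for token, tag in zip(tokens, tags):
    print(f"{token}\t{tag}")
```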
Highlights
  • One can greatly improve a user’s search experience and boost a site’s advertising profits by better understanding the user’s intent
  • By increasing coverage while maintaining a low level of confusability, similar precision and recall levels can be observed on both the training and test data, which effectively avoids model over-fitting. Although we study these effects in the context of query tagging, we believe that our discoveries generalize beyond our task and provide a guideline for future research on semi-supervised knowledge acquisition for information extraction and named entity recognition (a simple illustration of coverage and confusability follows this list).
  • One algorithm resulted in significant improvements in query tagging accuracy, and substantially reduced the human effort needed to manually label training data
  • By comparing the behavior of two algorithms, we found that the precision-centric learning algorithms are not suitable for use in sequential labeling tasks, due to the problem of over-fitting
  • We note that neither of the two algorithms discussed was originally designed for lexicon acquisition for query tagging.
  • Experimental results on retail product queries show that enhancing a query tagger with lexicons learned with this objective reduces word level tagging errors by up to 25% compared to the baseline tagger that does not use any lexicon features
  • While the present work compares existing algorithms adapted to our task of query tagging, we are planning to develop novel algorithms based on our insights in the future
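The coverage/confusability trade-off mentioned above can be made concrete with two simple measures. The sketch below is only an assumed approximation of those notions, since the summary does not give the paper's exact definitions: coverage as the fraction of gold phrases of a class found in its learned lexicon, and confusability as the fraction of learned phrases that appear in more than one class lexicon.

```python
# Illustrative sketch: "coverage" and "confusability" are measured here in a
# simple way that only approximates the paper's notions. Data is made up.
from collections import defaultdict

def coverage(lexicon, labeled_phrases):
    """Fraction of gold phrases of a class that appear in the learned lexicon."""
    if not labeled_phrases:
        return 0.0
    return sum(p in lexicon for p in labeled_phrases) / len(labeled_phrases)

def confusability(lexicons):
    """Fraction of learned phrases that appear in more than one class lexicon."""
    counts = defaultdict(int)
    for phrases in lexicons.values():
        for p in phrases:
            counts[p] += 1
    if not counts:
        return 0.0
    return sum(c > 1 for c in counts.values()) / len(counts)

# Hypothetical lexicons for two competing classes.
lexicons = {
    "Brand": {"sony", "apple", "dell"},
    "Type":  {"laptop", "tv", "apple"},   # "apple" is ambiguous between classes
}
print(coverage(lexicons["Brand"], {"sony", "dell", "hp"}))  # 2/3
print(confusability(lexicons))                              # 1/5
```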
Methods
  • This section compares the effectiveness of the two semi-supervised lexicon learning algorithms for the purpose of query tagging.
  • The authors have conducted experiments with a data set of product search queries logged by a commercial search engine, which was manually labeled by annotators.
  • The authors compared tagging accuracy on the test set under three conditions: using CRFs without lexicon features, using CRFs with lexicon features obtained by Algorithm I, and using CRFs with lexicon features obtained by Algorithm II (see the feature sketch after this list).
  • For some product categories, including Computing and Electronics (C&E) and Clothing and Shoes (C&S), structured databases are available from which one can directly extract semantic class lexicons.
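The Methods bullets contrast CRF taggers with and without lexicon features. The sketch below shows one plausible way lexicon membership can be exposed to a linear-chain CRF as token-level features. sklearn_crfsuite is used here only as a stand-in toolkit, and the feature template, class names, and toy training data are assumptions: the summary does not describe the paper's actual CRF implementation or feature set, and the paper matches multi-word lexicon phrases, which this token-level sketch simplifies away.

```python
# A minimal sketch, assuming sklearn_crfsuite as a stand-in CRF toolkit and a
# made-up feature template; not the paper's actual CRF features.
import sklearn_crfsuite

lexicons = {                       # hypothetical learned lexicons
    "Brand": {"sony", "dell"},
    "Type": {"laptop", "tv"},
}

def token_features(tokens, i):
    w = tokens[i].lower()
    feats = {"word": w, "bias": 1.0}
    # Lexicon features: one binary feature per semantic class the token matches.
    for cls, phrases in lexicons.items():
        if w in phrases:
            feats[f"in_lexicon:{cls}"] = 1.0
    if i > 0:
        feats["prev_word"] = tokens[i - 1].lower()
    return feats

def featurize(query):
    tokens = query.split()
    return [token_features(tokens, i) for i in range(len(tokens))]

# Tiny hypothetical training set (the real training data is annotated by hand).
X = [featurize("sony laptop"), featurize("dell tv")]
y = [["B-Brand", "B-Type"], ["B-Brand", "B-Type"]]

crf = sklearn_crfsuite.CRF(algorithm="lbfgs", c1=0.1, c2=0.1, max_iterations=50)
crf.fit(X, y)
print(crf.predict([featurize("sony tv")]))
```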
Results
  • The experimental results were obtained after excluding from the learned lexicons those seed phrases that do not appear in the bipartite graph.

    In the first experiment, the authors examine the contributions of lexicons at different strata to the tagging accuracy.
  • Figure 2 shows the test set word level query tagging accuracy as different numbers of lexicon strata are included in the model for each semantic class.
  • As more strata of lexicons are used, the CRF with stratified lexicons learned by Algorithm II achieves improved word level tagging accuracy.
  • While the number of lexicon strata has little impact on tagging accuracy after the seventh stratum, it does make a practical difference with respect to running time and memory
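The results above refer to stratified lexicons, i.e., lexicon entries grouped into ranked strata that are exposed to the tagger as separate features. How the strata are formed is not specified in this summary, so the sketch below simply splits a confidence-ranked lexicon into equal-sized strata and emits a stratum-indexed feature name; it illustrates the idea rather than the paper's procedure.

```python
# Illustrative sketch: splitting a confidence-ranked lexicon into k strata and
# naming a CRF feature per (class, stratum). The ranking below is made up.
def stratify(ranked_phrases, k=10):
    """Split phrases, ordered from most to least confident, into k strata."""
    n = len(ranked_phrases)
    strata = [[] for _ in range(k)]
    for rank, phrase in enumerate(ranked_phrases):
        strata[min(rank * k // n, k - 1)].append(phrase)
    return strata

brand_ranked = ["sony", "dell", "apple", "acer", "asus", "hp"]  # hypothetical ranking
brand_strata = stratify(brand_ranked, k=3)

def stratum_features(token):
    feats = {}
    for s, phrases in enumerate(brand_strata):
        if token in phrases:
            feats[f"in_lexicon:Brand:stratum{s}"] = 1.0
    return feats

print(brand_strata)                 # [['sony', 'dell'], ['apple', 'acer'], ['asus', 'hp']]
print(stratum_features("apple"))    # {'in_lexicon:Brand:stratum1': 1.0}
```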
Conclusion
  • The authors applied two different semi-supervised graph learning algorithms to acquire semantic class lexicons from Web lists, and used the lexicons as features in CRFs for query tagging.
  • It is better to over-generalize the learned lexicons so that recall is similar on the training and test sets, while maintaining a low level of confusion among the semantic classes of interest.
  • This can be achieved by simultaneously learning lexicons of multiple competing classes via distribution propagation (a simplified propagation sketch follows this list).
  • The authors suggest quantifying “confusability” and including it in an objective function, so that new learning algorithms can be designed to optimize that objective directly.
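The conclusion attributes the improvement to learning the lexicons of all competing classes jointly via distribution propagation over a bipartite graph of phrases and Web lists. The sketch below is a deliberately simplified toy variant of that idea (alternately averaging class distributions between phrase nodes and list nodes, with seed distributions clamped after each pass); it is not the paper's Algorithm II, and the graph, seeds, and iteration count are illustrative assumptions.

```python
# Simplified distribution propagation on a bipartite phrase-list graph.
# Not the paper's algorithm: a toy variant to show how competing class
# distributions can be learned jointly. Graph and seeds are made up.
from collections import defaultdict

# Bipartite graph: each (hypothetical) Web list links to the phrases it contains.
lists = {
    "list1": ["sony", "dell", "acer"],
    "list2": ["laptop", "tv", "camera"],
    "list3": ["sony", "laptop", "acer"],   # a noisier, mixed list
}
seeds = {"sony": {"Brand": 1.0}, "tv": {"Type": 1.0}}   # seed distributions

def normalize(dist):
    z = sum(dist.values())
    return {c: v / z for c, v in dist.items()} if z else {}

# Initialize phrase distributions from seeds; all other phrases start empty.
phrase_dist = {p: dict(d) for p, d in seeds.items()}

for _ in range(20):                       # fixed number of iterations
    # Step 1: each list node averages the distributions of its member phrases.
    list_dist = {}
    for lst, phrases in lists.items():
        agg = defaultdict(float)
        for p in phrases:
            for c, v in phrase_dist.get(p, {}).items():
                agg[c] += v
        list_dist[lst] = normalize(agg)
    # Step 2: each phrase node averages the distributions of the lists containing it.
    new_phrase_dist = defaultdict(dict)
    for lst, phrases in lists.items():
        for p in phrases:
            for c, v in list_dist[lst].items():
                new_phrase_dist[p][c] = new_phrase_dist[p].get(c, 0.0) + v
    phrase_dist = {p: normalize(d) for p, d in new_phrase_dist.items()}
    phrase_dist.update({p: dict(d) for p, d in seeds.items()})   # clamp seeds

for phrase, dist in sorted(phrase_dist.items()):
    print(phrase, {c: round(v, 2) for c, v in dist.items()})
```

In this toy run, phrases that co-occur with seeds in clean lists converge to a single class, while the phrase that appears in the mixed list ("laptop") keeps a roughly even split between Brand and Type, which is the kind of confusability signal discussed above.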
Tables
  • Table1: Semantic classes (CRF labels) in product query tagging task
  • Table2: Examples of seed distributions
  • Table3: The size of the training and test data
  • Table4: Number of lexical phrases for different semantic classes for the C&E and C&S categories
  • Table5: The performance of each semantic class (columns: Semantic class, Precision, Recall, F1)
  • Table6: Test set semantic class instance coverage
  • Table7: Stratum comparison of the ambiguous lexicons that cover a test set instance phrase. The lexicons of the correct semantic classes have higher ranks than those of competing semantic classes for 70% or more of test set instance occurrences
  • Table8: Comparison of the average absolute values of the weights between the lexicon features and other features
Related work
  • Semantic class and relation acquisition is a well-studied topic. Much research leverages linguistic patterns to extract semantic classes and relations from free text [6, 2, 11, 14, 7]. In [4], an algorithm is introduced to learn semantic classes and named entity extraction patterns simultaneously. Recently there has been increasing interest in leveraging structured data from the Web to learn semantic classes and relations. In [20], a Web page wrapper induction algorithm is presented that learns language-independent patterns for semantic class lexicon expansion. In [5], both linguistic patterns and wrappers for lists are used to extract semantic class members. In [3], a Web-scale relational database is built by filtering HTML tables with statistical classifiers. The work described in [19] is closely related to ours. Like us, the authors use graph learning to acquire open-domain semantic classes by leveraging structured Web data, in their case, the HTML tables reported in [3]. Another closely related work is described in [18], where a context pattern induction algorithm is used to obtain lexicons, which in turn are used by a named entity recognition model.
References
  • [1] Textgraphs: Graph-based algorithms for natural language processing. http://www.textgraphs.org.
  • [2] E. Agichtein and L. Gravano. Snowball: Extracting relations from large plain-text collections. In Proceedings of the 5th ACM Conference on Digital Libraries, San Antonio, Texas, USA, 2000.
  • [3] M. J. Cafarella, A. Halevy, Z. D. Wang, E. Wu, and Y. Zhang. WebTables: Exploring the power of tables on the Web. In Proceedings of VLDB, Auckland, New Zealand, 2008.
  • [4] E. Riloff and R. Jones. Learning dictionaries for information extraction by multi-level bootstrapping. In Proceedings of the 16th National Conference on Artificial Intelligence, 1999.
  • [5] O. Etzioni, M. Cafarella, D. Downey, A.-M. Popescu, T. Shaked, S. Soderland, D. S. Weld, and A. Yates. Methods for domain-independent information extraction from the web: An experimental comparison. In Proceedings of AAAI, 2004.
  • [6] M. A. Hearst. Automatic acquisition of hyponyms from large text corpora. In Proceedings of the 14th Conference on Computational Linguistics, 1992.
  • [7] M. Komachi and H. Suzuki. Minimally supervised learning of semantic knowledge from query logs. In Proceedings of IJCNLP, Hyderabad, India, 2008.
  • [8] J. Lafferty, A. McCallum, and F. Pereira. Conditional random fields: Probabilistic models for segmenting and labeling sequence data. In Proceedings of ICML, pages 282–289, 2001.
  • [9] X. Li, Y.-Y. Wang, and A. Acero. Learning query intent from regularized click graphs. In Proceedings of the 31st SIGIR Conference, 2008.
  • [10] X. Li, Y.-Y. Wang, and A. Acero. Extracting structured information from user queries with semi-supervised conditional random fields. In Proceedings of the 32nd SIGIR Conference, 2009.
  • [11] D. Lin and P. Pantel. Concept discovery from text. In Proceedings of the 19th International Conference on Computational Linguistics (COLING-02), 2002.
  • [12] A. McCallum and W. Li. Early results for named entity recognition with conditional random fields, feature induction and Web-enhanced lexicons. In Proceedings of the 7th Conference on Natural Language Learning (CoNLL), Edmonton, Canada, 2003.
  • [13] L. Page, S. Brin, R. Motwani, and T. Winograd. The PageRank citation ranking: Bringing order to the Web. Technical report, Stanford InfoLab, 1999.
  • [14] P. Pantel and M. Pennacchiotti. Espresso: Leveraging generic patterns for automatically harvesting semantic relations. In Proceedings of the 21st International Conference on Computational Linguistics and 44th Annual Meeting of the ACL, Sydney, Australia, 2006.
  • [15] F. Peng and A. McCallum. Accurate information extraction from research papers using conditional random fields. In Proceedings of the Human Language Technology Conference of the North American Chapter of the Association for Computational Linguistics (HLT-NAACL), 2004.
  • [16] S. Sarawagi and W. W. Cohen. Semi-Markov conditional random fields for information extraction. In Advances in Neural Information Processing Systems, Vancouver, Canada, 2005.
  • [17] F. Sha and F. Pereira. Shallow parsing with conditional random fields. In Proceedings of the Human Language Technology Conference of the North American Chapter of the Association for Computational Linguistics (HLT-NAACL), 2003.
  • [18] P. P. Talukdar, T. Brants, M. Liberman, and F. Pereira. A context pattern induction method for named entity extraction. In Proceedings of the 10th Conference on Computational Natural Language Learning (CoNLL-X), New York City, 2006.
  • [19] P. P. Talukdar, J. Reisinger, M. Pasca, D. Ravichandran, R. Bhagat, and F. Pereira. Weakly-supervised acquisition of labeled class instances using graph random walks. In Proceedings of the Conference on Empirical Methods in Natural Language Processing (EMNLP), 2008.
  • [20] R. C. Wang, N. Schlaefer, W. Cohen, and E. Nyberg. Automatic set expansion for list question answering. In Proceedings of the Conference on Empirical Methods in Natural Language Processing (EMNLP), 2008.
  • [21] Y.-Y. Wang, A. Acero, C. Chelba, B. Frey, and L. Wong. Combination of statistical and rule-based approaches for spoken language understanding. In Proceedings of the International Conference on Spoken Language Processing (ICSLP), Denver, Colorado, 2002.
  • [22] D. Zhou, O. Bousquet, T. N. Lal, J. Weston, and B. Schölkopf. Learning with local and global consistency. In Advances in Neural Information Processing Systems, volume 16, pages 321–328, 2004.
  • [23] D. Zhou, B. Schölkopf, and T. Hofmann. Semi-supervised learning on directed graphs. In Advances in Neural Information Processing Systems, 2005.
  • [24] X. Zhu. Semi-Supervised Learning with Graphs. PhD thesis, Carnegie Mellon University, 2005.