Extending language modeling techniques to models of search and browsing activity in a digital library.

Proceedings of the 31st annual international ACM SIGIR conference on Research and development in information retrieval (2008)

Abstract
Users searching for information in a digital library or on the WWW can be modeled as individuals moving through a semantic space by issuing queries and clicking on hyperlinks. As they go, they emit a stream of interaction data. Most of it is linguistic data. Much of it is captured in logs. Some of it is used to guess what the user is searching for. But to most information retrieval systems, each user interaction is a stateless point in this space. There is a timeline connecting each of these points, but systems seldom make use of this as sequence data, in part because there is no clear way to systematically characterize the meaningful relations within a sequence of user activity. It is a problem of pragmatics as much as it is of semantics: the fact that a user clicked on a particular link, or added a particular term to their query, has meaning primarily in relation to the preceding actions. A remaining challenge in IR is to extract features of the user interaction data that will give meaning to those relations. Meanwhile, from the user's perspective, each of these points in time and semantic space is just part of a path of exploration. To the user, the exact terms in a query, or the specific words surrounding a hypertext link, may be less important than the trajectory those terms establish in relation to the user's path. Identifying the meaningful relations between queries and page views within a sequence of activity increases our understanding of users and their information needs. Formally, we can model query and browsing behaviors as surface forms of a hidden process. What is missing is a layer of abstraction for mapping sequences of interaction in a way that is both descriptive of users' needs and useful to automation. The work I describe is an effort to identify features of data in logs of query and browsing activity that are highly predictive of certain types of behavior.
Sequences of interaction data from individual users are modeled as sequences of expression. Statistical modeling techniques that are effective for modeling sequences in natural language processing and bioinformatics are examined for their ability to model sequences of interaction between an information searcher and an information retrieval system. Queries and click-throughs in this stream of interaction can be tagged with features such as semantic coordinates, timing, frequency of use, and type of action. By analyzing large collections of interaction sequences it is possible to identify frequent patterns of user behavior. From these patterns we can make predictions about future interactions. For example, certain patterns of link following in a digital library are highly predictive of users' next steps while other patterns are not. General models of user interaction are useful for design and evaluation of search interfaces. Individual models of user interaction are useful for personalized search and customized content. Yet very little research has been done to investigate which features are optimal for modeling user queries and browsing as interaction sequences. An important first step is to identify informative features and the relationships between features. I propose to construct models of user behavior based on user data in logs of query and browsing activity and to identify features that are highly predictive of certain types of user behaviors. I examine activity within search sessions on a digital library as a microcosm of larger systems. I expect to find features that are useful in predictive models of user behavior at both the individual and aggregate levels. Where possible, I hope to identify meaningful relationships between those features. The work has implications beyond the scope of digital libraries, to larger systems and broader search domains.
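As a rough illustration of treating a user's interaction stream like a sequence of tokens, the sketch below fits a bigram model over action types and uses it to guess the next action. It is only a minimal example under stated assumptions, not the method used in this work: the action labels (`query`, `click`, `refine`), the toy sessions, and the function names are all hypothetical, and add-alpha smoothing is chosen purely for simplicity.

```python
from collections import Counter, defaultdict

START = "<s>"  # hypothetical session-start marker


def train_bigram(sessions):
    """Count how often each action follows each preceding action."""
    counts = defaultdict(Counter)
    for session in sessions:
        padded = [START] + list(session)
        for prev, nxt in zip(padded, padded[1:]):
            counts[prev][nxt] += 1
    return counts


def next_action_probs(counts, prev, vocab, alpha=1.0):
    """Add-alpha smoothed estimate of P(next action | previous action)."""
    seen = counts.get(prev, Counter())
    total = sum(seen.values()) + alpha * len(vocab)
    return {a: (seen[a] + alpha) / total for a in vocab}


def predict_next(counts, prev, vocab):
    """Return the most probable next action (ties go to the first in sorted order)."""
    probs = next_action_probs(counts, prev, vocab)
    return max(sorted(vocab), key=lambda a: probs[a])


# Hypothetical toy log: each session is a list of action types.
sessions = [
    ["query", "click", "click", "refine"],
    ["query", "click", "refine", "click"],
    ["query", "refine", "click", "click"],
]
vocab = sorted({a for s in sessions for a in s})
model = train_bigram(sessions)

if __name__ == "__main__":
    print(next_action_probs(model, "query", vocab))
    print(predict_next(model, "query", vocab))
```

In a fuller model, each token could carry the other features mentioned above (timing, frequency of use, semantic coordinates) rather than the action type alone, and longer histories or hidden-state models could replace the bigram assumption.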