Exploiting Search Logs to Aid in Training and Automating Infrastructure for Question Answering in Professional Domains

ICAIL '19: Proceedings of the Seventeenth International Conference on Artificial Intelligence and Law (2019)

Abstract
Developing an AI question answering (QA) system for the legal and regulatory domain requires substantial ground-truth annotation of what constitutes a good answer to a given question. Collecting these annotations from qualified legal and regulatory professionals is time-consuming and expensive. By using activity data from the query logs of existing legal and regulatory search engines, it is possible to speed up annotation collection and to supplement annotations with imputed labels. We used signals from user activity logs indicating that a user affirmatively engaged with an answer after entering a query, and leveraged these signals to infer suitable answers to questions without relying on annotators. In prior research, this kind of identification is known as Implicit Relevance Feedback (IRF). Our investigations found that 90% of our IRF candidates contain either a complete or a partial answer. Given this high baseline, the next phase of the project harvested data derived from IRF (which we term "silver data," in contrast to expert-annotated "gold data") and extended the process to significantly larger data sets. We examine how the approach affects performance across settings ranging from zero or very low amounts of gold data to substantially higher amounts. Such efforts can produce appreciably larger amounts of reliable training data for next-generation QA systems and establish the means to automate the infrastructure that supports such systems. We investigate the impact of including silver data alongside gold data on QA system performance. Specifically: how does silver data address the cold-start challenge (when no gold data exists initially), how much gold data is needed to match the performance of a model trained on a given amount of silver data, and what performance gains can be realized by adding silver data to graduated amounts of gold data?
We show that leveraging silver data can establish a preliminary QA system in the absence of gold data, and boost the system's performance once the gold data workstream is in place. We further show the relative efficacy of silver data to gold data by conducting performance comparisons for models trained on varying ratios of each type of data.
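The silver-data derivation described above can be illustrated with a minimal sketch: filter query-log events for affirmative engagement signals and impute (question, answer) training pairs from them. The log schema, field names, and engagement thresholds below are illustrative assumptions, not the authors' actual pipeline.

```python
from dataclasses import dataclass

@dataclass
class LogEvent:
    query: str            # the user's natural-language question
    answer_id: str        # identifier of the answer document shown
    clicked: bool         # whether the user opened the answer
    dwell_seconds: float  # time spent reading it
    copied_text: bool     # e.g. copied a passage out of the answer

# Assumed threshold for treating a dwell as "affirmative engagement".
MIN_DWELL = 30.0

def affirmative(ev: LogEvent) -> bool:
    """An event counts as affirmative engagement (an IRF signal) if the
    user clicked the answer and either dwelled long enough or copied text."""
    return ev.clicked and (ev.dwell_seconds >= MIN_DWELL or ev.copied_text)

def silver_pairs(events):
    """Impute deduplicated (question, answer_id) silver training pairs
    from IRF signals in the event log."""
    seen = set()
    pairs = []
    for ev in events:
        key = (ev.query, ev.answer_id)
        if affirmative(ev) and key not in seen:
            seen.add(key)
            pairs.append(key)
    return pairs
```

In practice such pairs would still be noisy relative to expert annotation, which is why the paper frames them as "silver" labels to be mixed with, rather than substituted for, gold data.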
Keywords
Data Analysis, Data Mining, Evaluation, Legal Applications, Query Log Analysis