Towards semantic-driven boolean query formalization for biomedical systematic literature reviews.

International journal of medical informatics(2022)

引用 1|浏览24
暂无评分
摘要
OBJECTIVE:Study identification refers to formalizing an effective search over biomedical databases for retrieving all eligible evidence for a systematic review. Manual construction of queries, where a user submit a search query for which a biomedical search system such as PubMed would identify the most relevant documents, has been recognized as a very costly step in conducting systematic reviews. The objective of this paper is to present an automatic query generation approach to reduce the time and labor cost of manual biomedical study identification. MATERIALS AND METHODS:The evaluation benchmark is the widely adopted CLEF 2018 Technology Assisted Reviews (TAR) collection, with 72 systematic reviews on Diagnosis Test Accuracy. We use and fine-tune pre-trained language models for generating high-level key-phrases and their dense embeddings. We constructed and published a dataset consists of almost one million PubMed articles' abstracts and their keywords for fine-tuning pre-trained language models. We also use concepts that are represented in the Unified Medical Language System, UMLS, for query expansion and embedding generation. We exploit and test different clustering methods, namely Agglomerative clustering, Affinity Propagation, and K-Means, over the generated embeddings to form query clauses. RESULTS:Our proposed methods outperform existing state-of-the-art automatic query generation models across Precision (0.0821 compared with 0.005), Recall (0.9676 compared with 0.878), and F-measures (0.2898 compared with 0.0356 in F3 measure). In addition, some of the proposed methods can even outperform the performance of the manually crafted queries in some specific measures. CONCLUSION:The proposed model in this paper can be utilized to form an effective initial search query that can be further refined and updated by human reviewers for achieving the desired performance. For future work, we would like to explore the application of the presented query formalization methods in existing study identification methodologies and techniques, especially those that iteratively train machine learning models based on the domain experts' feedback on the relevancy of the retrieved studies.
更多
查看译文
AI 理解论文
溯源树
样例
生成溯源树,研究论文发展脉络
Chat Paper
正在生成论文摘要