Evaluating and Enhancing Large Language Models Performance in Domain-specific Medicine: Osteoarthritis Management with DocOA
CoRR(2024)
摘要
The efficacy of large language models (LLMs) in domain-specific medicine,
particularly for managing complex diseases such as osteoarthritis (OA), remains
largely unexplored. This study focused on evaluating and enhancing the clinical
capabilities of LLMs in specific domains, using osteoarthritis (OA) management
as a case study. A domain specific benchmark framework was developed, which
evaluate LLMs across a spectrum from domain-specific knowledge to clinical
applications in real-world clinical scenarios. DocOA, a specialized LLM
tailored for OA management that integrates retrieval-augmented generation (RAG)
and instruction prompts, was developed. The study compared the performance of
GPT-3.5, GPT-4, and a specialized assistant, DocOA, using objective and human
evaluations. Results showed that general LLMs like GPT-3.5 and GPT-4 were less
effective in the specialized domain of OA management, particularly in providing
personalized treatment recommendations. However, DocOA showed significant
improvements. This study introduces a novel benchmark framework which assesses
the domain-specific abilities of LLMs in multiple aspects, highlights the
limitations of generalized LLMs in clinical contexts, and demonstrates the
potential of tailored approaches for developing domain-specific medical LLMs.
更多查看译文
AI 理解论文
溯源树
样例
生成溯源树,研究论文发展脉络
Chat Paper
正在生成论文摘要