A Zero-shot and Few-shot Study of Instruction-Finetuned Large Language Models Applied to Clinical and Biomedical Tasks
arXiv (2023)
Abstract
We evaluate four state-of-the-art instruction-tuned large language models
(LLMs) – ChatGPT, Flan-T5 UL2, Tk-Instruct, and Alpaca – on a set of 13
real-world clinical and biomedical natural language processing (NLP) tasks in
English, including named-entity recognition (NER), question answering (QA),
and relation extraction (RE). Our overall results demonstrate that the
evaluated LLMs begin to approach the performance of state-of-the-art models in
zero- and few-shot scenarios for most tasks, and perform particularly well on
the QA task, even though they have never seen examples from these tasks before.
However, we observed that performance on the classification and RE tasks falls
below what can be achieved with a model trained specifically for the medical
domain, such as PubMedBERT. Finally, we noted that no LLM outperforms all the
others on all the studied tasks; some models are better suited to certain tasks
than others.