BMRetriever: Tuning Large Language Models as Better Biomedical Text Retrievers
arxiv(2024)
摘要
Developing effective biomedical retrieval models is important for excelling
at knowledge-intensive biomedical tasks but still challenging due to the
deficiency of sufficient publicly annotated biomedical data and computational
resources. We present BMRetriever, a series of dense retrievers for enhancing
biomedical retrieval via unsupervised pre-training on large biomedical corpora,
followed by instruction fine-tuning on a combination of labeled datasets and
synthetic pairs. Experiments on 5 biomedical tasks across 11 datasets verify
BMRetriever's efficacy on various biomedical applications. BMRetriever also
exhibits strong parameter efficiency, with the 410M variant outperforming
baselines up to 11.7 times larger, and the 2B variant matching the performance
of models with over 5B parameters. The training data and model checkpoints are
released at to ensure transparency,
reproducibility, and application to new domains.
更多查看译文
AI 理解论文
溯源树
样例
生成溯源树,研究论文发展脉络
Chat Paper
正在生成论文摘要