Empowering Molecule Discovery for Molecule-Caption Translation with Large Language Models: A ChatGPT Perspective
IEEE Transactions on Knowledge and Data Engineering (2023)
Abstract
Molecule discovery plays a crucial role in various scientific fields,
advancing the design of tailored materials and drugs. However, most of the
existing methods heavily rely on domain experts, require excessive
computational cost, or suffer from sub-optimal performance. On the other hand,
Large Language Models (LLMs), like ChatGPT, have shown remarkable performance
in various cross-modal tasks due to their powerful capabilities in natural
language understanding, generalization, and in-context learning (ICL), which
provide unprecedented opportunities to advance molecule discovery. Although
several previous works have tried to apply LLMs to this task, the lack of
domain-specific corpora and the difficulty of training specialized LLMs
remain challenges. In this work, we propose a novel LLM-based framework
(MolReGPT) for molecule-caption translation, which introduces an In-Context
Few-Shot Molecule Learning paradigm that lets LLMs like ChatGPT apply their
in-context learning capability to molecule discovery without
domain-specific pre-training or fine-tuning. MolReGPT leverages the principle
of molecular similarity to retrieve similar molecules and their text
descriptions from a local database to enable LLMs to learn the task knowledge
from context examples. We evaluate the effectiveness of MolReGPT on
molecule-caption translation, including molecule understanding and text-based
molecule generation. Experimental results show that, without any additional
training, MolReGPT outperforms the fine-tuned MolT5-base and is comparable to
the fine-tuned MolT5-large. To the best of our knowledge, MolReGPT is the
first work to leverage LLMs via in-context learning in molecule-caption
translation for advancing molecule discovery. Our work expands the scope of LLM
applications and provides a new paradigm for molecule discovery and
design.
Keywords
Drug Discovery, Large Language Models (LLMs), In-context Learning, Retrieval Augmented Generation
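
The abstract describes retrieving similar molecules and their text descriptions from a local database, then using them as context examples for the LLM. The sketch below illustrates that idea only and is not the authors' implementation. It assumes molecules are given as SMILES strings, and it swaps in a crude Tanimoto similarity over character bigrams as a stand-in for a real molecular fingerprint. The names `ngrams`, `tanimoto`, `retrieve`, `build_prompt`, the prompt wording, and the toy `DATABASE` are all hypothetical.

```python
from typing import List, Set, Tuple


def ngrams(smiles: str, n: int = 2) -> Set[str]:
    """Character n-grams of a SMILES string (assumed stand-in for a fingerprint)."""
    return {smiles[i:i + n] for i in range(len(smiles) - n + 1)}


def tanimoto(a: Set[str], b: Set[str]) -> float:
    """Tanimoto (Jaccard) similarity between two feature sets."""
    if not a and not b:
        return 0.0
    return len(a & b) / len(a | b)


def retrieve(query: str, database: List[Tuple[str, str]], k: int = 2) -> List[Tuple[str, str]]:
    """Return the k (SMILES, caption) pairs most similar to the query molecule."""
    q = ngrams(query)
    ranked = sorted(database, key=lambda pair: tanimoto(q, ngrams(pair[0])), reverse=True)
    return ranked[:k]


def build_prompt(query: str, examples: List[Tuple[str, str]]) -> str:
    """Assemble a few-shot prompt: retrieved examples first, then the query."""
    lines = ["Describe each molecule given its SMILES string.", ""]
    for smiles, caption in examples:
        lines += [f"Molecule: {smiles}", f"Caption: {caption}", ""]
    lines += [f"Molecule: {query}", "Caption:"]
    return "\n".join(lines)


# Toy local database (hypothetical entries, not taken from the paper).
DATABASE = [
    ("CCO", "Ethanol is a primary alcohol."),
    ("CCCO", "Propan-1-ol is a primary alcohol."),
    ("c1ccccc1", "Benzene is an aromatic hydrocarbon."),
    ("CC(=O)O", "Acetic acid is a simple carboxylic acid."),
]

examples = retrieve("CCCCO", DATABASE, k=2)
prompt = build_prompt("CCCCO", examples)
print(prompt)
```

The resulting `prompt` string would then be sent to an LLM such as ChatGPT with no further training. In practice a chemistry toolkit fingerprint would likely replace the bigram features, and the reverse direction (text-based molecule generation) would need a text retriever rather than a molecular one.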