Predicting Anti-microbial Resistance using Large Language Models
CoRR(2023)
摘要
During times of increasing antibiotic resistance and the spread of infectious
diseases like COVID-19, it is important to classify genes related to antibiotic
resistance. As natural language processing has advanced with transformer-based
language models, many language models that learn characteristics of nucleotide
sequences have also emerged. These models show good performance in classifying
various features of nucleotide sequences. When classifying nucleotide
sequences, not only the sequence itself, but also various background knowledge
is utilized. In this study, we use not only a nucleotide sequence-based
language model but also a text language model based on PubMed articles to
reflect more biological background knowledge in the model. We propose a method
to fine-tune the nucleotide sequence language model and the text language model
based on various databases of antibiotic resistance genes. We also propose an
LLM-based augmentation technique to supplement the data and an ensemble method
to effectively combine the two models. We also propose a benchmark for
evaluating the model. Our method achieved better performance than the
nucleotide sequence language model in the drug resistance class prediction.
更多查看译文
AI 理解论文
溯源树
样例
生成溯源树,研究论文发展脉络
Chat Paper
正在生成论文摘要