Improving protein-protein interaction article classification using biological domain knowledge

International Journal of Data Mining and Bioinformatics (2015)

Abstract
Interaction Article Classification (IAC) is a text classification application in the biological domain that identifies which articles describe protein-protein interactions (PPIs), so that PPIs can be extracted from the biological literature more efficiently. However, the text representation and feature weighting schemes commonly used for text classification are not well suited to IAC. To address this problem, we capture and utilise biological domain knowledge, namely the gene mentions (protein or gene names) in the articles. We put forward a new gene mention order-based approach that highlights the important role of gene mentions in representing the texts. Furthermore, we incorporate information about gene mentions into a novel feature weighting scheme called Gene Mention-based Term Frequency (GMTF). Experiments show that, using the proposed representation and weighting schemes, our Interaction Article Classifier (IACer) outperforms other leading systems available at the time of writing.
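The abstract does not give the GMTF formula or the details of the order-based representation. As a rough illustration of the general idea only, the Python sketch below rests on two stated assumptions: gene mentions are replaced by placeholders numbered in order of first appearance, and terms in sentences that mention at least two genes receive a boosted count. This is not the authors' method; the names `mark_gene_mentions`, `gene_mention_tf`, and the `boost` parameter are hypothetical.

```python
from collections import Counter

# Hypothetical illustration only: the abstract does not define GMTF or the
# order-based representation. The rules below are assumptions for the sketch.


def mark_gene_mentions(tokens, gene_names):
    """Replace each gene mention with an order-based placeholder.

    Assumption: the n-th distinct gene to appear becomes "GENE<n>", and every
    later mention of the same gene reuses that placeholder.
    """
    order = {}
    marked = []
    for tok in tokens:
        if tok in gene_names:
            if tok not in order:
                order[tok] = len(order) + 1
            marked.append(f"GENE{order[tok]}")
        else:
            marked.append(tok)
    return marked


def gene_mention_tf(sentences, gene_names, boost=2.0):
    """Gene-mention-aware term frequency (hypothetical weighting).

    `sentences` is a list of token lists. Assumption: a sentence mentioning
    two or more genes is more likely to describe an interaction, so each
    non-gene term in it counts `boost` times; other terms count once.
    Gene tokens themselves are not counted.
    """
    weights = Counter()
    for sent in sentences:
        n_genes = sum(1 for tok in sent if tok in gene_names)
        factor = boost if n_genes >= 2 else 1.0
        for tok in sent:
            if tok not in gene_names:
                weights[tok] += factor
    return dict(weights)


if __name__ == "__main__":
    genes = {"ProtA", "ProtB"}  # toy, hypothetical gene names
    print(mark_gene_mentions(["ProtA", "binds", "ProtB", "and", "ProtA"], genes))
    print(gene_mention_tf([["ProtA", "binds", "ProtB"], ["cells", "were", "grown"]], genes))
```

For example, `mark_gene_mentions(["ProtA", "binds", "ProtB", "and", "ProtA"], {"ProtA", "ProtB"})` returns `["GENE1", "binds", "GENE2", "and", "GENE1"]`. One possible motivation for such placeholders is that a classifier then sees interaction patterns ("GENE1 binds GENE2") rather than particular protein names.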
Keywords
text classification, protein-protein interaction, feature weighting, biological domain knowledge, data mining