Automatic assignment of protein function with supervised classifiers

Year: 2008

Abstract
High-throughput genome sequencing and sequence analysis technologies have created the need for automated annotation and analysis of large sets of genes. The Gene Ontology (GO) provides a common controlled vocabulary for describing gene function. However, proteins are usually annotated with GO terms through tedious manual curation by trained professional annotators. With the wealth of genomic data now available, there is a need for accurate automated annotation methods. The overall objective of my research is to improve our ability to automatically annotate proteins with GO terms. The first method, Automatic Annotation of Protein Functional Class (AAPFC), learns an independent Support Vector Machine classifier for each GO term. It relies only on protein functional domains as features, and demonstrates that statistical pattern recognition can outperform expert-curated mappings of protein functional domains to protein functions. The second method, Prediction of Gene Ontology (PoGO), is a meta-classification method that integrates multiple heterogeneous data sources and performs better than the protein domain method alone. Apart from these two methods, several systems have been developed that employ pattern recognition to assign gene function using a variety of features, such as sequence similarity, the presence of protein functional domains, and gene expression patterns. Most of these approaches do not consider the hierarchical relationships among GO terms, which are organized as a directed acyclic graph (DAG). Because the DAG represents the functional relationships between GO terms, it should be an important component of an automated annotation system. I describe a Bayesian network used as a multi-layered classifier that incorporates the relationships among GO terms found in the GO DAG.
I also describe an inference algorithm for quickly assigning GO terms to unlabeled proteins. A comparison with previously described annotation systems shows that the method provides improved annotation accuracy when performance on individual GO terms is compared. More importantly, the method enables the assignment of significantly more GO terms to more proteins than was previously possible.
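To make concrete why the GO DAG matters for automated annotation, here is a minimal sketch of hierarchy-consistent labeling. It is not the Bayesian network described in the abstract. It illustrates a simpler, swapped-in idea: if independent per-term classifiers call a term for a protein, every ancestor of that term in the DAG should also be assigned. The term names, the `PARENTS` mapping, and the functions `ancestors` and `propagate` are all hypothetical and chosen only for illustration.

```python
from typing import Dict, Set

# Hypothetical toy hierarchy (not real GO identifiers): term -> its parent terms.
PARENTS: Dict[str, Set[str]] = {
    "root": set(),
    "binding": {"root"},
    "catalysis": {"root"},
    "dna_binding": {"binding"},
    # Two parents: this is what makes the structure a DAG rather than a tree.
    "nuclease": {"catalysis", "dna_binding"},
}


def ancestors(term: str, parents: Dict[str, Set[str]]) -> Set[str]:
    """Return every term reachable from `term` by following parent links."""
    seen: Set[str] = set()
    stack = list(parents[term])
    while stack:
        current = stack.pop()
        if current not in seen:
            seen.add(current)
            stack.extend(parents[current])
    return seen


def propagate(predicted: Set[str], parents: Dict[str, Set[str]]) -> Set[str]:
    """Close a set of predicted terms under the ancestor relation."""
    closed = set(predicted)
    for term in predicted:
        closed |= ancestors(term, parents)
    return closed


# Assumed output of independent per-term classifiers for one protein.
independent_calls = {"nuclease"}
consistent_calls = propagate(independent_calls, PARENTS)
```

In this sketch a single specific call expands to the full set of terms it implies. A classifier that models the DAG directly, as the abstract proposes, would use these relationships during prediction rather than only as a post-processing step.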
Keywords
automatic assignment, annotating protein, Gene Ontology, protein function, protein functional domain, method Predict, meta-classification method, accurate automated annotation method, supervised classifier, gene function, annotate protein, protein domain method