Mining GO Annotations for Improving Annotation Consistency

Faria, Daniel; Schlicker, Andreas; Pesquita, Catia; Bastos, Hugo; Ferreira, António E. N.; Albrecht, Mario; Falcao, André O.

doi:10.1371/journal.pone.0040519

Journal Article

MPS-Authors

Schlicker, Andreas
Computational Biology and Applied Algorithmics, MPI for Informatics, Max Planck Society;

Albrecht, Mario
Computational Biology and Applied Algorithmics, MPI for Informatics, Max Planck Society;

External Resource

http://www.ncbi.nlm.nih.gov/pmc/articles/PMC3405096/
(Any fulltext)

Fulltext (public)

journal.pone.0040519.pdf
(Publisher version), 92KB

Citation

Faria, D., Schlicker, A., Pesquita, C., Bastos, H., Ferreira, A. E. N., Albrecht, M., et al. (2012). Mining GO Annotations for Improving Annotation Consistency. PLoS One, 7(7): e40519, pp. 1-7. doi:10.1371/journal.pone.0040519.

Cite as: https://hdl.handle.net/11858/00-001M-0000-0014-C9B0-F

Abstract

Despite the structure and objectivity provided by the Gene Ontology (GO), the annotation of proteins is a complex task that is subject to errors and inconsistencies. Electronically inferred annotations in particular are widely considered unreliable. However, given that manual curation of all GO annotations is unfeasible, it is imperative to improve the quality of electronically inferred annotations. In this work, we analyze the full GO molecular function annotation of UniProtKB proteins, and discuss some of the issues that affect their quality, focusing particularly on the lack of annotation consistency. Based on our analysis, we estimate that 64% of the UniProtKB proteins are incompletely annotated, and that inconsistent annotations affect 83% of the protein functions and at least 23% of the proteins. Additionally, we present and evaluate a data mining algorithm, based on the association rule learning methodology, for identifying implicit relationships between molecular function terms. The goal of this algorithm is to assist GO curators in updating GO and correcting and preventing inconsistent annotations. Our algorithm predicted 501 relationships with an estimated precision of 94%, whereas the basic association rule learning methodology predicted 12,352 relationships with a precision below 9%.
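
To make the "basic association rule learning methodology" mentioned in the abstract more concrete, the following minimal Python sketch mines rules of the form "proteins annotated with term A are usually also annotated with term B" from a set of protein annotations. It is an illustration of the generic technique only, not the authors' algorithm: the function name `mine_rules`, the thresholds `min_support` and `min_confidence`, and the toy protein and term identifiers are all assumptions introduced here and do not come from the paper.

```python
from collections import Counter
from itertools import permutations


def mine_rules(annotations, min_support=2, min_confidence=0.8):
    """Mine pairwise association rules between annotation terms.

    annotations: dict mapping a protein ID to a set of term IDs.
    Returns a sorted list of (antecedent, consequent, support, confidence),
    where support is the number of proteins annotated with both terms and
    confidence is support divided by the number of proteins annotated
    with the antecedent.
    """
    term_counts = Counter()   # proteins annotated with each term
    pair_counts = Counter()   # proteins annotated with both terms of an ordered pair

    for terms in annotations.values():
        unique = set(terms)
        term_counts.update(unique)
        # Ordered pairs, so that A -> B and B -> A are scored separately.
        pair_counts.update(permutations(sorted(unique), 2))

    rules = []
    for (a, b), support in pair_counts.items():
        confidence = support / term_counts[a]
        if support >= min_support and confidence >= min_confidence:
            rules.append((a, b, support, confidence))
    return sorted(rules)


# Hypothetical toy data: placeholder protein and term names, not real GO terms.
toy_annotations = {
    "P1": {"term_A", "term_B"},
    "P2": {"term_A", "term_B"},
    "P3": {"term_A", "term_B", "term_C"},
    "P4": {"term_B"},
    "P5": {"term_C"},
}

if __name__ == "__main__":
    for a, b, support, confidence in mine_rules(toy_annotations):
        print(f"{a} -> {b} (support={support}, confidence={confidence:.2f})")
```

In this toy data every protein annotated with `term_A` also carries `term_B`, so only the rule `term_A -> term_B` passes both thresholds; the reverse rule falls short on confidence because `P4` has `term_B` alone. Applied naively to real annotations, such a procedure would also pick up many relationships that are trivial or already encoded in the ontology, which is consistent with the low precision the abstract reports for the basic methodology.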