  Exploring general-purpose protein features for distinguishing enzymes and non-enzymes within the twilight zone

Ruiz-Blanco, Y. B., Agüero-Chapin, G., García-Hernández, E., Álvarez, O., Antunes, A., & Green, J. (2017). Exploring general-purpose protein features for distinguishing enzymes and non-enzymes within the twilight zone. BMC Bioinformatics, 18: 349. doi:10.1186/s12859-017-1758-x.


Creators

 Creators:
Ruiz-Blanco, Yasser B. (1, 2), Author
Agüero-Chapin, Guillermin (3, 4, 5), Author
García-Hernández, Enrique (6), Author
Álvarez, Orlando (4), Author
Antunes, Agostinho (3, 5), Author
Green, James (7), Author
Affiliations:
(1) Facultad de Química y Farmacia, Universidad Central “Marta Abreu” de Las Villas, 54830 Santa Clara, Cuba
(2) Research Group Sánchez-García, Max-Planck-Institut für Kohlenforschung, Max Planck Society
(3) CIMAR/CIIMAR, Centro Interdisciplinar de Investigação Marinha e Ambiental, Universidade do Porto, Terminal de Cruzeiros do Porto de Leixões, Av. General Norton de Matos, s/n, 4450-208 Porto, Portugal
(4) Centro de Bioactivos Químicos (CBQ), Universidad Central “Marta Abreu” de Las Villas (UCLV), 54830 Santa Clara, Cuba
(5) Departamento de Biologia, Faculdade de Ciências, Universidade do Porto, Rua do Campo Alegre, 4169-007 Porto, Portugal
(6) Instituto de Química, Universidad Nacional Autónoma de México (UNAM), 360 D.F, México, Mexico
(7) Department of Systems and Computer Engineering, Carleton University, Ottawa, Canada

Content

Free keywords: Enzyme, Alignment-free protein analysis, Protein descriptors, Support vector machines, ProtDCal, TI2BioP
 Abstract:
Background: Computational prediction of protein function is one of the more complex problems in bioinformatics because of the diversity of functions and mechanisms that proteins exert in nature. The problem is especially acute for proteins that share very low primary or tertiary structure similarity with existing annotated proteomes. New alignment-free (AF) tools are therefore needed to overcome the inherent limitations of classic alignment-based approaches. We have recently introduced AF protein-numerical-encoding programs (TI2BioP and ProtDCal), whose sequence-based features have been successfully applied to detect remote protein homologs, post-translational modifications and antibacterial peptides. Here we aim to demonstrate the applicability of four AF protein descriptor families, implemented in our programs, to the identification of enzyme-like proteins. At the same time, our novel family of 3D-structure-based descriptors is introduced for the first time. The Dobson & Doig (D&D) benchmark dataset is used to evaluate our AF protein descriptors because its proven structural diversity permits one to emulate an experiment within the twilight zone of alignment-based methods (pair-wise identity <30%). The performance of our sequence-based predictor was further assessed using a subset of formerly uncharacterized proteins that currently represents a benchmark annotation dataset.
Results: Four protein descriptor families, sequence-composition-based (0D), linear-topology-based (1D), pseudo-fold-topology-based (2D) and 3D-structure features (3D), were assessed using the D&D benchmark dataset. We show that only the families of ProtDCal's descriptors (0D, 1D and 3D) encode significant information for discriminating enzymes from non-enzymes. The resulting 3D-structure-based classifier ranked first among several other SVM-based methods assessed on this dataset. Furthermore, the model leveraging 1D descriptors showed a higher success rate than EzyPred on a benchmark annotation dataset from the Shewanella oneidensis proteome.
Conclusions: The applicability of ProtDCal as a general-purpose AF protein modelling method is illustrated through the discrimination between two comprehensive protein functional classes. The performances observed on the highly diverse D&D dataset and on the set of formerly uncharacterized (hard-to-annotate) proteins of Shewanella oneidensis place our methodology in the top range of alignment-free methods for modelling and predicting protein function.

Details

Language(s): eng - English
 Dates: 2017-02-07, 2017-07-13, 2017-07-21
 Publication Status: Published online
 Pages: 14
 Publishing info: -
 Table of Contents: -
 Rev. Type: Peer
 Identifiers: DOI: 10.1186/s12859-017-1758-x
 Degree: -

Source 1

Title: BMC Bioinformatics
Source Genre: Journal
Publ. Info: London, UK : BioMed Central
Pages: -
Volume / Issue: 18
Sequence Number: 349
Start / End Page: -
Identifier: ISSN: 1471-2105
CoNE: https://pure.mpg.de/cone/journals/resource/111000136905000