  A novel representation of protein sequences for prediction of subcellular location using support vector machines

Matsuda, S., Vert, J.-P., Saigo, H., Ueda, N., Toh, H., & Akutsu, T. (2005). A novel representation of protein sequences for prediction of subcellular location using support vector machines. Protein Science, 14(11), 2804-2813. doi:10.1110/ps.051597405.

Creators

 Creators:
Matsuda, S, Author
Vert, J-P, Author
Saigo, H (1), Author
Ueda, N, Author
Toh, H, Author
Akutsu, T, Author
Affiliations:
(1) External Organizations, ou_persistent22

Content

 Abstract: As the number of complete genomes rapidly increases, accurate methods to automatically predict the subcellular location of proteins are increasingly useful to help their functional annotation. In order to improve the predictive accuracy of the many prediction methods developed to date, a novel representation of protein sequences is proposed. This representation involves local compositions of amino acids and twin amino acids, and local frequencies of distance between successive (basic, hydrophobic, and other) amino acids. For calculating the local features, each sequence is split into three parts: N-terminal, middle, and C-terminal. The N-terminal part is further divided into four regions to consider ambiguity in the length and position of signal sequences. We tested this representation with support vector machines on two data sets extracted from the SWISS-PROT database. Through fivefold cross-validation tests, overall accuracies of more than 87% and 91% were obtained for eukaryotic and prokaryotic proteins, respectively. It is concluded that considering the respective features in the N-terminal, middle, and C-terminal parts is helpful to predict the subcellular location.

Keywords: subcellular location; signal sequence; amino acid composition; distance frequency; support vector machine; predictive accuracy
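
The representation described in the abstract can be sketched roughly as follows. This is an illustrative reading only, not the authors' implementation: the part lengths (`n_len`, `c_len`), the maximum distance (`max_dist`), the membership of the hydrophobic group, the equal-width split of the N-terminal part, and the interpretation of "twin amino acids" as two identical adjacent residues are all assumptions, since the abstract does not specify them. All function names are hypothetical.

```python
from collections import Counter

AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"
BASIC = set("KRH")
HYDROPHOBIC = set("AVLIMFWC")  # assumed grouping; not given in the abstract
OTHER = set(AMINO_ACIDS) - BASIC - HYDROPHOBIC


def split_sequence(seq, n_len=40, c_len=20):
    """Split into N-terminal, middle, and C-terminal parts (lengths are assumed)."""
    n_len = min(n_len, len(seq))
    c_len = min(c_len, len(seq) - n_len)
    cut = len(seq) - c_len
    return seq[:n_len], seq[n_len:cut], seq[cut:]


def split_regions(part, k=4):
    """Divide a part into k contiguous regions of near-equal length."""
    bounds = [i * len(part) // k for i in range(k + 1)]
    return [part[bounds[i]:bounds[i + 1]] for i in range(k)]


def composition(region):
    """Fraction of each of the 20 amino acids in the region."""
    counts = Counter(region)
    total = len(region)
    return [counts[a] / total if total else 0.0 for a in AMINO_ACIDS]


def twin_composition(region):
    """Fraction of adjacent pairs made of the same residue twice (e.g. 'LL')."""
    pairs = max(len(region) - 1, 0)
    counts = Counter(a for a, b in zip(region, region[1:]) if a == b)
    return [counts[a] / pairs if pairs else 0.0 for a in AMINO_ACIDS]


def distance_frequency(region, group, max_dist=5):
    """Histogram of distances between successive residues of one group.

    Distances larger than max_dist are pooled into the last bin.
    """
    positions = [i for i, a in enumerate(region) if a in group]
    gaps = [q - p for p, q in zip(positions, positions[1:])]
    counts = Counter(min(g, max_dist) for g in gaps)
    total = len(gaps)
    return [counts[d] / total if total else 0.0 for d in range(1, max_dist + 1)]


def featurize(seq):
    """Concatenate local features over four N-terminal regions, middle, and C-terminal."""
    n_term, middle, c_term = split_sequence(seq)
    regions = split_regions(n_term) + [middle, c_term]
    features = []
    for region in regions:
        features += composition(region)
        features += twin_composition(region)
        for group in (BASIC, HYDROPHOBIC, OTHER):
            features += distance_frequency(region, group)
    return features
```

Under these assumptions each of the six regions contributes 20 + 20 + 3 × 5 = 55 values, so `featurize` returns a fixed-length vector of 330 numbers regardless of sequence length, which is the form a support vector machine expects as input.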

Details

Language(s):
 Dates: 2005-11
 Publication Status: Issued
 Pages: -
 Publishing info: -
 Table of Contents: -
 Rev. Type: -
 Identifiers: DOI: 10.1110/ps.051597405
BibTex Citekey: 4604
 Degree: -


Source 1

Title: Protein Science
Source Genre: Journal
 Creator(s):
Affiliations:
Publ. Info: New York, N.Y. : Cambridge University Press
Pages: -
Volume / Issue: 14 (11)
Sequence Number: -
Start / End Page: 2804 - 2813
Identifier: ISSN: 0961-8368
CoNE: https://pure.mpg.de/cone/journals/resource/954925342760