Mismatch string kernels for discriminative protein classification

Leslie, CS; Eskin, E; Cohen, A; Weston, J; Noble, WS

doi:10.1093/bioinformatics/btg431

Leslie, C., Eskin, E., Cohen, A., Weston, J., & Noble, W. (2004). Mismatch string kernels for discriminative protein classification. Bioinformatics, 20(4), 467-476. doi:10.1093/bioinformatics/btg431.

Basic data

Record permalink: https://hdl.handle.net/21.11116/0000-0005-4F63-4
Version permalink: https://hdl.handle.net/21.11116/0000-0005-4F64-3

Genre: Journal article

External references

External reference:
https://academic.oup.com/bioinformatics/article-pdf/20/4/467/476867/btg431.pdf (publisher's version); open access status unknown

Creators

Leslie, CS, Author
Eskin, E, Author
Cohen, A, Author
Weston, J (1, 2), Author
Noble, WS, Author

Affiliations:
(1) Department Empirical Inference, Max Planck Institute for Biological Cybernetics, Max Planck Society, ou_1497795
(2) Max Planck Institute for Biological Cybernetics, Max Planck Society, Spemannstrasse 38, 72076 Tübingen, DE, ou_1497794

Content

Abstract: Motivation: Classification of protein sequences into functional and structural families based on sequence homology is a central problem in computational biology. Discriminative supervised machine learning approaches provide good performance, but simplicity and computational efficiency of training and prediction are also important concerns.

Results: We introduce a class of string kernels, called mismatch kernels, for use with support vector machines (SVMs) in a discriminative approach to the problem of protein classification and remote homology detection. These kernels measure sequence similarity based on shared occurrences of fixed-length patterns in the data, allowing for mutations between patterns. Thus, the kernels provide a biologically well-motivated way to compare protein sequences without relying on family-based generative models such as hidden Markov models. We compute the kernels efficiently using a mismatch tree data structure, allowing us to calculate the contributions of all patterns occurring in the data in one pass while traversing the tree. When used with an SVM, the kernels enable fast prediction on test sequences. We report experiments on two benchmark SCOP datasets, where we show that the mismatch kernel used with an SVM classifier performs competitively with state-of-the-art methods for homology detection, particularly when very few training examples are available. Examination of the highest-weighted patterns learned by the SVM classifier recovers biologically important motifs in protein families and superfamilies.
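The following Python code is a minimal sketch of the kernel described in the abstract, not the authors' implementation. It assumes the usual (k, m)-mismatch feature map: each length-k substring (k-mer) of a sequence contributes a count to every length-k pattern within Hamming distance m of it, and the kernel is the inner product of these count vectors. The code computes the same values two ways. The first enumerates all |alphabet|^k patterns by brute force. The second does a single depth-first traversal of a mismatch tree, carrying each k-mer instance down only those branches where it stays within m mismatches. All function names, the toy DNA alphabet and the parameter values are illustrative assumptions. Kernel normalization and the 20-letter amino-acid alphabet used for proteins are left out.

```python
from itertools import product


def hamming(a, b):
    """Number of positions at which two equal-length strings differ."""
    return sum(x != y for x, y in zip(a, b))


def kmers(seq, k):
    """All overlapping length-k substrings of seq."""
    return [seq[i:i + k] for i in range(len(seq) - k + 1)]


def mismatch_features(seq, k, m, alphabet):
    """Sparse (k, m)-mismatch feature vector: for each length-k pattern beta
    over the alphabet, the number of k-mers of seq within distance m of beta."""
    seq_kmers = kmers(seq, k)
    features = {}
    for letters in product(alphabet, repeat=k):
        beta = "".join(letters)
        count = sum(1 for a in seq_kmers if hamming(a, beta) <= m)
        if count:
            features[beta] = count
    return features


def brute_force_kernel(x, y, k, m, alphabet):
    """K(x, y) = <Phi(x), Phi(y)>, enumerating every pattern explicitly."""
    fx = mismatch_features(x, k, m, alphabet)
    fy = mismatch_features(y, k, m, alphabet)
    return sum(c * fy.get(beta, 0) for beta, c in fx.items())


def mismatch_tree_kernel(seqs, k, m, alphabet):
    """Full kernel matrix from one depth-first traversal of the mismatch tree.

    A node at depth d stands for a length-d prefix and keeps every k-mer
    instance (sequence index, start position, mismatches so far) whose first
    d letters are within m mismatches of that prefix. Empty branches are
    pruned; each leaf reached is one pattern beta."""
    n = len(seqs)
    K = [[0] * n for _ in range(n)]
    root = [(s, i, 0) for s, seq in enumerate(seqs)
            for i in range(len(seq) - k + 1)]

    def visit(depth, live):
        if depth == k:
            # counts[s] is the feature coordinate phi_beta(seqs[s]) at this leaf
            counts = [0] * n
            for s, _, _ in live:
                counts[s] += 1
            # each leaf adds phi_beta(x_a) * phi_beta(x_b) to every entry
            for a in range(n):
                for b in range(n):
                    K[a][b] += counts[a] * counts[b]
            return
        for symbol in alphabet:
            child = []
            for s, i, mm in live:
                mm_next = mm + (seqs[s][i + depth] != symbol)
                if mm_next <= m:
                    child.append((s, i, mm_next))
            if child:  # prune: no instance can reach a leaf below this node
                visit(depth + 1, child)

    visit(0, root)
    return K
```

Calling `mismatch_tree_kernel(["ACGTAC", "ACGAAC", "TTTTGG"], 3, 1, "ACGT")` should return a symmetric matrix that matches `brute_force_kernel` entry by entry. The tree version never visits patterns that no k-mer in the data can reach, which is what makes it usable when the alphabet has 20 letters instead of 4. A precomputed matrix like this can then be handed to any SVM implementation that accepts a custom kernel.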

Details

Date: Published: 2004-03

Publication status: Published

Identifiers: DOI: 10.1093/bioinformatics/btg431

Source

Title: Bioinformatics

Source genre: Journal

Place, publisher, edition: Oxford: Oxford University Press

Volume / issue: 20 (4)
Start / end page: 467-476
Identifier: ISSN: 1367-4803
CoNE: https://pure.mpg.de/cone/journals/resource/954926969991
