Support Vector Machines and Kernels for Computational Biology

Ben-Hur, A; Ong, CS; Sonnenburg, S; Schölkopf, B; Rätsch, G

doi:10.1371/journal.pcbi.1000173

Item

ITEM ACTIONSEXPORT

Add to Basket

Local TagsRelease HistoryDetailsSummary

Released

Journal Article

Support Vector Machines and Kernels for Computational Biology

MPS-Authors

/persons/resource/persons84118

Ong, CS
Rätsch Group, Friedrich Miescher Laboratory, Max Planck Society;

/persons/resource/persons84153

Rätsch, G
Rätsch Group, Friedrich Miescher Laboratory, Max Planck Society;

External Resource

No external resources are shared

Fulltext (restricted access)

There are currently no full texts shared for your IP range.

Fulltext (public)

There are no public fulltexts stored in PuRe

Supplementary Material (public)

There is no public supplementary material available

Citation

Ben-Hur, A., Ong, C., Sonnenburg, S., Schölkopf, B., & Rätsch, G. (2008). Support Vector Machines and Kernels for Computational Biology. PLoS Computational Biology, 4(10): e1000173. doi:10.1371/journal.pcbi.1000173.

Cite as: https://hdl.handle.net/21.11116/0000-000A-5FF5-8

Abstract

The increasing wealth of biological data coming from a large variety of platforms and the continued development of new high-throughput methods for probing biological systems require increasingly more sophisticated computational approaches. Putting all these data in simple-to-use databases is a first step; but realizing the full potential of the data requires algorithms that automatically extract regularities from the data, which can then lead to biological insight.

Many of the problems in computational biology are in the form of prediction: starting from prediction of a gene's structure, prediction of its function, interactions, and role in disease. Support vector machines (SVMs) and related kernel methods are extremely good at solving such problems [1]–[3]. SVMs are widely used in computational biology due to their high accuracy, their ability to deal with high-dimensional and large datasets, and their flexibility in modeling diverse sources of data [2], [4]–[6].

The simplest form of a prediction problem is binary classification: trying to discriminate between objects that belong to one of two categories—positive (+1) or negative (−1). SVMs use two key concepts to solve this problem: large margin separation and kernel functions. The idea of large margin separation can be motivated by classification of points in two dimensions (see Figure 1). A simple way to classify the points is to draw a straight line and call points lying on one side positive and on the other side negative. If the two sets are well separated, one would intuitively draw the separating line such that it is as far as possible away from the points in both sets (see Figures 2 and 3). This intuitive choice captures the idea of large margin separation, which is mathematically formulated in the section Classification with Large Margin.