Conference Paper

POIMs: positional oligomer importance matrices: understanding support vector machine-based signal detectors

MPS-Authors
Zien, A
Max Planck Institute for Biological Cybernetics, Max Planck Society;
Department Empirical Inference, Max Planck Institute for Biological Cybernetics, Max Planck Society;
Friedrich Miescher Laboratory, Max Planck Society;

Philips, P
Friedrich Miescher Laboratory, Max Planck Society;

Rätsch, G
Friedrich Miescher Laboratory, Max Planck Society;

Citation

Sonnenburg, S., Zien, A., Philips, P., & Rätsch, G. (2008). POIMs: positional oligomer importance matrices: understanding support vector machine-based signal detectors. Bioinformatics, 24(13), i6-i14.


Cite as: https://hdl.handle.net/21.11116/0000-0003-3034-C
Abstract
Motivation: At the heart of many important bioinformatics problems, such as gene finding and function prediction, is the classification of biological sequences. Frequently the most accurate classifiers are obtained by training support vector machines (SVMs) with complex sequence kernels. However, a serious shortcoming of SVMs is that their learned decision rules are very hard for humans to understand and cannot easily be related to biological facts.

Results: To make SVM-based sequence classifiers more accessible and useful, we introduce the concept of positional oligomer importance matrices (POIMs) and propose an efficient algorithm for their computation. In contrast to the raw SVM feature weighting, POIMs take into account the correlation structure of k-mer features that is induced by overlaps of related k-mers. POIMs can be seen as a powerful generalization of sequence logos: they make it possible to capture and visualize sequence patterns that are relevant to the biological phenomena under investigation.
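
To illustrate why overlaps between k-mers matter, the following is a minimal brute-force sketch in Python, not the efficient algorithm proposed in the paper. It assumes a linear scoring function over positional k-mers and takes the importance of a k-mer z at position j to be the expected score of sequences containing z at j minus the overall expected score, with sequences drawn uniformly at random. The abstract does not state this definition, so it is an assumption here. The names `score`, `importance` and the toy `weights` are hypothetical.

```python
from itertools import product

ALPHABET = "ACGT"


def score(seq, weights, k):
    """Linear score: sum of the weights of all positional k-mers found in seq."""
    return sum(weights.get((seq[i:i + k], i), 0.0)
               for i in range(len(seq) - k + 1))


def importance(z, j, weights, k, length):
    """Brute force: E[score(X) | X[j:j+len(z)] == z] - E[score(X)],
    with X uniform over all sequences of the given length (an assumption)."""
    all_seqs = ["".join(p) for p in product(ALPHABET, repeat=length)]
    matching = [s for s in all_seqs if s[j:j + len(z)] == z]
    mean_all = sum(score(s, weights, k) for s in all_seqs) / len(all_seqs)
    mean_match = sum(score(s, weights, k) for s in matching) / len(matching)
    return mean_match - mean_all


# Hypothetical toy model: only the 2-mer "GA" at position 1 has a nonzero weight.
weights = {("GA", 1): 1.0}

print(importance("GA", 1, weights, k=2, length=4))  # 0.9375
print(importance("AG", 0, weights, k=2, length=4))  # 0.1875
print(importance("AC", 0, weights, k=2, length=4))  # -0.0625
```

In this toy setting, "AG" at position 0 has a raw weight of zero, yet it receives positive importance because it overlaps with and is compatible with the weighted "GA" at position 1. "AC" at position 0 rules that k-mer out and so receives negative importance. Raw feature weights alone would not reveal either effect.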