Natural similarity measures between position frequency matrices with an application to clustering

Pape, Utz J.; Rahmann, Sven; Vingron, Martin

Pape, U. J., Rahmann, S., & Vingron, M. (2008). Natural similarity measures between position frequency matrices with an application to clustering. Bioinformatics, 24(3), 350-357. Retrieved from http://bioinformatics.oxfordjournals.org/cgi/reprint/24/3/350.

Item is Released

Basic

Item Permalink: https://hdl.handle.net/11858/00-001M-0000-0010-80A3-C

Version Permalink: https://hdl.handle.net/11858/00-001M-0000-0010-80A4-A

Genre: Journal Article

Files

350.pdf (Any fulltext), 216KB

File Permalink: https://hdl.handle.net/11858/00-001M-0000-0010-80A2-E

Name: 350.pdf

Description: -

OA-Status:

Visibility: Public

MIME-Type / Checksum: application/pdf / [MD5]

Copyright Date: -

Copyright Info: eDoc_access: PUBLIC

License: -

Creators

Creators:
Pape, Utz J.¹, Author
Rahmann, Sven¹, Author
Vingron, Martin², Author

Affiliations:
¹ Dept. of Computational Molecular Biology (Head: Martin Vingron), Max Planck Institute for Molecular Genetics, Max Planck Society, ou_1433547
² Gene regulation (Martin Vingron), Dept. of Computational Molecular Biology (Head: Martin Vingron), Max Planck Institute for Molecular Genetics, Max Planck Society, ou_1479639

Content

Free keywords: -

Abstract:

Motivation: Transcription factors (TFs) play a key role in gene regulation by binding to target sequences. In silico prediction of potential binding of a TF to a binding site is a well-studied problem in computational biology. The binding sites for one TF are represented by a position frequency matrix (PFM). The discovery of new PFMs requires the comparison to known PFMs to avoid redundancies. In general, two PFMs are similar if they occur at overlapping positions under a null model. Still, most existing methods compute similarity according to probabilistic distances of the PFMs. Here we propose a natural similarity measure based on the asymptotic covariance between the number of PFM hits incorporating both strands. Furthermore, we introduce a second measure based on the same idea to cluster a set of the Jaspar PFMs.

Results: We show that the asymptotic covariance can be efficiently computed by a two dimensional convolution of the score distributions. The asymptotic covariance approach shows strong correlation with simulated data. It outperforms three alternative methods. The Jaspar clustering yields distinct groups of TFs of the same class. Furthermore, a representative PFM is given for each class. In contrast to most other clustering methods, PFMs with low similarity automatically remain singletons.

Availability: A website to compute the similarity and to perform clustering, the source code and Supplementary Material are available at http://mosta.molgen.mpg.de

Details

Language(s): eng - English

Dates: Date issued: 2008-01-02

Publication Status: Issued

Pages: -

Publishing info: -

Table of Contents: -

Rev. Type: -

Identifiers: eDoc: 404466
URI: http://bioinformatics.oxfordjournals.org/cgi/reprint/24/3/350
DOI: 10.1093/bioinformatics/btm610

Degree: -

Source 1

Title: Bioinformatics

Source Genre: Journal

Creator(s):

Affiliations:

Publ. Info: -

Pages: -

Volume / Issue: 24 (3)

Sequence Number: -

Start / End Page: 350 - 357

Identifier: ISSN: 1367-4803