Clustering Protein Sequence and Structure Space with Infinite Gaussian Mixture Models

Dubey, A; Hwang, S; Rangel, C; Rasmussen, CE; Ghahramani, Z; Wild, DL

Dubey, A., Hwang, S., Rangel, C., Rasmussen, C., Ghahramani, Z., & Wild, D. (2004). Clustering Protein Sequence and Structure Space with Infinite Gaussian Mixture Models. In Pacific Symposium on Biocomputing (PSB 2004) (pp. 399-410). Singapore: World Scientific Publishing.

Item is Released

Basic

Item Permalink: https://hdl.handle.net/11858/00-001M-0000-0013-F3A7-5

Version Permalink: https://hdl.handle.net/21.11116/0000-0005-53AE-A

Genre: Conference Paper

Files

pdf2373.pdf (Any fulltext), 182KB

File Permalink:
https://hdl.handle.net/21.11116/0000-0005-53AF-9

Name:
pdf2373.pdf

Description:
-

OA-Status:

Visibility:
Public

MIME-Type / Checksum:
application/pdf / [MD5]

Copyright Date:
-

Copyright Info:
-

License:
-

Locators

Locator:
http://psb.stanford.edu/previous/psb04/ (Table of contents) Open Access status unknown

Description:
-

OA-Status:

Creators

Creators:
Dubey, A, Author
Hwang, S, Author
Rangel, C, Author
Rasmussen, CE^{1, 2}, Author
Ghahramani, Z, Author
Wild, DL, Author

Affiliations:
1 Department Empirical Inference, Max Planck Institute for Biological Cybernetics, Max Planck Society, ou_1497795
2 Max Planck Institute for Biological Cybernetics, Max Planck Society, ou_1497794

Content

Free keywords: -

Abstract: We describe a novel approach to the problem of automatically clustering protein sequences and discovering protein families, subfamilies, etc., based on the theory of infinite Gaussian mixture models. This method allows the data itself to dictate how many mixture components are required to model it, and provides a measure of the probability that two proteins belong to the same cluster. We illustrate our methods with application to three data sets: globin sequences, globin sequences with known three-dimensional structures, and G-protein coupled receptor sequences. The consistency of the clusters indicates that our method is producing biologically meaningful results, which provide a very good indication of the underlying families and subfamilies. With the inclusion of secondary structure and residue solvent accessibility information, we obtain a classification of sequences of known structure which reflects and extends their SCOP classifications.
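
Illustrative sketch (not part of the original record, and not the authors' implementation): the two ideas in the abstract, a number of clusters that is not fixed in advance and a pairwise same-cluster probability, can be shown with a toy example. The code below assumes a Chinese restaurant process as the prior over partitions, which is the usual construction behind infinite mixture models, and estimates the probability that two items share a cluster as the fraction of sampled partitions in which they do. The function names `crp_partition` and `coclustering_matrix`, the concentration parameter `alpha`, and the choice to sample from the prior only (no protein data, no Gaussian likelihood, no Gibbs sampler) are all assumptions made for illustration.

```python
import random


def crp_partition(n_items, alpha, rng):
    """Draw one partition of n_items from a Chinese restaurant process.

    Returns a list of cluster labels; the number of clusters is not fixed
    in advance but grows with the data, controlled by alpha.
    """
    assignments = []
    counts = []  # counts[k] = number of items currently in cluster k
    for _ in range(n_items):
        # An existing cluster k has weight counts[k];
        # opening a new cluster has weight alpha.
        weights = counts + [alpha]
        k = rng.choices(range(len(weights)), weights=weights)[0]
        if k == len(counts):
            counts.append(1)  # open a new cluster
        else:
            counts[k] += 1
        assignments.append(k)
    return assignments


def coclustering_matrix(samples):
    """Fraction of sampled partitions in which items i and j share a cluster."""
    n = len(samples[0])
    co = [[0.0] * n for _ in range(n)]
    for labels in samples:
        for i in range(n):
            for j in range(n):
                if labels[i] == labels[j]:
                    co[i][j] += 1.0
    return [[c / len(samples) for c in row] for row in co]


if __name__ == "__main__":
    rng = random.Random(0)
    # Hypothetical setting: 8 items, 500 partitions drawn from the prior.
    samples = [crp_partition(8, alpha=1.0, rng=rng) for _ in range(500)]
    co = coclustering_matrix(samples)
    print("clusters in first sample:", len(set(samples[0])))
    print("estimated P(items 0 and 1 share a cluster):", round(co[0][1], 3))
```

In a real analysis the partitions would come from a posterior sampler conditioned on the sequence data rather than from the prior; the averaging step in `coclustering_matrix` would stay the same.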

A supplementary web site containing larger versions of the figures is available at http://public.kgi.edu/~wild/PSB04

Details

Language(s):

Dates: Date issued: 2004-01

Publication Status: Issued

Pages: -

Publishing info: -

Table of Contents: -

Rev. Type: -

Identifiers: BibTex Citekey: 2373

Degree: -

Event

Title: Pacific Symposium on Biocomputing (PSB 2004)

Place of Event: Waimea, HI, USA

Start-/End Date: 2004-01-06 - 2004-01-10

Source 1

Title: Pacific Symposium on Biocomputing (PSB 2004)

Source Genre: Proceedings

Creator(s):

Affiliations:

Publ. Info: Singapore : World Scientific Publishing

Pages: -

Volume / Issue: -

Sequence Number: -

Start / End Page: 399 - 410

Identifier: -