
Item details

This item has been withdrawn.

Withdrawn

Preprint

Sensitive clustering of protein sequences at tree-of-life scale using DIAMOND DeepClust

MPS-Authors
/persons/resource/persons271373

Buchfink, B
Department Molecular Biology, Max Planck Institute for Biology Tübingen, Max Planck Society;
Computational Biology Group, Department Molecular Biology, Max Planck Institute for Biology Tübingen, Max Planck Society;

/persons/resource/persons271598

Ashkenazy, H
Department Molecular Biology, Max Planck Institute for Biology Tübingen, Max Planck Society;

/persons/resource/persons271796

Drost, H-G
Department Molecular Biology, Max Planck Institute for Biology Tübingen, Max Planck Society;
Computational Biology Group, Department Molecular Biology, Max Planck Institute for Biology Tübingen, Max Planck Society;

External Resource
There are no locators available
Fulltext (restricted access)
There are currently no full texts shared for your IP range.
Fulltext (public)
There are no public fulltexts available
Supplementary Material (public)
There is no public supplementary material available
Citation

Buchfink, B., Ashkenazy, H., Reuter, K., Kennedy, J., & Drost, H.-G. (submitted). Sensitive clustering of protein sequences at tree-of-life scale using DIAMOND DeepClust.


Abstract
The biosphere genomics era is transforming life science research, but existing methods struggle to efficiently reduce the vast dimensionality of the protein universe. We present DIAMOND DeepClust, an ultra-fast cascaded clustering method optimized to cluster the 19 billion protein sequences currently defining the protein biosphere. As a result, we detect 1.7 billion clusters of which 32% hold more than one sequence. This means that 544 million clusters represent 94% of all known proteins, illustrating that clustering across the tree of life can significantly accelerate comparative studies in the Earth BioGenome era.
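The two derived figures in the abstract (544 million clusters, 94% of proteins) can be checked from the three quoted inputs. The sketch below is one reading of that arithmetic, not the authors' own calculation: it assumes the published values are rounded, and that every cluster not holding "more than one sequence" is a singleton containing exactly one sequence. The function and variable names are illustrative only.

```python
# Rounded figures as quoted in the abstract.
TOTAL_SEQUENCES = 19_000_000_000   # protein sequences clustered
TOTAL_CLUSTERS = 1_700_000_000     # clusters detected
MULTI_MEMBER_PERCENT = 32          # share of clusters with more than one sequence


def cluster_summary(total_sequences, total_clusters, multi_member_percent):
    """Return (multi-member clusters, singleton clusters, coverage in percent).

    Assumption: a cluster that is not multi-member holds exactly one sequence,
    so the sequences covered by multi-member clusters are all sequences minus
    the singleton count.
    """
    multi_member = total_clusters * multi_member_percent // 100
    singletons = total_clusters - multi_member
    covered_sequences = total_sequences - singletons
    coverage_percent = 100 * covered_sequences / total_sequences
    return multi_member, singletons, coverage_percent


multi_member, singletons, coverage_percent = cluster_summary(
    TOTAL_SEQUENCES, TOTAL_CLUSTERS, MULTI_MEMBER_PERCENT
)
print(f"multi-member clusters: {multi_member:,}")
print(f"singleton clusters:    {singletons:,}")
print(f"coverage of sequences: {coverage_percent:.1f}%")
```

Under these assumptions the multi-member cluster count comes out at 544 million and the coverage rounds to 94%, matching the abstract.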