Journal Article

SIMBSIG: similarity search and clustering for biobank-scale data


Adamer, M. F., Roellin, E., Bourguignon, L., & Borgwardt, K. (2022). SIMBSIG: similarity search and clustering for biobank-scale data. Bioinformatics, 39(1): btac829. doi:10.1093/bioinformatics/btac829.

Cite as: https://hdl.handle.net/21.11116/0000-000C-EC54-C
In many modern bioinformatics applications, such as statistical genetics or single-cell analysis, one frequently encounters datasets that are orders of magnitude too large for conventional in-memory analysis. To tackle this challenge, we introduce SIMBSIG (SIMilarity Batched Search Integrated GPU), a highly scalable Python package that provides a scikit-learn-like interface for out-of-core, GPU-enabled similarity searches, principal component analysis and clustering. Owing to its PyTorch backend, it is highly modular and can be tailored to many data types, with a particular focus on biobank data analysis. SIMBSIG is freely available from PyPI, and its source code and documentation can be found on GitHub (https://github.com/BorgwardtLab/simbsig) under a BSD-3 license.
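
To make the idea of an out-of-core similarity search concrete, the following is a minimal sketch of a batched brute-force nearest-neighbour query. It is not SIMBSIG's API: the names `iter_batches`, `squared_euclidean` and `batched_knn` are hypothetical, and the sketch uses only the Python standard library on the CPU, whereas the package itself is described as building on PyTorch with GPU support. The point it illustrates is that the dataset is consumed one batch at a time, so only `batch_size` rows and the current best `k` candidates need to be held in memory.

```python
import heapq
from typing import Iterable, Iterator, List, Sequence, Tuple

Vector = Sequence[float]


def iter_batches(rows: Iterable[Vector], batch_size: int) -> Iterator[List[Vector]]:
    """Yield consecutive batches so that at most `batch_size` rows are held at once."""
    batch: List[Vector] = []
    for row in rows:
        batch.append(row)
        if len(batch) == batch_size:
            yield batch
            batch = []
    if batch:  # final, possibly shorter, batch
        yield batch


def squared_euclidean(a: Vector, b: Vector) -> float:
    """Squared Euclidean distance between two equal-length vectors."""
    return sum((x - y) ** 2 for x, y in zip(a, b))


def batched_knn(
    query: Vector, rows: Iterable[Vector], k: int, batch_size: int
) -> List[Tuple[float, int]]:
    """Return the k nearest rows as (squared distance, row index), nearest first.

    `rows` may be any iterable, for example a generator that reads from disk,
    so the full dataset never has to fit in memory (hypothetical helper).
    """
    # heapq is a min-heap; storing negated values keeps the worst candidate at best[0].
    best: List[Tuple[float, int]] = []
    offset = 0  # global index of the first row in the current batch
    for batch in iter_batches(rows, batch_size):
        for i, row in enumerate(batch):
            item = (-squared_euclidean(query, row), -(offset + i))
            if len(best) < k:
                heapq.heappush(best, item)
            elif item > best[0]:  # closer than the current worst candidate
                heapq.heapreplace(best, item)
        offset += len(batch)
    return sorted((-neg_dist, -neg_idx) for neg_dist, neg_idx in best)


data = [(0, 0), (1, 0), (0, 2), (3, 3), (0.5, 0)]
neighbours = batched_knn((0, 0), iter(data), k=2, batch_size=2)
```

Because the running top-`k` is merged across batches, the result does not depend on `batch_size`; a GPU-backed implementation would typically replace the inner loop with a vectorised distance computation per batch.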