SIMBSIG: similarity search and clustering for biobank-scale data

Adamer, Michael F.; Roellin, Eljas; Bourguignon, Lucie; Borgwardt, Karsten

doi:10.1093/bioinformatics/btac829

Journal Article

Citation

Adamer, M. F., Roellin, E., Bourguignon, L., & Borgwardt, K. (2022). SIMBSIG: similarity search and clustering for biobank-scale data. Bioinformatics, 39(1): btac829. doi:10.1093/bioinformatics/btac829.

Cite as: https://hdl.handle.net/21.11116/0000-000C-EC54-C

Abstract

In many modern bioinformatics applications, such as statistical genetics or single-cell analysis, one frequently encounters datasets that are orders of magnitude too large for conventional in-memory analysis. To tackle this challenge, we introduce SIMBSIG (SIMilarity Batched Search Integrated GPU), a highly scalable Python package that provides a scikit-learn-like interface for out-of-core, GPU-enabled similarity searches, principal component analysis and clustering. Thanks to its PyTorch backend, it is highly modular and suited to many data types, with a particular focus on biobank data analysis.

SIMBSIG is freely available from PyPI, and its source code and documentation can be found on GitHub (https://github.com/BorgwardtLab/simbsig) under a BSD-3 license.
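
The out-of-core idea mentioned in the abstract can be illustrated with a minimal sketch of a batched nearest-neighbour search. This is not SIMBSIG's API or implementation: it uses only the Python standard library, and the names `iter_batches` and `batched_knn` are hypothetical helpers introduced here for illustration. The assumption is that the dataset is visited one batch at a time (standing in for chunks read from disk), while only the best `k` candidates seen so far are kept in memory.

```python
import heapq
import math
from typing import Iterable, Iterator, List, Sequence, Tuple

Vector = Sequence[float]


def iter_batches(
    data: Sequence[Vector], batch_size: int
) -> Iterator[Tuple[int, Sequence[Vector]]]:
    """Yield (offset, batch) pairs; a stand-in for reading chunks from disk."""
    for start in range(0, len(data), batch_size):
        yield start, data[start:start + batch_size]


def batched_knn(
    query: Vector, batches: Iterable[Tuple[int, Sequence[Vector]]], k: int
) -> List[Tuple[float, int]]:
    """Return the k nearest (distance, index) pairs, seeing one batch at a time."""
    # Max-heap via negated distances: heap[0] holds the worst of the current best k.
    heap: List[Tuple[float, int]] = []
    for offset, batch in batches:
        for i, point in enumerate(batch):
            dist = math.dist(query, point)  # Euclidean distance
            if len(heap) < k:
                heapq.heappush(heap, (-dist, offset + i))
            elif -dist > heap[0][0]:
                # Strictly closer than the current worst candidate: swap it in.
                heapq.heapreplace(heap, (-dist, offset + i))
    # Undo the negation and sort from nearest to farthest.
    return sorted((-neg_dist, idx) for neg_dist, idx in heap)


# Toy data, chosen only for illustration.
data = [(0.0, 0.0), (1.0, 0.0), (5.0, 5.0), (0.0, 2.0), (3.0, 4.0)]
neighbors = batched_knn((0.0, 0.0), iter_batches(data, batch_size=2), k=3)
```

Because only the running top-`k` heap persists between batches, memory use in this sketch depends on `k` and the batch size rather than on the size of the full dataset. A GPU-enabled version would typically compute the distances for a whole batch at once instead of point by point.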