Benchmarking large language models for bio-image analysis code generation

Haase, Robert; Tischer, Christian; Scherf, Nico

doi:10.1101/2024.04.19.590278

Haase, R., Tischer, C., & Scherf, N. (2024). Benchmarking large language models for bio-image analysis code generation. bioRxiv. doi:10.1101/2024.04.19.590278.

Item is Released

Basic

Item Permalink: https://hdl.handle.net/21.11116/0000-000F-2FC0-4

Version Permalink: https://hdl.handle.net/21.11116/0000-000F-3AAD-E

Genre: Preprint

Files

Haase_pre_v2.pdf (Preprint), 3MB

File Permalink:
https://hdl.handle.net/21.11116/0000-000F-3AAE-D

Name:
Haase_pre_v2.pdf

Description:
-

OA-Status:
Green

Visibility:
Public

MIME-Type / Checksum:
application/pdf / [MD5]

Copyright Date:
-

Copyright Info:
-

License:
https://creativecommons.org/licenses/by/4.0/

Creators

Creators:
Haase, Robert, Author
Tischer, Christian, Author
Scherf, Nico¹, Author

Affiliations:
¹ Method and Development Group Neural Data Science and Statistical Computing, MPI for Human Cognitive and Brain Sciences, Max Planck Society, ou_3282987

Content

Free keywords: -

Abstract: In the computational age, life scientists often have to write Python code to solve bio-image analysis (BIA) problems, yet many of them have not been formally trained in programming. Code generation, or coding assistance in general, with Large Language Models (LLMs) can have a clear impact on BIA. To the best of our knowledge, the quality of the generated code in this domain has not been studied. We present a quantitative benchmark to estimate the capability of LLMs to generate code for solving common BIA tasks. Our benchmark currently consists of 57 human-written prompts with corresponding reference solutions in Python, and unit tests to evaluate the functional correctness of potential solutions. We demonstrate our benchmark here and compare 6 state-of-the-art LLMs. To ensure that we cover most of our community's needs, we also outline mid- and long-term strategies for the BIA open-source community to maintain and extend the benchmark. This work should support users in choosing an LLM and also guide LLM developers in improving the capabilities of LLMs in the BIA domain.
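The benchmark structure described in the abstract (a prompt, a reference solution, and a unit test that checks functional correctness) can be illustrated with a minimal sketch. This is not the authors' implementation: the `BenchmarkCase` class, the `passes` and `pass_rate` helpers, and the toy pixel-counting task are all hypothetical names and data chosen for illustration, and the sketch uses plain Python lists instead of a real image library.

```python
from dataclasses import dataclass
from typing import Callable, List


@dataclass
class BenchmarkCase:
    """One hypothetical benchmark entry: prompt, reference solution, unit test."""
    prompt: str                             # text that would be sent to the LLM
    entry_point: str                        # name of the function the test calls
    reference_solution: str                 # human-written Python source
    unit_test: Callable[[Callable], None]   # raises AssertionError on failure


def check_count(candidate: Callable) -> None:
    # Toy 2x3 "image"; the values above 6 are 10, 20 and 7.
    image = [[0, 5, 10], [20, 1, 7]]
    assert candidate(image, 6) == 3


CASE = BenchmarkCase(
    prompt=(
        "def count_pixels_above(image, threshold):\n"
        '    """Count pixels with intensity above threshold."""\n'
    ),
    entry_point="count_pixels_above",
    reference_solution=(
        "def count_pixels_above(image, threshold):\n"
        "    return sum(1 for row in image for v in row if v > threshold)\n"
    ),
    unit_test=check_count,
)


def passes(case: BenchmarkCase, candidate_source: str) -> bool:
    """Run candidate code and report whether the unit test succeeds.

    Illustration only: exec() on untrusted generated code is unsafe
    outside a sandbox.
    """
    namespace: dict = {}
    try:
        exec(candidate_source, namespace)
        case.unit_test(namespace[case.entry_point])
        return True
    except Exception:
        # Syntax errors, missing functions and failed assertions all count as failures.
        return False


def pass_rate(case: BenchmarkCase, candidates: List[str]) -> float:
    """Fraction of candidate solutions that pass the unit test."""
    return sum(passes(case, c) for c in candidates) / len(candidates)


# Made-up stand-ins for LLM outputs: one correct, one wrong, one broken.
CANDIDATES = [
    CASE.reference_solution,
    "def count_pixels_above(image, threshold):\n"
    "    return len([row for row in image if max(row) > threshold])\n",
    "def count_pixels_above(image, threshold) return 0\n",
]

if __name__ == "__main__":
    print(f"pass rate: {pass_rate(CASE, CANDIDATES):.2f}")
```

Comparing several models under this scheme would amount to collecting candidate solutions per model for every case and comparing the resulting pass rates; how the preprint samples and aggregates solutions is not stated in this record.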

Details

Language(s): eng - English

Dates: Published Online: 2024-04-25

Publication Status: Published online

Pages: -

Publishing info: -

Table of Contents: -

Rev. Type: -

Identifiers: DOI: 10.1101/2024.04.19.590278

Degree: -

Source 1

Title: bioRxiv

Source Genre: Web Page

Creator(s):

Affiliations:

Publ. Info: -

Pages: -

Volume / Issue: -

Sequence Number: -

Start / End Page: -

Identifier: -