CogBench: a large language model walks into a psychology lab

Coda-Forno, J., Binz, M., Wang, J., & Schulz, E. (submitted). CogBench: a large language model walks into a psychology lab.


Locators

Locator: https://arxiv.org/pdf/2402.18225.pdf (Any fulltext)
Description: -
OA-Status: Not specified

Creators

Creators:
Coda-Forno, J.¹, Author
Binz, M.¹, Author
Wang, J. X., Author
Schulz, E.¹, Author
Affiliations:
¹ Research Group Computational Principles of Intelligence, Max Planck Institute for Biological Cybernetics, Max Planck Society, ou_3189356

Content

Free keywords: -
 Abstract: Large language models (LLMs) have significantly advanced the field of artificial intelligence. Yet, evaluating them comprehensively remains challenging. We argue that this is partly due to the predominant focus on performance metrics in most benchmarks. This paper introduces CogBench, a benchmark that includes ten behavioral metrics derived from seven cognitive psychology experiments. This novel approach offers a toolkit for phenotyping LLMs' behavior. We apply CogBench to 35 LLMs, yielding a rich and diverse dataset. We analyze this data using statistical multilevel modeling techniques, accounting for the nested dependencies among fine-tuned versions of specific LLMs. Our study highlights the crucial role of model size and reinforcement learning from human feedback (RLHF) in improving performance and aligning with human behavior. Interestingly, we find that open-source models are less risk-prone than proprietary models and that fine-tuning on code does not necessarily enhance LLMs' behavior. Finally, we explore the effects of prompt-engineering techniques. We discover that chain-of-thought prompting improves probabilistic reasoning, while take-a-step-back prompting fosters model-based behaviors.
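
The snippet below is a minimal sketch, not the authors' released code, of the kind of multilevel analysis the abstract describes: a behavioral metric is regressed on model-level covariates, with a random intercept per base model so that fine-tuned variants of the same underlying LLM are treated as nested rather than independent observations. The synthetic data and the column names (base_model, log_params, rlhf, behavioral_metric) are illustrative assumptions, not taken from the paper.

# Sketch of a multilevel (mixed-effects) model with a random intercept per base model.
import numpy as np
import pandas as pd
import statsmodels.formula.api as smf

rng = np.random.default_rng(0)
n_base, n_variants = 8, 4                                          # e.g. 8 base models, 4 fine-tunes each
base = np.repeat([f"base_{i}" for i in range(n_base)], n_variants)
log_params = np.repeat(rng.uniform(22, 27, n_base), n_variants)    # log(# parameters), hypothetical
rlhf = rng.integers(0, 2, n_base * n_variants)                     # fine-tuned with RLHF? (0/1)
base_effect = np.repeat(rng.normal(0, 0.05, n_base), n_variants)   # shared effect of the base model
metric = (0.02 * log_params + 0.10 * rlhf
          + base_effect + rng.normal(0, 0.03, n_base * n_variants))

df = pd.DataFrame({"base_model": base, "log_params": log_params,
                   "rlhf": rlhf, "behavioral_metric": metric})

# The random intercept grouped by base model captures the nested dependency
# among fine-tuned versions of a specific LLM.
fit = smf.mixedlm("behavioral_metric ~ log_params + rlhf",
                  data=df, groups=df["base_model"]).fit()
print(fit.summary())

The fixed-effect coefficients then estimate how model size and RLHF relate to the behavioral metric, while the group variance absorbs what all fine-tunes of the same base model have in common.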

Details

Language(s): -
Dates: 2024-02
Publication Status: Submitted
Pages: -
Publishing info: -
Table of Contents: -
Rev. Type: -
Identifiers: DOI: 10.48550/arXiv.2402.18225
Degree: -
