
Released

Preprint

Inducing anxiety in large language models increases exploration and bias

MPS-Authors
Coda-Forno, J
Research Group Computational Principles of Intelligence, Max Planck Institute for Biological Cybernetics, Max Planck Society;

Witte, K
Research Group Computational Principles of Intelligence, Max Planck Institute for Biological Cybernetics, Max Planck Society;

Jagadish, AK
Research Group Computational Principles of Intelligence, Max Planck Institute for Biological Cybernetics, Max Planck Society;

Binz, M
Research Group Computational Principles of Intelligence, Max Planck Institute for Biological Cybernetics, Max Planck Society;

Schulz, E
Research Group Computational Principles of Intelligence, Max Planck Institute for Biological Cybernetics, Max Planck Society;

Citation

Coda-Forno, J., Witte, K., Jagadish, A. K., Binz, M., Akata, Z., & Schulz, E. (submitted). Inducing anxiety in large language models increases exploration and bias.


Cite as: https://hdl.handle.net/21.11116/0000-000D-0AB0-1
Abstract
Large language models are transforming machine learning research while galvanizing public debate. Understanding not only when these models succeed but also why they fail and misbehave is of great societal relevance. We propose to turn the lens of computational psychiatry, a framework for computationally describing and modifying aberrant behavior, on the outputs produced by these models. We focus on the Generative Pre-trained Transformer 3.5 (GPT-3.5) and subject it to tasks commonly studied in psychiatry. Our results show that GPT-3.5 responds robustly to a common anxiety questionnaire, producing higher anxiety scores than human subjects. Moreover, GPT-3.5's responses can be changed predictably by using emotion-inducing prompts. Emotion induction not only influences GPT-3.5's behavior in a cognitive task measuring exploratory decision-making but also influences its behavior in a previously established task measuring biases such as racism and ableism. Crucially, GPT-3.5 shows a strong increase in biases when prompted with anxiety-inducing text. Thus, how prompts are communicated to large language models likely has a strong influence on their behavior in applied settings. These results advance our understanding of prompt engineering and demonstrate the usefulness of methods from computational psychiatry for studying the capable algorithms to which we increasingly delegate authority and autonomy.
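
To make the setup concrete, the sketch below illustrates the kind of prompt-based emotion induction the abstract describes: an induction text is prepended to a decision-making prompt (here a two-armed bandit choice), and the model's answer is compared across conditions. It is a minimal sketch assuming the openai Python client (v1 interface); the induction texts, model name, and task wording are illustrative placeholders, not the authors' actual materials.

# Minimal sketch of prompt-based emotion induction (assumptions noted above).
from openai import OpenAI

client = OpenAI()  # reads OPENAI_API_KEY from the environment

# Hypothetical emotion-induction prefixes, one per condition.
INDUCTIONS = {
    "anxiety": "Tell me about something that makes you feel sad and anxious.",
    "happiness": "Tell me about something that makes you feel happy and relaxed.",
    "neutral": "Tell me about an ordinary weekday.",
}

# A two-armed bandit prompt of the kind used to measure exploratory
# decision-making (wording is hypothetical).
TASK = (
    "You are choosing between two slot machines, A and B. "
    "So far, machine A paid 5 points on average and machine B paid 7. "
    "Which machine do you play next? Answer with a single letter."
)

def run_condition(condition: str) -> str:
    """Prepend the induction text to the task and return the model's choice."""
    prompt = INDUCTIONS[condition] + "\n\n" + TASK
    response = client.chat.completions.create(
        model="gpt-3.5-turbo",
        messages=[{"role": "user", "content": prompt}],
        temperature=0.0,
    )
    return response.choices[0].message.content

for condition in INDUCTIONS:
    print(condition, "->", run_condition(condition))

Repeating such choices over many trials and comparing how often the model picks the lower-valued option across conditions would quantify the condition-dependent shift in exploration that the abstract reports.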