
Released

Thesis

Increasing Interpretability of Deep Neural Networks via B-cosification

MPS-Authors

Arya, Shreyash
Computer Vision and Machine Learning, MPI for Informatics, Max Planck Society;

Citation

Arya, S. (2024). Increasing Interpretability of Deep Neural Networks via B-cosification. Master Thesis, Universität des Saarlandes, Saarbrücken.


Cite as: https://hdl.handle.net/21.11116/0000-000F-D77A-6
Abstract
Understanding the decisions of deep neural networks (DNNs) has been a challenging task due to their ‘black-box’ nature. Methods such as feature attributions that attempt to explain the decisions of such models post-hoc, while popular, have been shown to often yield explanations that are not faithful to the model. Recently, B-cos networks were proposed as a means of instead designing such networks to be inherently interpretable by architecturally enforcing stronger alignment between inputs and weights, yielding highly human-interpretable explanations that are model-faithful by design. However, unlike with post-hoc methods, this requires training new models from scratch, which represents a major hurdle for establishing such novel models as an alternative to existing ones, in particular due to the increasing reliance on large, pre-trained foundation models. In this work, inspired by the architectural similarities between standard DNNs and B-cos networks, we propose ‘B-cosification’, a novel approach to transform existing pre-trained models to become inherently interpretable. We perform a thorough study of design choices to perform this conversion, both for convolutional neural networks and vision transformers. We find that B-cosification can yield models that are on par with B-cos models trained from scratch in terms of interpretability, while often outperforming them in terms of classification performance at a fraction of the training cost. Subsequently, we apply B-cosification to CLIP models, and show that, even with limited data and compute cost, we obtain B-cosified CLIP models that are highly interpretable and competitive in zero-shot and linear-probe performance across a variety of datasets.
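For illustration only (not taken from the thesis): a minimal sketch of the input-weight alignment idea underlying B-cos layers, assuming the standard formulation from the B-cos literature, B-cos(x; w) = |cos(x, ŵ)|^(B−1) · (ŵᵀx) with unit-norm weights ŵ. The class name BcosLinear and the parameter b are hypothetical.

    # Sketch of a B-cos linear layer: outputs are down-weighted when the input
    # is poorly aligned with the corresponding weight vector (assumption-based).
    import torch
    import torch.nn as nn
    import torch.nn.functional as F

    class BcosLinear(nn.Module):
        def __init__(self, in_features, out_features, b=2.0):
            super().__init__()
            self.weight = nn.Parameter(torch.randn(out_features, in_features) * 0.01)
            self.b = b  # b = 1 recovers an ordinary linear layer

        def forward(self, x):
            # Unit-norm weights, so w_hat^T x = ||x|| * cos(x, w_hat).
            w_hat = F.normalize(self.weight, dim=1)
            linear = F.linear(x, w_hat)
            cos = linear / (x.norm(dim=-1, keepdim=True) + 1e-6)
            # Scale each output by its alignment |cos|^(b-1).
            return cos.abs().pow(self.b - 1) * linear

In this reading, B-cosification would amount to converting existing linear or convolutional layers of a pre-trained model into layers of this form and fine-tuning, rather than training from scratch.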