Keywords:
Computer Science, Learning, cs.LG
Abstract:
With the increasing reliance on deep neural networks, it is important to
develop ways to better understand their learned representations. Representation
similarity measures have emerged as a popular tool for examining learned
representations. However, existing measures only provide aggregate estimates of
similarity at a global level, i.e. over a set of representations for N input
examples. As such, these measures are not well-suited for investigating
representations at a local level, i.e. representations of a single input
example. Local similarity measures are needed, for instance, to understand
which individual input representations are affected by training interventions
on models (e.g. to make them fairer and less biased) or are at greater risk of
being misclassified. In this work, we fill this gap and propose Pointwise
Normalized Kernel Alignment (PNKA), a measure that quantifies how similarly an
individual input is represented in two representation spaces. Intuitively, PNKA
compares the similarity of an input's neighborhoods across both spaces. Using
our measure, we are able to analyze properties of learned representations at a
finer granularity than what was previously possible. Concretely, we show how
PNKA can be leveraged to develop a deeper understanding of (a) the input
examples that are likely to be misclassified, (b) the concepts encoded by
(individual) neurons in a layer, and (c) the effects of fairness interventions
on learned representations.
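
The abstract does not state the formula for PNKA, so the following is only a minimal sketch of the stated intuition, not the authors' definition. It assumes that an input's "neighborhood" can be summarized by its row in a centered linear kernel matrix, and that two neighborhoods can be compared with cosine similarity. The function name `pointwise_similarity` and both of these choices are illustrative assumptions.

```python
import numpy as np


def pointwise_similarity(Y, Z):
    """Return one similarity score per input for two representation spaces.

    Y has shape (N, d1) and Z has shape (N, d2); row i of each matrix is the
    representation of the same input i. This is an illustrative sketch, not
    the paper's reference implementation.
    """
    Y = np.asarray(Y, dtype=float)
    Z = np.asarray(Z, dtype=float)
    if Y.shape[0] != Z.shape[0]:
        raise ValueError("Y and Z must represent the same N inputs")

    # Center every feature so similarities are measured relative to the mean.
    Yc = Y - Y.mean(axis=0, keepdims=True)
    Zc = Z - Z.mean(axis=0, keepdims=True)

    # Assumed neighborhood descriptor: row i holds the similarity of input i
    # to all N inputs under a linear kernel.
    Ky = Yc @ Yc.T
    Kz = Zc @ Zc.T

    # Cosine similarity between row i of Ky and row i of Kz, for every i.
    num = np.sum(Ky * Kz, axis=1)
    denom = np.linalg.norm(Ky, axis=1) * np.linalg.norm(Kz, axis=1)
    return num / np.maximum(denom, 1e-12)  # guard against all-zero rows


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    Y = rng.normal(size=(8, 4))
    Z = rng.normal(size=(8, 6))
    print(pointwise_similarity(Y, Z))  # eight scores, one per input
```

Under these assumptions the score lies in [-1, 1] and is unchanged by rotating or uniformly rescaling either space, since both operations leave the kernel rows unchanged up to a constant factor. Averaging the scores over all inputs yields a single aggregate number, which is the kind of global estimate the abstract contrasts with the per-input view.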