Abstract:
The appearance of objects in an image can change dramatically depending on their pose,
distance, and illumination. Learning representations that are invariant against such appearance
changes can be viewed as an important preprocessing step which removes distracting
variance from a data set, so that downstream classifiers or regression estimators perform
better. Complex cells in primary visual cortex are commonly seen as building blocks for such
invariant image representations (e.g. Riesenhuber & Poggio 2000). While complex cells, like
simple cells, respond to edges of a particular orientation, they are less sensitive to the precise
location of the edge. A variety of neural algorithms have been proposed that aim at
explaining the response properties of complex cells as components of an invariant representation
that is optimized for the spatio-temporal statistics of the visual input. For certain
classes of transformations (e.g. translations, scalings, and rotations), it is possible to analytically
derive features that are invariant under these transformations, and the design of such
invariant features has been studied extensively in computer vision. The range of naturally
occurring transformations, however, is much more variable and not precisely known. Thus,
an analytical design of invariant features does not seem feasible. Instead, one can seek to
find features that may no longer be perfectly invariant but that, on average, change as
slowly as possible under the transformations occurring in the data (Földiák 1991). The
best-known instantiation of this approach is slow feature analysis (SFA), which has been proposed
to underlie the formation of complex cell receptive fields (Berkes & Wiskott 2005). From a
machine learning perspective, SFA can be seen as a special case of oriented principal component
analysis that greedily searches for filters that maximize the signal-to-noise ratio if the
variations generated by the transformational changes are considered noise. For the learning of
complex cells the algorithm has been applied in the quadratic feature space. Here we present
a new algorithm called slow subspace analysis (SSA). SSA combines the slowness objective
of SFA with the energy model known from steerable filter theory such that it yields perfectly
invariant steerable filters in the ideal analytically tractable cases. There are two important
differences between SFA and SSA: First, while SSA uses the same slowness criterion as SFA
for each individual feature, it replaces the greedy search strategy by optimizing all filters
simultaneously for the best average slowness, and second, the optimization in SSA is done
only over the (n^2 + n)/2-dimensional parameter space of orthogonal transforms on the original
n-dimensional signal space, while for complex cell learning with SFA the optimization
is carried out over the entire quadratic feature space, for which the number of parameters is
much larger, i.e. (n^4 + 2n^3 − n^2 − 2n)/8. These differences make SSA an interesting alternative
to SFA. In particular, the theoretical grounding of SSA in steerable filter theory is attractive
as it allows one to carry out meaningful model comparisons between different algorithms.
Accordingly, we show that our new algorithm exhibits larger slowness than SFA for various
important examples such as translations, rotations and scalings as well as natural movies.
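
The claim that an energy model can be perfectly invariant in an ideal, analytically tractable case can be illustrated with a small numerical sketch. It is not the SSA algorithm itself: no filters are learned, the translation is assumed to be cyclic (implemented with np.roll), and slowness is measured as a plain, unnormalized mean squared frame-to-frame change. The function names mean_squared_change, quadrature_pair and translation_demo are hypothetical and chosen only for this illustration.

```python
import numpy as np


def mean_squared_change(y):
    """Average squared frame-to-frame difference of a feature (lower = slower).

    Assumption: an unnormalized slowness measure, used here only for illustration.
    """
    return np.mean(np.diff(y, axis=0) ** 2, axis=0)


def quadrature_pair(n, freq):
    """Orthonormal cosine/sine filter pair at one spatial frequency (0 < freq < n/2)."""
    phase = 2 * np.pi * freq * np.arange(n) / n
    scale = np.sqrt(2.0 / n)
    return scale * np.cos(phase), scale * np.sin(phase)


def translation_demo(n=16, freq=2, n_frames=32, seed=0):
    """Compare one squared filter response with the two-filter energy
    on a random signal that is cyclically shifted by one sample per frame."""
    rng = np.random.default_rng(seed)
    x0 = rng.standard_normal(n)
    # Each row is one frame of the "movie": the same signal, shifted by t samples.
    frames = np.stack([np.roll(x0, t) for t in range(n_frames)])

    u, v = quadrature_pair(n, freq)
    a, b = frames @ u, frames @ v  # linear filter responses per frame

    single = mean_squared_change(a ** 2)          # one squared response
    energy = mean_squared_change(a ** 2 + b ** 2)  # energy of the 2-D subspace
    return single, energy


if __name__ == "__main__":
    single, energy = translation_demo()
    print("change of single squared response:", single)
    print("change of subspace energy:        ", energy)
```

Under a cyclic shift, the projection of the signal onto the subspace spanned by the two filters merely rotates, so its squared length (the energy) stays constant up to floating-point error, whereas the squared response of either filter alone oscillates from frame to frame.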