Abstract
In this thesis we address three fundamental problems in computer vision using kernel methods. We first address the problem of object localization, which we frame as predicting a bounding box around an object of interest. In Chapter II we develop a framework for applying a branch-and-bound optimization strategy to efficiently and optimally detect a bounding box maximizing any of a family of objective functions, including kernelized functions and measures of proximity to a prototype. We demonstrate that this optimization achieves state-of-the-art results when applied to a simple linear objective function trained by a support vector machine.

In Chapter III, we examine how to train a kernelized objective function that is optimized specifically for the task of object localization. We achieve this with structured output regression. In contrast to a support vector machine, structured output regression does not simply predict binary outputs, but rather predicts an element of some output space; in the case of object localization, the output space is the space of all possible bounding boxes within an image. Structured output regression learns a function that measures the compatibility of inputs and outputs, and the best output is predicted by maximizing this compatibility over the space of outputs. This maximization turns out to be exactly the branch-and-bound optimization developed in Chapter II. Furthermore, a variant of this branch-and-bound optimization is also used during training, as part of a constraint generation step.

We then turn our focus to the problem of clustering images in Chapter IV. We first report results from a large-scale evaluation of clustering algorithms, in which we measure how well the partition predicted by each algorithm matches a known, semantically correct partition of the data. In this study we see particularly strong results from spectral clustering algorithms, which use the eigenvectors of an appropriately normalized kernel matrix to cluster the data. Motivated by this success, we develop a generalization of spectral clustering to data that appear in more than one modality, the primary example being images with associated text. As spectral clustering algorithms can be interpreted as kernel principal component analysis followed by a reclustering step, we generalize them by substituting regularized kernel canonical correlation analysis for the kernel principal component analysis step. The resulting algorithm, correlational spectral clustering, partitions the data significantly better than spectral clustering, and allows for the projection of unseen data that are present in only one modality (e.g., an image with no text caption).

Finally, in Chapter V, we address the problem of discovering taxonomies in data. Given a sample of data, we wish to partition the data into clusters and to find a taxonomy that relates the clusters. Our algorithm, numerical taxonomy clustering, works by maximizing a kernelized dependence measure between the data and an abstracted kernel matrix constructed from a partition matrix, which defines the clusters, and a positive definite matrix, which represents the relationships between the clusters. By appropriately constraining the latter matrix to be generated by an additive metric, we are able to interpret the result as a taxonomy. We make use of the well-studied field of numerical taxonomy to efficiently optimize this constrained problem, and show not only that we achieve an interpretable result, but also that the quality of the clustering is improved for datasets that have a taxonomic structure.
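For concreteness, the prediction rule shared by Chapters II and III can be sketched as follows; the weight vector $w$ and joint feature map $\phi$ are illustrative placeholders rather than the thesis's own notation:

\[
  y^{*} \;=\; \operatorname*{arg\,max}_{y \in \mathcal{Y}} f(x, y)
        \;=\; \operatorname*{arg\,max}_{y \in \mathcal{Y}} \left\langle w, \phi(x, y) \right\rangle ,
\]

where $\mathcal{Y}$ denotes the set of all bounding boxes in the image $x$. It is this maximization over $\mathcal{Y}$ that the branch-and-bound procedure of Chapter II carries out efficiently.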
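Similarly, one standard form of the spectral embedding referenced in Chapter IV can be written as below; the particular symmetric normalization shown is an assumption, as other normalizations are also common:

\[
  \tilde{K} \;=\; D^{-1/2} K D^{-1/2},
  \qquad D_{ii} \;=\; \sum_{j} K_{ij},
\]

after which the data are reclustered (e.g., by $k$-means) using the rows of the matrix whose columns are the leading eigenvectors of $\tilde{K}$.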
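Finally, the objective of Chapter V can be sketched under the assumption that the kernelized dependence measure is an empirical Hilbert-Schmidt independence criterion, and with illustrative notation: $K$ the data kernel matrix, $H$ the centering matrix, $\Pi$ the partition matrix defining the clusters, and $A$ the positive definite matrix relating the clusters:

\[
  \max_{\Pi,\, A} \;\; \operatorname{tr}\!\left( K H \, \Pi A \Pi^{\top} H \right)
  \quad \text{subject to} \quad A \text{ being generated by an additive metric}.
\]

The additive-metric constraint on $A$ is what permits the optimum to be read off as a taxonomy over the clusters.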