Abstract:
We propose a highly efficient framework for penalized likelihood kernel methods applied
to multi-class models with a large, structured set of classes. As opposed to many previous
approaches which try to decompose the fitting problem into many smaller ones, we focus
on a Newton optimization of the complete model, making use of model structure and
linear conjugate gradients in order to approximate Newton search directions. Crucially,
our learning method is based entirely on matrix-vector multiplication primitives with the
kernel matrices and their derivatives, allowing straightforward specialization to new kernels,
and focusing code optimization efforts on these primitives only.
Kernel parameters are learned automatically, by maximizing the cross-validation log
likelihood in a gradient-based way, and predictive probabilities are estimated. We demonstrate
our approach on large scale text classification tasks with hierarchical structure on
thousands of classes, achieving state-of-the-art results in an order of magnitude less time
than previous work.
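
To illustrate the idea of approximating a Newton search direction with linear conjugate gradients while touching the kernel matrix only through matrix-vector products, here is a minimal sketch. It is not the authors' implementation: the function name `conjugate_gradients`, the RBF kernel, the toy data, and the system matrix K + I (standing in for a Newton system) are all assumptions chosen for illustration.

```python
import numpy as np


def conjugate_gradients(matvec, b, tol=1e-10, max_iter=200):
    """Approximately solve A x = b for symmetric positive definite A.

    A is never formed explicitly here: it is accessed only through
    `matvec(v)`, which must return A @ v.
    """
    x = np.zeros(b.shape, dtype=float)
    r = b.astype(float)          # residual b - A x, with x = 0 initially
    p = r.copy()                 # first search direction
    rs_old = float(r @ r)
    b_norm = float(np.linalg.norm(b))
    if b_norm == 0.0:
        return x
    for _ in range(max_iter):
        if np.sqrt(rs_old) <= tol * b_norm:
            break
        Ap = matvec(p)           # the only access to the matrix
        alpha = rs_old / float(p @ Ap)
        x = x + alpha * p
        r = r - alpha * Ap
        rs_new = float(r @ r)
        p = r + (rs_new / rs_old) * p
        rs_old = rs_new
    return x


# Toy setup (assumed, not from the paper): an RBF kernel matrix on random points.
rng = np.random.default_rng(0)
X = rng.normal(size=(50, 3))
sq_dists = ((X[:, None, :] - X[None, :, :]) ** 2).sum(axis=-1)
K = np.exp(-0.5 * sq_dists)
b = rng.normal(size=50)


def system_matvec(v):
    # Applies (K + I) to v using one kernel matrix-vector product.
    return K @ v + v


direction = conjugate_gradients(system_matvec, b)
```

Because the solver only calls `system_matvec`, switching to a different kernel or to a faster matrix-vector routine would leave the solver itself unchanged, which is the kind of separation the abstract describes.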