Free keywords:
Computer Science, Learning (cs.LG); Computer Science, Artificial Intelligence (cs.AI)
Abstract:
The strong lottery ticket hypothesis holds the promise that pruning randomly
initialized deep neural networks could offer a computationally efficient
alternative to deep learning with stochastic gradient descent. Common parameter
initialization schemes and existence proofs, however, are focused on networks
with zero biases, thus forgoing the potential universal approximation property
of pruning. To fill this gap, we extend multiple initialization schemes and
existence proofs to non-zero biases, including explicit 'looks-linear'
approaches for ReLU activation functions. These not only enable truly
orthogonal parameter initialization but also reduce potential pruning errors.
In experiments on standard benchmark data sets, we further highlight the
practical benefits of non-zero bias initialization schemes and present
theoretically inspired extensions for state-of-the-art strong lottery ticket
pruning.
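
To make the 'looks-linear' idea concrete, below is a minimal NumPy sketch of a mirrored-block ReLU initialization with non-zero biases. The function name, the QR-based orthogonalization, and the uniform bias range are illustrative assumptions for this sketch, not the paper's exact scheme.

```python
import numpy as np

def looks_linear_init(n_in, n_out, bias_scale=0.1, seed=0):
    """Illustrative 'looks-linear' ReLU initialization (sketch).

    Builds the mirrored block [[W, -W], [-W, W]] from a (semi-)orthogonal W.
    Since ReLU(z) - ReLU(-z) = z, stacking such layers yields a linear,
    orthogonal map at initialization, so biases need not be fixed at zero.
    """
    rng = np.random.default_rng(seed)
    n = max(n_in, n_out)
    # Orthogonal base matrix via QR of a square Gaussian matrix.
    Q, _ = np.linalg.qr(rng.standard_normal((n, n)))
    W = Q[:n_out, :n_in]
    # Mirrored block structure doubles the layer width.
    W_block = np.block([[W, -W], [-W, W]])
    # Non-zero biases drawn uniformly (illustrative choice of scale).
    b = rng.uniform(-bias_scale, bias_scale, size=2 * n_out)
    return W_block, b

# Sanity check of the looks-linear identity (biases set to zero here):
W_block, _ = looks_linear_init(4, 4, bias_scale=0.0)
W = W_block[:4, :4]
x = np.random.default_rng(1).standard_normal(4)
h = np.maximum(np.concatenate([x, -x]), 0.0)  # mirrored ReLU features
y = W_block @ h                               # recovers [W x; -W x]
assert np.allclose(y[:4], W @ x) and np.allclose(y[4:], -W @ x)
```

The mirrored signs ensure that the positive and negative halves of each activation recombine exactly, which is what makes a truly orthogonal initialization possible despite the ReLU nonlinearity.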