Free keywords:
Computer Science — Graphics (cs.GR); Computer Vision and Pattern Recognition (cs.CV); Learning (cs.LG); Image and Video Processing (eess.IV)
Abstract:
We suggest representing light field (LF) videos as "one-off" neural networks
(NNs), i.e., a learned mapping from view-plus-time coordinates to
high-resolution color values, trained on sparse views. Initially, this sounds
like a bad idea for three main reasons: First, a NN LF will likely be of lower
quality than a same-sized pixel-basis representation. Second, only little
training data is available for sparse LF videos, e.g., nine exemplars per
frame. Third, there is no generalization across LFs, but across view and time
instead. Consequently, a network needs to be trained for each LF video.
Surprisingly, these problems can turn into substantial advantages: unlike the
linear pixel basis, a NN has to come up with a compact, non-linear, i.e., more
intelligent, explanation of color, conditioned on the sparse view and time
coordinates. As observed for many NNs, however, this representation is now
interpolatable: if the image output is plausible for the sparse view
coordinates, it is plausible for all intermediate, continuous coordinates as
well. Our specific
network architecture involves a differentiable occlusion-aware warping step,
which leads to a compact set of trainable parameters and consequently fast
learning and fast execution.
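The central idea — a small network trained per LF video that maps continuous view-plus-time coordinates to color — can be sketched as a coordinate MLP. The layer sizes, coordinate layout `(u, v, t)`, and activations below are illustrative assumptions only; the paper's actual architecture additionally involves the differentiable occlusion-aware warping step, which is not reproduced here.

```python
import numpy as np

# Illustrative sketch (not the paper's architecture): a tiny "one-off"
# network mapping continuous view-plus-time coordinates (u, v, t) to an
# RGB color. After training on sparse views, such a mapping can be
# queried at any intermediate, continuous coordinate.

rng = np.random.default_rng(0)

def init_mlp(sizes):
    """Random He-scaled weights for a fully connected net."""
    return [(rng.standard_normal((m, n)) * np.sqrt(2.0 / m), np.zeros(n))
            for m, n in zip(sizes[:-1], sizes[1:])]

def mlp_color(params, coords):
    """Map coordinates of shape (N, 3) = (u, v, t) to RGB in (0, 1)."""
    h = coords
    for W, b in params[:-1]:
        h = np.maximum(h @ W + b, 0.0)           # ReLU hidden layers
    W, b = params[-1]
    return 1.0 / (1.0 + np.exp(-(h @ W + b)))    # sigmoid -> RGB

params = init_mlp([3, 64, 64, 3])
# Query a coordinate between the sparse training views -- this is the
# interpolation property the abstract describes.
colors = mlp_color(params, np.array([[0.5, 0.5, 0.25]]))
print(colors.shape)  # (1, 3)
```

The untrained network above only shows the interface; in the paper's setting the weights are fit per LF video so that outputs at the sparse training coordinates match the captured views.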