Abstract
We propose a two-stage learning method that implements occluded visual scene analysis in a generative model, a type of hierarchical neural network with bidirectional synaptic connections. In this model, top-down connections simulate forward optics to generate predictions of the sensory-driven low-level representation, whereas bottom-up connections send the prediction error, the difference between the sensory-based and the predicted low-level representations, to higher areas. The prediction error is then used to update the high-level representation so that it agrees better with the visual scene. Although actual forward optics is highly nonlinear and the accuracy of the simulated forward optics is crucial for this type of model, most previous studies have investigated only linear, simplified cases. Here we take occluded vision as an example of nonlinear forward optics, in which an object in front completely masks the object behind it. The proposed method is inspired by the staged development of infant visual capacity. In the primary learning stage, a minimal set of object bases is acquired within a linear generative model using a conventional unsupervised learning scheme. In the secondary learning stage, an auxiliary multi-layer neural network is trained by supervised learning to acquire the nonlinear forward optics. Importantly, the high-level representation of the linear generative model serves as the input, and the sensory-driven low-level representation provides the desired output. Numerical simulations show that occluded visual scene analysis can indeed be implemented by the proposed method.
Furthermore, consideration of the input format of the multi-layer network, together with an analysis of its hidden-layer units, leads to the prediction that the low-level visual system of the brain may contain whole-object representations of partially occluded objects, as well as complex intermediate representations arising from the nonlinear transformation from non-occluded to occluded representations.
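
The two learning stages summarized above can be illustrated with a minimal sketch. It is not the implementation used in the simulations: the 1-D "objects" `A` and `B`, the fixed depth order (A in front), the network size, and the names `occlude`, `infer`, and `train_mlp` are all assumptions made for illustration. For brevity, the object basis `U` is given by hand rather than learned, so the primary stage is represented only by its prediction-error inference loop.

```python
import numpy as np

# Two hypothetical 1-D "objects" that overlap at pixel 2 (illustrative values only).
A = np.array([1., 1., 1., 0., 0., 0.])
B = np.array([0., 0., 2., 2., 2., 0.])
U = np.stack([A, B], axis=1)            # object basis (assumed already acquired)

def occlude(front, back):
    """Nonlinear forward optics: the front object masks the back one."""
    return np.where(front > 0, front, back)

def infer(U, x, steps=200, lr=0.1):
    """Refine the high-level representation r using the prediction error."""
    r = np.zeros(U.shape[1])
    for _ in range(steps):
        error = x - U @ r               # sensory input minus top-down prediction
        r += lr * U.T @ error           # bottom-up error updates r
    return r

# Secondary-stage training set: high-level codes in, occluded images out.
R = np.array([[0., 0.], [1., 0.], [0., 1.], [1., 1.]])
X = np.stack([occlude(a * A, b * B) for a, b in R])

def train_mlp(R, X, hidden=8, steps=2000, lr=0.05, seed=0):
    """Fit a one-hidden-layer network to the nonlinear forward optics."""
    rng = np.random.default_rng(seed)
    W1 = rng.normal(0, 0.5, (R.shape[1], hidden)); b1 = np.zeros(hidden)
    W2 = rng.normal(0, 0.5, (hidden, X.shape[1])); b2 = np.zeros(X.shape[1])
    losses = []
    for _ in range(steps):
        H = np.tanh(R @ W1 + b1)        # hidden-layer activity
        Y = H @ W2 + b2                 # predicted low-level representation
        losses.append(0.5 * np.sum((Y - X) ** 2) / len(R))
        D = (Y - X) / len(R)            # gradient of the loss with respect to Y
        dH = (D @ W2.T) * (1 - H ** 2)  # backpropagate through tanh
        W2 -= lr * H.T @ D; b2 -= lr * D.sum(0)
        W1 -= lr * R.T @ dH; b1 -= lr * dH.sum(0)
    return (W1, b1, W2, b2), losses
```

In this toy setting, the linear prediction `U @ [1, 1]` sums the two objects at the overlapping pixel, whereas `occlude(A, B)` keeps only the front object there. That mismatch is what the auxiliary network is trained to absorb, taking the high-level code as input and the occluded image as the desired output.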