Hierarchical inference and learning using distributed neural representations of uncertainty

Abstract: Sensory systems in the brain must make sense of noisy, often ambiguous incoming stimuli. Forming a percept from receptor activations in the periphery is challenging – the underlying computation is ill-posed, and thus must be tackled probabilistically. The statistical accuracy of inference observed in behavioural experiments points to the capacity of neural circuits to learn the generative model underlying natural stimulus statistics. A number of schemes have been suggested for how populations of neurons may code for, and compute with, uncertainty (e.g. Hoyer & Hyvarinen 2003, Ma et al. 2006, Zemel et al. 1998), but there has been very little work on how such representations could be acquired by neural systems.
We propose a new approach, the Distributed Distributional Code (DDC) Helmholtz machine, which learns a causal generative model of sensory stimuli and simultaneously learns to accurately infer the corresponding explanatory (or latent) variables. A key feature of our model is that neural activity encodes uncertainty about the latent causes implicitly: the inferred posterior distribution is represented as a set of expectations distributed across a population of neurons (Zemel et al. 1998; Sahani & Dayan 2003), i.e. as a "distributed distributional code" (DDC). To learn both the generative model and the recognition model, which performs inference over latent causes given incoming sensory observations, we use a wake-sleep-like algorithm inspired by the Helmholtz machine (Dayan et al. 1995). Even for hierarchical models, the learning rules remain local, making our approach biologically appealing. Furthermore, the posterior representation imposes neither independence assumptions nor a rigid parametric structure, and is thus able to capture the statistical dependencies among the latent causes faithfully. We show that the DDC Helmholtz machine accurately learns generative models of olfactory and visual stimuli with hierarchically organized latent variables, where standard approaches relying on factorized posterior approximations fail.
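The core idea – representing a posterior by a vector of expectations of fixed basis functions, and training a recognition model on samples drawn from the generative model (a sleep-phase, Helmholtz-machine-style update) – can be illustrated with a toy example. The sketch below is not the paper's actual architecture: the one-dimensional linear-Gaussian generative model, the Gaussian-bump basis functions, and the linear readouts are all illustrative assumptions chosen so the code stays self-contained.

```python
import numpy as np

rng = np.random.default_rng(0)

# Toy generative model (assumed for illustration): z ~ N(0, 1), x = z + noise.
def sample_generative(n):
    z = rng.normal(size=n)
    x = z + 0.5 * rng.normal(size=n)
    return z, x

# Fixed DDC basis functions psi_k(z): Gaussian bumps tiling the latent space.
# A DDC for q(z|x) is the vector of expectations E_q[psi_k(z)], k = 1..K.
K = 50
centers = np.linspace(-3.0, 3.0, K)
def psi(z):
    return np.exp(-0.5 * (z[:, None] - centers[None, :]) ** 2)  # shape (n, K)

# Simple feature expansion of the observation (stand-in for a recognition net).
def features(x):
    return np.stack([np.ones_like(x), x, x**2], axis=1)  # shape (n, 3)

# "Sleep" phase: sample (z, x) pairs from the generative model and fit the
# recognition readout W so that features(x) @ W approximates E[psi(z) | x].
z, x = sample_generative(5000)
W, *_ = np.linalg.lstsq(features(x), psi(z), rcond=None)

# Any posterior expectation that is (approximately) linear in the basis can
# then be decoded from the DDC; here we decode the posterior mean E[z | x].
alpha, *_ = np.linalg.lstsq(psi(z), z, rcond=None)
def posterior_mean(x_new):
    return features(x_new) @ W @ alpha

# For this linear-Gaussian toy the exact posterior mean is x / 1.25,
# so posterior_mean(1.0) should come out near 0.8.
print(posterior_mean(np.array([1.0]))[0])
```

Note that the update for W uses only samples from the generative model and a squared-error objective, which is what keeps the learning rule local in this scheme; no explicit parametric form for the posterior is ever assumed.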