Abstract:
Integration of multiple sensory cues pertaining to the same object is essential for precise and accurate perception. The optimal strategy for estimating an object’s property is to weight each sensory cue in proportion to its relative reliability (i.e., the inverse of its variance). Recent studies have shown that human observers apply this strategy when integrating low-level unisensory and multisensory signals, but evidence for high-level perception remains scarce. Here we asked whether human observers optimally integrate high-level visual cues in a socially critical task, namely the recognition of a face. We therefore had subjects identify one of two previously learned synthetic facial identities (“Laura” and “Susan”) using facial form and motion.
Five subjects performed a two-alternative forced-choice (2AFC) identification task (i.e., “Laura or Susan?”) based on dynamic face stimuli that systematically varied in the amount of form and motion information they contained about each identity (10 morph steps from Laura to Susan). In single-cue conditions, one cue (e.g., form) was varied while the other (e.g., motion) was kept uninformative (50 morph). In the combined-cue condition, both cues varied by the same amount. To assess whether subjects weight facial form and motion in proportion to their reliability, we also introduced cue-conflict conditions in which both cues were varied but separated by a small conflict (±10).
We fitted psychometric functions to the proportion of “Susan” choices pooled across subjects (fixed-effects analysis) for each condition. As predicted by optimal cue integration, the empirical combined variance was lower than the single-cue variances (p < 0.001, bootstrap test) and did not differ from the optimal combined variance (p > 0.5). Moreover, no difference was found between the empirical and optimal form and motion weights (p > 0.5). Our data thus suggest that humans integrate high-level visual cues, such as facial form and motion, in proportion to their reliability to yield a coherent percept of a facial identity.
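
The optimal predictions referred to above follow from the standard reliability-weighted rule stated at the start of the abstract: each cue's weight is its inverse variance divided by the summed inverse variances, and the combined variance is the reciprocal of that sum. The following is a minimal sketch of that rule only, not the analysis code used in the study; the function name `optimal_integration` and the example variances are hypothetical and chosen purely for illustration.

```python
def optimal_integration(var_form, var_motion):
    """Reliability-weighted predictions for two cues.

    Inputs are single-cue variances (hypothetical values here, not the
    study's data). Reliability is defined as the inverse of the variance.
    """
    if var_form <= 0 or var_motion <= 0:
        raise ValueError("variances must be positive")

    rel_form = 1.0 / var_form
    rel_motion = 1.0 / var_motion
    total_rel = rel_form + rel_motion

    # Each weight is that cue's share of the total reliability.
    w_form = rel_form / total_rel
    w_motion = rel_motion / total_rel

    # The predicted combined variance is never larger than either
    # single-cue variance.
    var_combined = 1.0 / total_rel
    return w_form, w_motion, var_combined


# Illustrative values only: form is assumed three times as reliable as motion.
w_form, w_motion, var_combined = optimal_integration(4.0, 12.0)
print(w_form, w_motion, var_combined)
```

Under this rule the weights always sum to one, and the more reliable cue (here, form) receives the larger weight. In a cue-conflict condition, the predicted percept would lie at the weighted average of the two cue values.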