Abstract:
During social communication, vocal and facial cues combine to form a coherent audiovisual percept. While electrophysiological studies have described crossmodal interactions at various sensory processing stages, it remains unclear how audiovisual influences occur at the neuronal level in face- or voice-sensitive areas. Here, we characterize visual influences from facial content on neuronal responses to vocalizations in two regions: a voice-sensitive region in the anterior supratemporal plane (aSTP) and the anterior superior temporal sulcus (STS). We hypothesized that the STS, a typical multisensory region, would show greater specificity in visual-auditory interactions, while the aSTP would be mainly involved in auditory analysis, such as distinguishing between voice-identity or call-type features.
Using dynamic face and voice stimuli, we recorded from single neurons in both regions in the right hemisphere of two awake rhesus macaques. To test the specificity of visual influences to behaviorally relevant stimuli, we included a set of audiovisual control stimuli in which a voice was paired with a mismatched visual facial context.
Within the aSTP, we found a division of neural sensitivity to vocal features: sensitivity to call type and sensitivity to speaker identity were supported by two functionally distinct neuronal subpopulations within this area. In contrast, neurons in the STS were less sensitive to these vocal features. Multisensory response modulation was observed in both regions, while evoked responses to visual stimuli were more prevalent in the STS. Moreover, visual influences in the STS were modulated by speaker-related features and were reduced during stimulation with incongruent voice-face pairs. In contrast, visual influences in the aSTP showed little specificity for audiovisual congruency.
Our results thus show that voice-sensitive cortex specializes in auditory analysis via a division of neuronal sensitivity, while congruency-sensitive visual influences emerge to a greater extent in the STS. Together, our results highlight the transformation of audiovisual representations of communication signals across successive levels of the multisensory processing hierarchy in the primate temporal lobe.