Synchronous, but not entrained: exogenous and endogenous cortical rhythms of speech and language processing

Research on speech processing is often focused on a phenomenon termed "entrainment", whereby the cortex shadows rhythmic acoustic information with oscillatory activity. Entrainment has been observed to a range of rhythms present in speech; in addition, synchronicity with abstract information (e.g. syntactic structures) has been observed. Entrainment accounts face two challenges: First, speech is not exactly rhythmic; second, synchronicity with representations that lack a clear acoustic counterpart has been described. We propose that apparent entrainment does not always result from acoustic information. Rather, internal rhythms may have functionalities in the generation of abstract representations and predictions. While acoustics may often provide punctate opportunities for entrainment, internal rhythms may also live a life of their own to infer and predict information, leading to intrinsic synchronicity, which should not be counted as entrainment. This possibility may open up new research avenues in the psycho- and neurolinguistic study of language processing and language development.

ARTICLE HISTORY: Received 28 July 2019; Accepted 29 October 2019


The assumed role of entrainment in speech processing
The functional interpretation of cortical rhythms remains an issue of great theoretical importance in research on speech perception and language comprehension (Friederici & Singer, 2015; Giraud & Poeppel, 2012; Lewis & Bastiaansen, 2015; Meyer, 2017). Given that language comprehension is essentially the inference of meaning from vibrations of air, comprehension must require the synthesis of prior knowledge in the form of endogenous information (e.g. a network's intrinsic activation state at stimulus onset) with sensory information in the form of exogenous acoustic input (Bever & Poeppel, 2010; Halle & Stevens, 1962; Martin, 2016; Poeppel & Monahan, 2011). Yet, in the past years, the auditory neuroscience and speech processing fields have focused mainly on the so-called "entrainment" of cortical rhythms during the processing of acoustics and pre-lexical representations such as phonemes, phonetic features, and syllables. In comparison, less focus has been put on potential cognitive and computational aspects of these signals for lexical, morphemic, syntactic, semantic, and discourse- and referential-level processing, all of which are crucial in understanding the meaning of speech, which is, of course, the goal of language comprehension. Entrainment describes the phase-locking of a neural oscillation, presumed to emanate from a population of neurons that fire in synchrony, to the phase of an external physical stimulus, such as speech (Giraud & Poeppel, 2012; Lakatos, Karmos, Mehta, Ulbert, & Schroeder, 2008; Obleser & Kayser, 2019; Pikovsky, Rosenblum, & Kurths, 2003). In the narrow sense, which we refer to as entrainment proper, a given rhythmic sequence of acoustic cues drives the cycles of a given neural oscillation into a phase-aligned rhythmic sequence. Entrainment proper is contingent on the acoustic stimulus and requires acoustic cues.
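Entrainment proper in this narrow sense can be sketched with a minimal forced phase-oscillator model (an Adler-type equation, standard in the synchronisation literature cited above, e.g. Pikovsky et al., 2003). This is a toy illustration, not a model of cortical dynamics; all frequencies and the coupling strength are arbitrary illustrative values.

```python
import numpy as np

def phase_difference(f_osc, f_stim, coupling, dt=0.001, t_max=10.0):
    """Forced phase oscillator: dphi/dt = 2*pi*f_osc + K*sin(phi_stim - phi).

    Returns the wrapped phase difference between the oscillator and a
    rhythmic stimulus over time. With sufficient coupling and a small
    frequency mismatch, the difference settles to a constant value:
    the oscillator is phase-locked (entrained) to the stimulus.
    """
    n = int(t_max / dt)
    phi = np.zeros(n)
    for i in range(1, n):
        phi_stim = 2 * np.pi * f_stim * (i - 1) * dt
        dphi = 2 * np.pi * f_osc + coupling * np.sin(phi_stim - phi[i - 1])
        phi[i] = phi[i - 1] + dphi * dt
    t = np.arange(n) * dt
    return np.angle(np.exp(1j * (phi - 2 * np.pi * f_stim * t)))

# A 4.3 Hz oscillator driven by a 4 Hz stimulus: locked with coupling,
# drifting without (Adler condition: locking if K > |2*pi*(f_osc - f_stim)|)
locked = phase_difference(f_osc=4.3, f_stim=4.0, coupling=8.0)
free = phase_difference(f_osc=4.3, f_stim=4.0, coupling=0.0)
print("locked drift over final second:", np.ptp(locked[-1000:]))  # near zero
print("free drift over final second:  ", np.ptp(free[-1000:]))    # keeps drifting
```

The key property for the argument below is that this kind of locking is contingent on the stimulus rhythm: set the coupling (or the stimulus rhythmicity) to zero and the phase alignment disappears.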
Entrainment of neural oscillations to amplitude rhythms in speech is attested at the gamma (Lehongre, Ramus, Villiermet, Schwartz, & Giraud, 2011), theta (Peelle, Gross, & Davis, 2013), and delta (Bourguignon et al., 2013) frequencies. It is thought that these electrophysiological rhythms can be entrained in the first place only because they match in frequency the rhythmic amplitude edges or peaks that accompany phonemes, syllables, and intonation phrases; the latter have a comparably weak rhythmicity. The general assumption that acoustic edges or peaks are consistently rhythmic and physically strong enough to entrain a neural oscillation is supported by rodent work, where high-amplitude neuronal discharges to complex acoustic stimuli were observed to reset the phase of local field potentials in the auditory cortex (Szymanski, Rabinowitz, Magri, Panzeri, & Schnupp, 2011).

Challenge: synchronicity, but sparseness or absence of acoustic cues

It has often been proposed that the entrainment of neural oscillations plays a mechanistic role in speech and language processing. Evidence for this role is mostly confined to the entrainment of theta-band oscillations to speech amplitude modulations at the syllabic rate, which is emphasised in recent neurophysiological and computational models (Ghitza, 2011; Giraud & Poeppel, 2012). These models conceptualise theta-band entrainment as a mechanism to segment continuous speech signals into syllable-size acoustic chunks, guiding follow-up auditory decoding on shorter time scales, such as phonemes. As each syllable contains a phonetic unit that is acoustically salient, it is likely that exogenous acoustic landmarks are dominant, if not crucial, for the elicitation of synchronicity between electrophysiological rhythms and speech.
This view is further corroborated by a close correlation between amplitude modulations at theta-band frequency in speech and average syllable duration in speech corpora (Greenberg, 2001; Pellegrino, Coupé, & Marsico, 2011); further support comes from psychophysical studies (Ghitza & Greenberg, 2009).
In direct contrast to such evidence for entrainment proper in the syllabic range, there are clear-cut cases where speech simply does not exhibit any physical cues that could possibly entrain neural oscillations. Instead, synchronicity occurs between specific frequency bands of the electroencephalogram and linguistic representations that, in principle, only exist in the mind and brain through perceptual inference (Marslen-Wilson & Welsh, 1978; Martin, 2016): There is no isomorphic relationship between a sound wave and a word, its meaning, and its syntactic category. Instead, the relationship is symbolic: A physical sound wave is arbitrarily associated with a meaning that must be decoded in context (Ding & He, 2016; Ding, Melloni, Zhang, Tian, & Poeppel, 2016; Meyer, Henry, Schmuck, Gaston, & Friederici, 2016). First, words in running speech do not have acoustic boundaries; so already the segmentation of running speech into individual words is a case of inference (e.g. Kösem, Basirat, Azizi, & van Wassenhove, 2016; Lany & Saffran, 2010; Martin, 2016). Second, word meaning cannot be implicitly derived from a sound wave (Dingemanse, Blasi, Lupyan, Christiansen, & Monaghan, 2015; Saussure, 1916); instead, the association between word meaning and sound identity is mostly arbitrary cross-linguistically. In addition, a single segmented acoustic word is often associated with several meanings in the mental lexicon (Aitchison, 2012), the selection of which depends on context (Nieuwland & Van Berkum, 2006). Third, the syntactic categories of words in many languages are not marked acoustically; yet, sentence meaning derives from syntactic categories: Comprehending who-did-what-to-whom requires the establishment of relationships amongst words, that is, the formation of a syntactic structure based on syntactic categories.
Yet, often more than a single syntactic structure is compatible with a given utterance, leading to multiple mutually exclusive interpretations of the same utterance (Meyer et al., 2016). The assignment of syntactic categories, the establishment of relationships amongst words, and the comprehension of who-did-what-to-whom rely on an inferential link between sensory input and grammatical knowledge of language.
Hence, entrainment proper is an unlikely mechanism for the formation of higher-level linguistic representations. How could synchronicity then be triggered by something that does not have clear physicality in the external world? The proposed dissociation between stimulus-dominant entrainment proper and intrinsic synchronicity with abstract symbolic linguistic information is in line with a series of results: For example, language comprehension is largely intact when prosodic cues, which occur within the modulation frequency range of the delta band, are removed from the speech signal, while removal of theta-range amplitude modulations results in a substantial loss of intelligibility (Doelling, Arnal, Ghitza, & Poeppel, 2014; Ghitza & Greenberg, 2009). In addition, phase-locking to linguistic structure occurs even when the speech stimulus contains distracting exogenous acoustic information (Meyer et al., 2016). It is still intensively debated, and requires future corpus analyses, whether amplitude cues provided by pauses, duration information, and pitch modulations that occur in frequency ranges slower than theta are rhythmic enough to entrain oscillations in the first place; it is also unclear whether these cues are reliable enough to infer lexical or phrasal boundaries (Cummins, 2012; Fernald & McRoberts, 1996; Goswami & Leong, 2013; Jusczyk, 1997; Kelso, Saltzman, & Tuller, 1986; Martin, 2016).

Proposal: intrinsic synchronicity versus entrainment proper
We propose here that oscillatory synchronicity during speech processing and language comprehension is often not entrainment proper. Instead, the symbolic relationship between acoustic cues and the computation of abstract structures and linguistic predictions implies that oscillatory rhythmicity could also be intrinsically synchronous with the pace of ongoing inferences, and thus with strictly cognitive processing (Marslen-Wilson & Welsh, 1978; Martin, 2016; Martin & Doumas, 2017, 2019). Morphemes, lexical representations, syntactic and semantic structures, discourse and event structures, as well as pragmatic inferences cannot be considered sensory-driven consequences of speech alone; instead, they are generated and predicted internally. Why should the computation of abstract linguistic structures be cyclic? There is evidence that such computation is bound by endogenous constraints on temporal regularity. In the domain of abstract syntactic processing, listeners are biased to group words into implicit phrases with a period that is highly regular across both time and participants (Fodor, 1998; Hwang & Steinhauer, 2011; Webman-Shafran & Fodor, 2016). This bias is strong enough to override exogenous prosodic cues that indicate phrase durations outside of the preferred phrasing period (Meyer, Elsner, Turker, Kuhnke, & Hartwigsen, 2018; Meyer et al., 2016). Furthermore, event-related brain potentials (ERPs) associated with the grouping of words into implicit phrases appear with a regular period that does not require the presence of periodic exogenous prosodic cues (Roll, Lindgren, Alter, & Horne, 2012; Schremm, Horne, & Roll, 2015; Steinhauer, Alter, & Friederici, 1999). Hidden in the frequency domain of such periodically occurring ERPs, there may be a slow-frequency oscillator that is synchronous with internally generated syntactic representations.
This is consistent with the observation that the grouping of words into phrases associates with delta-band oscillatory activity, encompassing the range of periodicity of grouping-related ERPs (Bonhage, Meyer, Gruber, Friederici, & Mueller, 2017; Meyer et al., 2016; cf. Boucher, Gilbert, & Jemel, 2019). While grouping maximises information throughput beyond the capacity constraints of working memory (e.g. Baddeley, Hitch, & Allen, 2009; Bonhage et al., 2017), periodicity might ensure optimal use of electrophysiological constraints, such as the eigenfrequencies of cortical networks, and thus of the time windows across which information can be integrated (e.g. Buzsaki, 2006, 2019; Keitel & Gross, 2016). The idea that endogenous oscillatory activity may be a reason for discretised information sampling has been discussed in detail elsewhere (e.g. Pöppel, 1997; VanRullen, 2016). Again, intrinsic synchronicity with internally generated word groups may have been "disguised" as entrainment in prior work, as it overlaps with the frequency band of apparent entrainment by speech prosody (i.e. <4 Hz; e.g. Bourguignon et al., 2013; Gross et al., 2013; Mai, Minett, & Wang, 2016). This hypothesis would be supported if intrinsic synchronicity with word groups and prosodic entrainment were dissociated through their different cortical substrates. In fact, delta-band activity has not only been reported for speech entrainment in the vicinity of auditory cortices, but also for higher-level processes in frontal cortices (e.g. Molinaro, Lizarazu, Lallier, Bourguignon, & Carreiras, 2016; Park, Ince, Schyns, Thut, & Gross, 2015).
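The logic of this dissociation, familiar from frequency-tagging designs (e.g. Ding et al., 2016), can be sketched numerically: a spectral peak at an internally generated phrase rate can appear in the response spectrum even though the stimulus envelope carries no energy at that frequency. The signals, rates, and amplitudes below are purely illustrative assumptions, not measured data.

```python
import numpy as np

fs = 100.0                     # sampling rate in Hz, illustrative
t = np.arange(0, 20, 1 / fs)   # 20 s of signal

# Stimulus envelope: energy only at a 4 Hz "syllable" rate
stimulus = 1 + np.cos(2 * np.pi * 4 * t)
# Toy neural response: tracks the envelope, plus an internally
# generated 1 Hz component standing in for phrase-level grouping
response = stimulus + 0.5 * np.cos(2 * np.pi * 1 * t)

def power_at(signal, freq):
    """Spectral power at the FFT bin closest to freq."""
    spec = np.abs(np.fft.rfft(signal - signal.mean())) ** 2
    freqs = np.fft.rfftfreq(len(signal), 1 / fs)
    return spec[np.argmin(np.abs(freqs - freq))]

# The 1 Hz peak is absent in the acoustics but present in the response
print(power_at(stimulus, 1.0), power_at(response, 1.0))
```

In this caricature, attributing the 1 Hz response peak to "entrainment" would be a category error, since nothing in the stimulus oscillates at 1 Hz; the peak reflects the internally generated grouping rhythm.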
In addition to the internal generation of abstract linguistic structures, why should abstract linguistic predictions be assumed to employ ongoing oscillatory electrophysiological activity? Abstract linguistic representations live a life of their own, such that the lexical-semantic or syntactic context accumulating over time allows for the continuous derivation and refinement of linguistic predictions (Friston, 2005; Levy, 2008; Lewis & Bastiaansen, 2015; Martin, 2016; Nieuwland & Van Berkum, 2006), although it remains unclear at what granularity these are propagated and to what degree they are necessary to interpret language (Nieuwland et al., 2018). Predictions are made within their respective domain; for instance, linguistic predictions derived from preceding word meanings allow for predictions of the meaning of individual upcoming words; linguistic predictions derived from preceding syntactic categories allow for predictions of the syntactic category of upcoming groups of words (Frank, Otten, Galli, & Vigliocco, 2015; Hale, Dyer, Kuncoro, & Brennan, 2018; Meyer & Gumbert, 2018). Predictions may also capitalise on the fact that syntactic and semantic or syntactic and discourse phenomena are often correlated with one another, having analog forms that must correspond on each level of representation and allow for iterative resampling (Martin, 2016). In either case, oscillatory power in the beta band has repeatedly been found to be modulated by linguistic predictability (e.g. Lewis, Schoffelen, Hoffmann, Bastiaansen, & Schriefers, 2017; Wang et al., 2012) and sensory predictability in the auditory and audio-visual modalities (Arnal & Giraud, 2012; Arnal, Wyart, & Giraud, 2011; Kim & Chung, 2008; Weiss & Mueller, 2012).
In fact, the beta band has been proposed to subserve the internal generation of predictions across linguistic domains (Lewis & Bastiaansen, 2015; Lewis, Schoffelen, Schriefers, & Bastiaansen, 2016), in line with the role of the beta band proposed in the literature on predictive coding (e.g. Chao, Takaura, Wang, Fujii, & Dehaene, 2018; Engel & Fries, 2010; Roopun et al., 2008). In line with our suspicion of the potential confoundedness of intrinsic synchronicity and entrainment, prediction-related beta-band activity (i.e. 13-30 Hz) is immediately adjacent to, or even overlaps with, lower gamma-band activity, where entrainment at the phoneme rate has been claimed to occur (e.g. 25-35 Hz, Lehongre et al., 2011; 35-45 Hz, Gross et al., 2013; 30-45 Hz, Di Liberto, O'Sullivan, & Lalor, 2015; 30 Hz, Lizarazu et al., 2015). To dissociate phoneme-rate entrainment and prediction-related intrinsic synchronicity, one would expect to observe phoneme-rate entrainment in auditory cortex, but prediction-related intrinsic synchronicity in surrounding association cortex.
As possible instances of intrinsic synchronicity, entrainment by acoustic and temporal information that is not diagnostic of a unique linguistic unit can still result in the stable perception of a lexically ambiguous stimulus when task instructions are to detect either one or the other of the possible percepts (Kösem et al., 2016). A similar effect has been found during the processing of syntactically ambiguous sentences, where sentence interpretations that contradict acoustic cues are associated with decreased entrainment by these cues, but a consistent phase shift towards the syntactic structure that is actually perceived (Meyer et al., 2016). While these types of effects strongly suggest that the deployment of intrinsic signals in the form of cortical rhythms can shape stimulus comprehension, further studies are certainly required. A first setup to test our proposal could exploit the phenomenon that auditory processing performance transiently keeps the stimulation frequency even after stimulation offset (Hickok, Farahbod, & Saberi, 2015; Neuling, Rach, Wagner, Wolters, & Herrmann, 2012). Translating this to the language domain, behavioural speech perception research has found that prior speech rate can affect the perception of downstream words when speech rate is subsequently increased or reduced, such that incoming phonemes that do not match the prior speech rate are misperceived (Bosker, 2017; Dilley & Pitt, 2010). It has recently been shown that such effects are accompanied by endogenous oscillatory phase shifts (Kösem et al., 2018). To rule out that continued rhythmicity is simply a reverberation of prior entrainment, continued rhythmicity would have to be experimentally elicited without acoustic cues. For instance, one could think of experimental paradigms that require the rhythmic internal generation of abstract linguistic representations, which then could be shown to affect abstract linguistic processing after stimulation offset.
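The offset logic of such a paradigm can be illustrated with a toy damped oscillator that is driven rhythmically and then left to ring after stimulation stops. This is a caricature of the "entrainment echo" confound described above, not a claim about the underlying neural mechanism; the resonance frequency, damping, and timings are arbitrary assumptions.

```python
import numpy as np

def driven_oscillator(f0=4.0, q=10.0, dt=0.001, t_drive=5.0, t_total=8.0):
    """Damped harmonic oscillator driven at its resonance frequency f0
    until t_drive seconds, then left to ring freely after offset."""
    w0 = 2 * np.pi * f0
    gamma = w0 / q  # damping determines how long the "echo" persists
    x, v = 0.0, 0.0
    out = np.empty(int(t_total / dt))
    for i in range(len(out)):
        t = i * dt
        drive = np.cos(w0 * t) if t < t_drive else 0.0
        v += (drive - gamma * v - w0 ** 2 * x) * dt  # semi-implicit Euler
        x += v * dt
        out[i] = x
    return out

x = driven_oscillator()
post = np.abs(x[5100:5300]).max()  # 0.1-0.3 s after stimulation offset
late = np.abs(x[7500:7900]).max()  # 2.5-2.9 s after stimulation offset
print(post > late)  # rhythmicity persists briefly after offset, then decays
```

The point of the experimental logic is precisely that such a passive reverberation decays on the oscillator's own time constant, whereas rhythmicity sustained by the ongoing internal generation of linguistic representations would not require prior rhythmic acoustic drive at all.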
A second setup to test our proposal could exploit the fact that many sentences allow for the formation of multiple possible syntactic structures. One example is ambiguous relative clauses, as in The doctor met the son of the colonel who died., where either the son or the colonel might have died, depending on the syntactic structure that the listener formed (Grillo, Costa, Fernandes, & Santi, 2015; Hemforth et al., 2015). A second example is prepositional phrases, as in The client sued the murderer with the corrupt lawyer, where the corrupt lawyer might go either with the client or with the murderer, depending on the syntactic structure that is generated (Meyer et al., 2016; Wales & Toner, 1979). In psycholinguistics, various endogenous sources of bias in the assignment of syntactic structure to ambiguous sentences have been identified, including verbs' selection restrictions (Sedivy & Spivey-Knowlton, 1994), noun semantics (MacDonald, Pearlmutter, & Seidenberg, 1994), and working memory capacity limitations (Swets, Desmet, Hambrick, & Ferreira, 2007). All of these endogenous factors, which are well understood in psycholinguistics, should be tested experimentally for eliciting cases of apparent entrainment that are, in fact, cases of strictly intrinsic synchronicity.
A line of results that is problematic for the exogenous-dominant account of rhythmic neural oscillations in response to speech stimuli is compatible with our new proposal: First, rhythmicity of neural oscillations still occurs when the rhythmicity of external amplitude modulations is experimentally reduced (Calderone, Lakatos, Butler, & Castellanos, 2014; Mathewson et al., 2012; for review, see Ding & Simon, 2014; Zoefel & VanRullen, 2015), although we acknowledge that decreased temporal consistency of a stimulation rhythm has also been observed to reduce entrainment to some degree (Mathewson et al., 2012). Second, when rhythmic amplitude cues are experimentally kept identical between a vocoded and a non-vocoded speech condition, phase-locking is still increased for the non-vocoded condition (Peelle et al., 2013). Third, and more generally, our proposal may help to address the recurring concern that amplitude modulations in speech are too arrhythmic to allow for exogenous entrainment by acoustic cues, while oscillatory activity may still be rhythmic (Cummins, 2012; Goswami & Leong, 2013; Kelso et al., 1986; Mathewson et al., 2012): An intrinsically synchronous oscillator could exhibit rhythmic behaviour disguised as entrainment, which would in fact underlie the generation of abstract linguistic representations or predictions.

How and why entrainment and intrinsic synchronicity may relate
We next turn to the question of the relationship between entrainment proper and intrinsic synchronicity. The availability of acoustic cues for entrainment proper changes over time. Likewise, the richness, detail, and possible forecasting of internally generated abstract representations, mirrored by intrinsic synchronicity, may change over time. Thus, during speech processing and language comprehension, the relationship between exogenous acoustic and endogenous abstract information in cortical networks may be highly dynamic over time (Arnal & Giraud, 2012; Herrmann, Munk, & Engel, 2004; Marslen-Wilson & Welsh, 1978; Martin, 2016; Martin & Doumas, 2017, 2019; Seidl, 2007; Sherman, Kanai, Seth, & VanRullen, 2016). One could even hypothesise that, behaviourally, speech perception and language comprehension might stay equally good under dynamically changing environmental conditions, due to dynamic fluctuations in the exogenous-endogenous weighting over time. When the speech signal is clear and environmental conditions are excellent, entrainment can dominate speech perception. In turn, any representation that has been generated internally allows for the generation of predictions across the various linguistic levels (Hale, 2001; Levy, 2008; Martin, 2016). This can result from the same inferential process that structure building would, without the invocation of an additional predictive mechanism (Marslen-Wilson & Welsh, 1978; Martin, 2016; Martin & Doumas, 2017, 2019). Endogenously generated linguistic information can thus keep neural oscillations in the auditory system in a rhythmic state, in effect stabilising temporal alignment of neuronal excitability with residual acoustic (e.g. spectral) cues in the speech stimulus (Arnal & Giraud, 2012; Arnal et al., 2015; Kayser et al., 2015; Lakatos et al., 2005; Park et al., 2015), even when exogenous acoustic amplitude cues at these faster frequencies are arrhythmic or temporarily lacking (see Figure 1).

Figure 1. A: Entrainment of neural oscillations by a rhythmic exogenous acoustic stimulus (e.g. a regular tone sequence). Rhythmic edges or peaks in the amplitude envelope of the stimulus synchronise a neural oscillation at the stimulation frequency; B: Entrainment is impossible when amplitude cues in the speech stimulus are non-rhythmic; C: Intrinsic synchronicity of neural oscillations to a non-rhythmic acoustic stimulus (e.g. speech). Inferred and predictive linguistic knowledge (e.g. of abstract syntactic structure or predicted words), deployed by endogenous signals (i.e. local field potentials or slower-frequency neural oscillations), establishes continued oscillatory rhythmicity disguised as entrainment, in spite of lacking acoustic cues.
In line with our proposal, the processing of clear speech, relative to the processing of acoustically degraded speech, is not only associated with reliable entrainment by sensory input, but also with an increase in the endogenous modulation of sensory entrainment (Herrmann et al., 2004; Park et al., 2015; Sherman et al., 2016). Endogenous neural oscillations that provide inferred linguistic information may thus provide human language with a powerful degree of redundancy, potentially compensating for the transient non-rhythmicity of acoustic information throughout speech. An analogy for the interplay of endogenous abstract and exogenous acoustic signals would be riding a bicycle, where you might achieve the same overall speed with strong pedalling of the left leg but weaker pedalling of the right leg, or vice versa. When exogenous acoustic information is not strong enough to entrain oscillations, endogenous information may jump in. Such ongoing rebalancing would ensure an optimal use of the total amount of available information, be it acoustic or internal.
Functional neuroanatomy: abstraction from local to global?
The proposed ongoing entrained-intrinsic imbalance requires an underlying functional neuroanatomy that can handle increasing degrees of cognitive abstraction, possibly relying on cortical networks of increasing size and complexity. At present, we may only speculate that the functional neuroanatomy of the dynamic interplay between exogenous acoustic (a.k.a. incoming, perceived, predicted) and endogenous abstract (a.k.a. internally generated, inferred, predictive) information depends on the degree of linguistic abstraction. At lower abstraction levels (e.g. the phoneme level), local networks might achieve the exogenous-endogenous interplay; at higher, abstract levels (e.g. syntax and semantics), networks may increase in size. In general, it has been proposed that representational abstraction goes hand in hand with an increase in the size of the involved oscillatory network (Buzsaki, 2006, 2019; Sarnthein, Petsche, Rappelsberger, Shaw, & von Stein, 1998; von Stein, Rappelsberger, Sarnthein, & Petsche, 1999). It has also been hypothesised that the associated increase in the variance of conduction delays results in an increase in the wavelength of those oscillations that underlie network-level processing (Buzsaki, 2006).
In support of this tentative proposal, gamma-band entrainment of circumscribed auditory regions has been observed both to phoneme-rate amplitude modulations (Di Liberto et al., 2015; Gross et al., 2013) and phonological categories (Lehongre et al., 2011; Nourski et al., 2015). Phonemic-categorical information as such is not present in the speech signal (i.e. exogenous), but can only be inferred with the help of abstract linguistic knowledge (i.e. endogenously). Yet, the involved networks do not extend beyond auditory association cortex (e.g. superior posterior temporal cortex; Mesgarani, Cheung, Johnson, & Chang, 2014): Electrocorticographic data suggest that auditory association cortex is active when phonemic-categorical representations are inferred once the associated acoustic information is artificially removed from speech (Leonard, Baud, Sjerps, & Chang, 2016).
Some evidence is also compatible with the proposed involvement of large-scale, slow-frequency networks in the exogenous-endogenous interplay during abstract linguistic processing: Synchronicity of delta-band oscillations in frontal cortices increases for speech as compared to both amplitude-modulated white noise and spectrally rotated speech, that is, when linguistic information can be inferred in the first place (Molinaro & Lizarazu, 2017). Likewise, effective connectivity from frontal to posterior cortices in the delta band increases for clear speech compared to backward speech (Park et al., 2015). In general, literature on early evoked responses suggests that the diagnosis of anomalies that violate abstract expectations derived from internal linguistic knowledge is not achieved by sensory cortices alone (Dikker, Rabagliati, & Pylkkänen, 2009; Herrmann, Maess, Hasting, & Friederici, 2009), but involves additional generators in the frontal cortices (Friederici, Wang, Herrmann, Maess, & Oertel, 2000). When testing this experimentally, it could be hypothesised that entrainment proper (i.e. driven by acoustic cues exclusively) is restricted to sensory cortices. Intrinsic synchronicity accompanying categorical abstraction (e.g. inference of phonemic features) should be observed in sensory association cortices. Increasingly abstract and generative processes (e.g. the generation of syntactic structure, linguistic predictions) would be hypothesised to associate with intrinsic synchronicity in larger frontal-posterior networks.
Consequences for language acquisition: from entrained to intrinsic?
In addition to providing a novel explanation for oscillatory phenomena in speech processing and language comprehension, our proposal also allows for new ways to study language acquisition, as a potential case of a progression from exogenously driven entrainment proper to endogenously driven intrinsic synchronicity. Two examples offer support for the idea that the accumulation of linguistic knowledge leads to the deployment of endogenous signals that give rise to intrinsic synchronicity.
Second, a compatible behavioural trajectory is known for abstract linguistic information: In the developing ability to parse words into syntactic phrases, infants start out with the ability to perceive, and later infer, these phrases' boundaries through exogenous acoustic amplitude cues (Isobe, 2007; Männel & Friederici, 2009; Wiedmann & Winkler, 2015); yet, after six years of age, amplitude cues are no longer necessary for syntactic phrasing (Männel, Schipke, & Friederici, 2013) and perhaps become overridden by information that manifests as endogenous syntactic preferences in adults, in association with a decreased entrainment by amplitude cues that contradict these endogenous preferences (Meyer et al., 2016).

Conclusion
We have argued that segmenting speech into representations with structure and meaning calls on two forms of synchronised neural oscillations: Entrainment proper occurs in response to exogenous stimuli, such as the acoustic speech envelope, and likely other acoustic features of speech. Intrinsic synchronicity may be disguised as entrainment, yet stems from the generation of linguistic meaning based on the synthesis of the exogenous acoustic signal with (pre-)activated endogenous representations through perceptual inference.