Abstract
Children learn their mother tongue spontaneously and effortlessly
through communicative interaction with their environment; they do not
have to be taught explicitly or learn how to learn first. The ambient
language to which children are exposed, however, is highly variable
and arguably deficient with regard to the learning target.
Nonetheless, most normally developing children learn their native
language rapidly and with ease.
To explain this accomplishment, many theories of acquisition posit
innate constraints on learning, or even a biological endowment that is
specific to language. Usage-based theories, on the
other hand, place more emphasis on the role of experience and
domain-general learning mechanisms than on innate language-specific
knowledge. Languages, however, are lexically open and combinatorial in
structure, so no finite amount of experience can exhaust their
expressivity. Usage-based theories therefore have to explain how
children can generalize the properties of their linguistic input to an
adult-like grammar.
In this thesis I provide an explicit computational mechanism with
which usage-based theories of language can be tested and
evaluated. My work focuses on complex syntax and the human
ability to form sentences which express more than one proposition by
means of relativization. This `capacity for recursion' is a hallmark
of an adult grammar and, as some have argued, of the human language
faculty itself.
The manuscript is organized as follows. In the second chapter, I give
an overview of results that characterize the properties of neural
networks as mathematical objects and review previous attempts at
modelling the acquisition of complex syntax with such networks. The
chapter introduces the conceptual landscape in which the current work
is located.
In the third chapter, I argue that the construction and use of meaning
are essential in child language acquisition and adult
processing. Neural network models need to incorporate this dimension
of human linguistic behavior. I introduce the Dual-path model of
sentence production and syntactic development, which can represent
semantics and learns from exposure to sentences paired with their
meaning (cf. Chang et al., 2006). I explain the architecture of
this model, motivate critical assumptions behind its design, and
discuss existing research using this model.
The fourth chapter describes and compares several extensions of the
basic architecture to accommodate the processing of multi-clause
utterances. These extensions are evaluated against computational
desiderata such as learning and generalization performance and the
parsimony of input representations. A single best solution for
encoding the meaning of complex sentences with restrictive relative
clauses is identified, which forms the basis for all subsequent
simulations.
Chapter five analyzes the learning dynamics in more detail. I first
examine the model's behavior for different relative clause
types. Syntactic alternations prove to be particularly difficult to
learn because they complicate the meaning-to-form mapping the model
has to acquire. In the second part, I probe the internal
representations the model has developed during learning. It is argued
that the model acquires the argument structure of the construction
types in its input language and represents the hierarchical
organization of distinct multi-clause utterances.
The central contributions of this thesis are presented in chapters six to eight. In
chapter six, I test the Dual-path model's generalization capacities in
a variety of tasks. I show that its syntactic representations are
sufficiently transparent to allow structural generalization to novel
complex utterances. Semantic similarities between novel and familiar
sentence types play a critical role in this task. The Dual-path model
also has a capacity for generalizing familiar words to novel slots in
novel constructions (strong semantic systematicity). Moreover, I
identify learning conditions under which the model displays recursive
productivity. It is argued that the model's behavior is consistent
with human behavior in that production accuracy degrades with depth of
embedding, and right-branching recursion is learned faster than
center-embedding recursion.
In chapter seven, I address the issue of learning complex polar
interrogatives in the absence of positive exemplars in the input. I
show that the Dual-path model can acquire the syntax of these
questions from simpler and similar structures that are attested in a
child's linguistic environment. The model's errors closely match
children's errors, and it is suggested that children might not require
an innate learning bias to acquire auxiliary fronting. Since the model
does not implement a traditional kind of language-specific universal
grammar, these results are relevant to the poverty of the stimulus
debate.
English relative clause constructions give rise to similar performance
orderings in adult processing and child language acquisition. This
pattern matches the typological universal known as the noun phrase
accessibility hierarchy. In chapter eight, I propose an input-based
explanation of these data. The Dual-path model displays this ordering
in syntactic development when exposed to plausible input
distributions. However, the ordering can be manipulated, and even
removed entirely, by varying properties of the input from which the
model learns. This indicates, I argue, that patterns of interference and
facilitation among input structures can explain the hierarchy when all
structures are simultaneously learned and represented over a single
set of connection weights.
Finally, I draw conclusions from this work, address some unanswered
questions, and give a brief outlook on how this research might be
continued.