Universal Grammar is not dead

In a new study to be published in next week’s issue of PNAS, “One model for the learning of language,” Yuan Yang and Steven Piantadosi attempt to show that language acquisition is possible without recourse to “innate knowledge of the structures that occur in natural language.” The authors claim that a domain-general, rule-learning algorithm can “acquire key pieces of natural language.” The paper provides a number of simple and elegant arguments, but ones that may not be as revolutionary as the authors seem to have intended.

One immediate qualification is needed. Generative linguists, being careful with their words, are wont to stress that what is hypothesized to be innate with respect to child language acquisition is not “the structures that occur in natural language,” as Yang and Piantadosi claim. Rather, it is simply what Otto Jespersen presciently termed “a notion of structure.” It is the capacity to build a structure that is thought to be innate, in addition to the arguably more mysterious content of what makes up individual words.

The authors provide a model that can take strings of discrete elements and execute a number of primitive operations. The “assumed primitive functions” make regular reference to linearity: things like “list,” “first character,” “middle of Y,” and “string comparison.” For instance, they discuss what they term “pair” and “first” operations, which they claim “are similar in spirit to ‘Merge’ in minimalist linguistics, except they come with none of the associated machinery that is required in those theories; here, they only concatenate.” 
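To make the linearity point concrete, here is a rough Python sketch of primitives of this kind; the names and signatures are illustrative assumptions, not taken from the paper’s implementation:

```python
# Illustrative string-level primitives of the kind described above
# (names and signatures are assumptions, not the paper's own code).

def pair(x: str, y: str) -> str:
    """Concatenate two strings; linear order is built into the operation."""
    return x + y

def first(x: str) -> str:
    """Return the first character of a string."""
    return x[0] if x else ""

def middle(y: str) -> str:
    """Return a string with its first and last characters stripped."""
    return y[1:-1]

def string_eq(x: str, y: str) -> bool:
    """String comparison."""
    return x == y

# Every operation presupposes an ordering of symbols:
# pair("a", "b") yields "ab", while pair("b", "a") yields "ba".
```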

There are a number of issues to unpack here. The operation Merge is typically not assumed to be a concatenation process; it simply forms sets and does not impose any order. Syntax has been known to require more than just Merge; it also needs a set-categorization or “labeling” operation. Yang and Piantadosi assume some measure of progress in that their model is free from all the cumbersome “associated machinery” of generative models of Merge, but this should not be a cause for celebration, since the model ends up capturing only relations between strings and not relations between structures. As such, it falls short of explaining “key pieces of natural language.”
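By way of contrast, the sketch below renders Merge as bare set formation plus a separate labeling step. This is a simplified illustration under my own representational choices, not an implementation drawn from the paper or from any particular syntactic theory:

```python
from dataclasses import dataclass

@dataclass(frozen=True)
class SyntacticObject:
    label: str                       # category supplied by a labeling step
    parts: frozenset = frozenset()   # unordered constituents, if any

def merge(x: SyntacticObject, y: SyntacticObject) -> frozenset:
    """Form the set {x, y}; no linear order is imposed."""
    return frozenset({x, y})

def label(so: frozenset, head: SyntacticObject) -> SyntacticObject:
    """A separate labeling step categorizes the set (here, by its head)."""
    return SyntacticObject(label=head.label, parts=so)

the = SyntacticObject("D")
dog = SyntacticObject("N")
assert merge(the, dog) == merge(dog, the)   # order plays no role,
                                            # unlike string concatenation
```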

With respect to its architecture, Yang and Piantadosi’s model invokes a version of recursion, one which remembers any stochastic choices made on a previous call with the same arguments. But consider a basic feature of natural language recursion inherent to Merge: It appears to be strictly Markovian.
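The recursion at issue is, roughly, stochastic memoization: a random choice made for a given call is stored and reused whenever the same call recurs. A minimal sketch, under my own assumptions about the mechanism rather than the paper’s code:

```python
import random

_memo = {}

def memoized_choice(args, options):
    """Draw a random option the first time `args` is seen; reuse it afterwards."""
    key = (args, tuple(options))
    if key not in _memo:
        _memo[key] = random.choice(options)
    return _memo[key]

# The same arguments always return the same (initially random) choice,
# so earlier calls constrain later ones -- the history is remembered.
memoized_choice("NP", ["expand", "stop"])
assert memoized_choice("NP", ["expand", "stop"]) == memoized_choice("NP", ["expand", "stop"])
```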

We could point here to the principle of Resource Restriction, another example of “associated machinery” that appears essential to modelling natural language syntax with any degree of accuracy. Resource Restriction states that when Merge maps workspace n to workspace n+1, the number of computationally accessible elements (syntactic objects) can only increase by one. It transpires that this may account for a peculiar property of natural language recursion that separates it from other forms of recursion (e.g., propositional calculus, proof theory): Merge involves the recursive mapping of workspaces that removes previously manipulated objects.

Hence, Resource Restriction renders natural language derivations strictly Markovian: The present stage is independent of what was generated earlier, unlike standard recursion. Similar observations apply to the idea that when Merge targets objects in a workspace, non-targeted elements remain conserved and intact once the new workspace has been established, continuing the intuition of “no tampering.”
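One way to picture the constraint is sketched below: Merge maps a workspace to a new workspace, removing the two merged objects as separate workspace terms and adding the single new set, so each step depends only on the current workspace. The representation and function names here are my own, for illustration only:

```python
def merge_step(workspace: frozenset, x, y) -> frozenset:
    """Map WS_n to WS_(n+1): remove x and y as workspace terms, add {x, y}."""
    assert x in workspace and y in workspace and x != y
    return (workspace - {x, y}) | {frozenset({x, y})}

ws0 = frozenset({"the", "dog", "barked"})
ws1 = merge_step(ws0, "the", "dog")                         # {{'the', 'dog'}, 'barked'}
ws2 = merge_step(ws1, frozenset({"the", "dog"}), "barked")

# x and y remain accessible inside the new set, so the count of accessible
# syntactic objects grows by at most one per step, and each step is a
# function of the current workspace alone: the derivation is Markovian.
```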

Although Yang and Piantadosi provide a technically innovative model for string-set computations, none of the above core aspects of syntax, nor the associated obstacles, is approximated by their model. The authors discuss the fact that children learn many intricate syntactic rules from impoverished data, and they respond by pointing to the clear power of their simple domain-general learning model (they note that “any learning system that works over a sufficiently rich space of computations will appear to know about strings that have not been observed”). Yet what their model is learning is complex strings, not structures.

Nevertheless, the authors’ model successfully learns many types of formal languages that seem relevant for a range of linguistic objects, and its technical sophistication will likely inspire a range of new research into the learnability of some of natural language’s less complex constructions. However, Yang and Piantadosi’s model does very poorly with the English auxiliary system, which the authors say may be due to the “complexity” of this system. Quite so. Likewise, their model has difficulty learning the simple finite grammar from Braine that mimics phrase structure rules. It has moderate success with a fragment of English involving center-embedding.

The paper is framed throughout as providing evidence against some innate Universal Grammar component. While the authors do not explicitly claim that Universal Grammar is beyond recovery, they do maintain that some core learnability arguments in favor of it are flawed. The authors appear well-versed in the relevant learning literature, but there is a separate issue concerning the relevance of theoretical linguistics: namely, getting the facts about linguistic phenomena correct.

A brief aside: There is a peculiar generalization in linguistics, one that likely extends into the rest of the natural sciences but seems unusually vivid in the study of human language. We can summarise it as follows:

A decreasing understanding of linguistic theory X scales with increasingly vocal objections to X.

Researchers who object most vocally to generative linguistics very often have limited understanding of the technical details. Sometimes this is not even denied. For instance, in his book critiquing the life and thought of Chomsky (Decoding Chomsky: Science and Revolutionary Politics), Chris Knight explains why Chomsky’s theory of language structure and evolution is incorrect—before briefly acknowledging a lack of any technical training in linguistics.  

Although Yang and Piantadosi have a competent grasp of the relevant literature, if one carefully considers the arguments from generative linguistics, there are no serious rebuttals on offer in their paper. In addition, as the authors acknowledge, their model currently has no connection to the acquisition of compositional semantics: the strings are simply strings, not tied to any lexical content. Simply put, any learning model that does not link meaning with structure is not a model of human language. The intricate regulation of form/meaning pairs constitutes the stuff of syntactic theory, not simply the organization of strings into one or another final-stage arrangement that overlaps with the linearized output of a Merge-based computational system.

In addition, domain-general learning is entirely compatible with a restricted hypothesis space, as David Adger and others have regularly pointed out. Generative linguists, especially since the arrival of the minimalist program, have actively sought to reduce the size of Universal Grammar, but in a way that still gets the linguistic facts right.

The authors do make an important speculation: that theories of language acquisition should prioritize measures of description length. This seems like a promising direction, in particular given recent papers relating minimum description length measures to core aspects of semantic representations, such as quantifier representations. The authors do a convincing job of re-framing what it may be possible for a domain-general learner to know, and the speed with which such knowledge could be acquired, but the framework’s applications may be better suited to non-linguistic learning domains, of which there are many.
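As a toy illustration of the idea (the encoding scheme below is my own, not the metric used by Yang and Piantadosi, and the numbers are made up), a two-part description length trades off the cost of writing down a grammar against how compactly that grammar encodes the observed strings:

```python
import math
from typing import Callable, List

def description_length(grammar_bits: float, data: List[str],
                       prob_of: Callable[[str], float]) -> float:
    """Two-part code: L(grammar) + L(data | grammar), both in bits."""
    data_bits = sum(-math.log2(prob_of(s)) for s in data)
    return grammar_bits + data_bits

data = ["ab", "aabb", "aaabbb"]

# A vague grammar is cheap to state but encodes the data poorly...
dl_vague = description_length(2.0, data, lambda s: 1 / 1000)
# ...while a more structured (a^n b^n-like) grammar costs more bits to state
# but compresses the data far better.
dl_structured = description_length(20.0, data, lambda s: 1 / 8)

assert dl_structured < dl_vague   # the structured grammar wins overall
```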

While Piantadosi argues that “much of what Chomskyan linguists say about learnability is totally wrong,” it is worth noting that much of it is certainly correct. Consider another recent PNAS paper, by Laurel Perkins and Jeffrey Lidz, which found that non-local syntactic dependencies (i.e., the relation between a wh-element and a verb) are represented by 18-month-old infants but not by younger infants.

The difference between the Yang and Piantadosi PNAS paper and the Perkins and Lidz PNAS paper is that only the latter is about human language.

Yang and Piantadosi also follow a rich history. In the language acquisition literature, Universal Grammar theorists and generative syntacticians were often branded the “bad guys” in the 1970s. The bad guys have often worked within more traditional methodologies, of the sort used, for instance, in AI. “Good old-fashioned AI” made use of formalisms within the tradition of Frege and Russell, such as mathematical logic and non-monotonic reasoning, amongst others. These approaches have been almost entirely replaced by what is now simply termed “AI”: probabilistic and statistical models. This shift was largely due to technical limitations that were reached and the push toward useful applications, which set fundamental questions to one side.

Science turned into engineering; theoretical understanding was replaced with practical applications. Still, Jespersen’s “notion of structure” lingers in the background, yet to be captured by models such as the one advanced by Yang and Piantadosi.

In terms of how the authors have recently promoted the paper, one framing seems clear: anti- vs. pro-Chomskyan linguistics. But what does it mean to be a “Chomskyan” linguist? Is Luigi Rizzi “anti-Chomskyan” or “pro-Chomskyan” when he develops a theory of movement halting that strongly deviates from Chomsky’s, even though its computational primitives are derived from early minimalist syntax? Is Noam Chomsky himself “anti-Chomskyan” when he recoils from some of his earlier theories?

These questions have no meaningful answers. To even introduce such a “pro-/anti-” framework serves no serious scientific or expository purpose. It remains a problem that language acquisition modelers offer deliberately provocative, negative assessments of theoretical linguistics. The field will continue to make no breakthrough advances so long as this remains the case.

At the same time, exciting work from Charles Yang, Stephen Crain, and others supports an interesting conception of Universal Grammar, highlighting the joint role that learning heuristics and structure-sensitivity seem to play in grammar acquisition, alongside concerns of computational efficiency, in line with ideas dating back decades within generative linguistics. In their review of the learning literature, Robert Berwick, Paul Pietroski, Beracah Yankama, and Chomsky show how much statistical learning work fails to get basic facts about the structure-dependence of rules right.

Sadly for Yang and Piantadosi, the bad guys remain very much at large.
