Universal Grammar is not dead

In a new study to be published in next week’s issue of PNAS, “One model for the learning of language,” Yuan Yang and Steven Piantadosi attempt to show that language acquisition is possible without recourse to “innate knowledge of the structures that occur in natural language.” The authors claim that a domain-general, rule-learning algorithm can “acquire key pieces of natural language.” The paper provides a number of simple and elegant arguments, but they may not be as revolutionary as the authors intend.

One immediate qualification is needed. Generative linguists, being careful with their words, are wont to stress that what is hypothesized to be innate with respect to child language acquisition is not “the structures that occur in natural language,” as Yang and Piantadosi claim. Rather, it is simply what Otto Jespersen presciently termed “a notion of structure.” It is the capacity to build a structure that is thought to be innate, in addition to the arguably more mysterious content of what makes up individual words.

The authors provide a model that can take strings of discrete elements and execute a number of primitive operations. The “assumed primitive functions” make regular reference to linearity: things like “list,” “first character,” “middle of Y,” and “string comparison.” For instance, they discuss what they term “pair” and “first” operations, which they claim “are similar in spirit to ‘Merge’ in minimalist linguistics, except they come with none of the associated machinery that is required in those theories; here, they only concatenate.” 

There are a number of issues to unpack here. The operation Merge is typically not assumed to be a concatenation process; it simply forms sets and does not impose any order. Syntax is also known to require more than just Merge; it needs a set-categorization or “labeling” operation. Yang and Piantadosi assume some measure of progress in that their model is free from all the cumbersome “associated machinery” of generative models of Merge, but this should not be a cause for celebration, since the model ends up capturing only relations between strings, not relations between structures. As such, it falls short of explaining “key pieces of natural language.”
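The set/string distinction is easy to make concrete. Here is a minimal sketch (the function names are illustrative, not taken from the paper) contrasting a concatenation-style “pair” operation, which is order-sensitive, with Merge understood as unordered set-formation:

```python
# Sketch contrasting string concatenation (as in the model's "pair")
# with set-formation (as in Merge). Names are illustrative only.

def pair(x: str, y: str) -> str:
    """Concatenation: the output is a string, and order matters."""
    return x + y

def merge(x, y) -> frozenset:
    """Merge forms an unordered set: {x, y} is identical to {y, x}."""
    return frozenset({x, y})

# Concatenation distinguishes the two orders; Merge does not.
assert pair("the", "dog") != pair("dog", "the")
assert merge("the", "dog") == merge("dog", "the")
```

Because the set is unordered, any linear order must be imposed by some later operation (e.g., at externalization), which is precisely the kind of structure/string distinction the model collapses.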

With respect to its architecture, Yang and Piantadosi’s model invokes a version of recursion, one which remembers any stochastic choices made on a previous call with the same arguments. But consider a basic feature of natural language recursion inherent to Merge: It appears to be strictly Markovian.

We could point here to the principle of Resource Restriction, another example of “associated machinery” that appears essential to modelling natural language syntax with any degree of accuracy. Resource Restriction states that when Merge maps workspace n to workspace n+1, the number of computationally accessible elements (syntactic objects) can only increase by one. It transpires that this may account for a peculiar property of natural language recursion that separates it from other forms of recursion (e.g., propositional calculus, proof theory): Merge involves the recursive mapping of workspaces that removes previously manipulated objects.

Hence, Resource Restriction renders natural language derivations strictly Markovian: The present stage is independent of what was generated earlier, unlike standard recursion. Similar observations apply to the idea that when Merge targets objects in a workspace, non-targeted elements remain conserved and intact once the new workspace has been established, continuing the intuition of “no tampering.”
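This workspace dynamic can be sketched in a toy model, under one common formulation of Resource Restriction (the code and names are our own illustration, not anything from the paper under review): Merge removes the two merged objects from the workspace and adds the single set formed from them, so the count of accessible syntactic objects grows by exactly one per step.

```python
# Toy model of Merge over workspaces with Resource Restriction.
# A workspace is a set of syntactic objects; lexical items are strings,
# complex objects are frozensets. Illustrative sketch only.

def accessible(obj) -> int:
    """Count accessible syntactic objects: the object itself plus,
    recursively, everything it contains."""
    if isinstance(obj, frozenset):
        return 1 + sum(accessible(m) for m in obj)
    return 1  # a lexical item

def merge_step(workspace, x, y):
    """External Merge: remove x and y from the workspace, add {x, y}."""
    new_ws = set(workspace) - {x, y}
    new_ws.add(frozenset({x, y}))
    return new_ws

ws0 = {"the", "dog"}
ws1 = merge_step(ws0, "the", "dog")

n0 = sum(accessible(o) for o in ws0)  # 2 accessible objects
n1 = sum(accessible(o) for o in ws1)  # 3: {the, dog}, the, dog
assert n1 == n0 + 1  # accessibility increases by exactly one
```

Since “the” and “dog” are no longer independent members of the new workspace, later steps can only target what the current workspace makes available: each stage depends on the present workspace alone.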

Although it offers a technical and innovative model of string-set computations, Yang and Piantadosi’s approach approximates none of the above core aspects of syntax or the obstacles associated with them. The authors note that children learn many intricate syntactic rules from impoverished data, and they respond by pointing to the clear power of their simple domain-general learning model (“any learning system that works over a sufficiently rich space of computations will appear to know about strings that have not been observed”), yet what their model is learning is complex strings, not structures.

Nevertheless, the authors’ model successfully learns many types of formal languages that seem relevant to a range of linguistic objects, and its technical sophistication will likely inspire new research into the learnability of some of natural language’s less complex constructions. However, Yang and Piantadosi’s model does very poorly with the English auxiliary system, which the authors say may be due to the “complexity” of this system. Quite so. Likewise, the model has difficulty learning the simple finite grammar from Braine that mimics phrase structure rules, and it has only moderate success with a fragment of English involving center-embedding.

The paper is framed throughout as providing evidence against some innate Universal Grammar component. While the authors do not explicitly claim that Universal Grammar is beyond recovery, they do maintain that some core learnability arguments in its favor are flawed. The authors appear well-versed in the relevant learning literature, but there is a separate issue of the relevance of theoretical linguistics: namely, getting the facts about linguistic phenomena correct.

A brief aside: There is a peculiar generalization in linguistics, which likely extends beyond into the rest of the natural sciences, but which seems unusually vivid for the study of human language. We can summarise this as follows:

A decreasing understanding of linguistic theory X scales with increasingly vocal objections to X.

Researchers who object most vocally to generative linguistics very often have limited understanding of the technical details. Sometimes this is not even denied. For instance, in his book critiquing the life and thought of Chomsky (Decoding Chomsky: Science and Revolutionary Politics), Chris Knight explains why Chomsky’s theory of language structure and evolution is incorrect—before briefly acknowledging a lack of any technical training in linguistics.  

While Yang and Piantadosi have a competent grasp of the relevant literature, careful consideration of the arguments from generative linguistics reveals no serious rebuttals on offer in their paper. In addition, as the authors acknowledge, their model currently has no connection to the acquisition of compositional semantics: the strings are simply strings, not tied to any lexical content. Simply put, any learning model that does not link meaning with structure is not a model of human language. The intricate regulation of form/meaning pairs constitutes the stuff of syntactic theory, not simply the organization of strings into one or another final-stage arrangement that overlaps with the linearized output of a Merge-based computational system.

In addition, domain-general learning is entirely compatible with a restricted hypothesis space, as David Adger and others have regularly pointed out. Generative linguists, especially since the arrival of the minimalist program, have actively sought to reduce the size of Universal Grammar, but in a way that still gets the linguistic facts right.

The authors do make an important speculation: that theories of language acquisition should prioritize measures of description length. This seems like a promising direction, in particular given recent papers relating minimum description length measures to core aspects of semantic representations, such as quantifier representations. The authors do a convincing job of re-framing what it may be possible for a domain-general learner to know, and the speed with which such knowledge could be acquired, but these insights may be better suited to non-linguistic learning domains, of which there are many.
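The minimum-description-length idea can be illustrated with a toy comparison (the cost functions here are invented for illustration, not the metric of any actual model): a learner prefers the hypothesis minimizing grammar length plus the length of the data as encoded under that grammar, so a compact rule beats rote memorization for repetitive data.

```python
import math

# Toy MDL comparison for the string "ab" repeated eight times.
# Costs are rough character/bit counts, purely illustrative.

data = "ab" * 8  # "abababababababab"

def dl_memorize(s: str) -> int:
    """Rote hypothesis: store the string literally, one unit per character."""
    return len(s)

def dl_rule(s: str) -> int:
    """Rule hypothesis: the pattern '(ab)*' plus a repetition count,
    costed as the bits needed to write the integer."""
    reps = len(s) // 2
    return len("(ab)*") + math.ceil(math.log2(reps + 1))

# The compact rule wins on total description length.
assert dl_rule(data) < dl_memorize(data)
```

On this toy accounting the rule hypothesis costs 9 units against 16 for memorization, and the gap widens as the data grow, which is the basic pressure an MDL learner exploits.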

While Piantadosi argues that “much of what Chomskyan linguists say about learnability is totally wrong,” it’s worth noting that much of it is certainly correct. Consider another recent PNAS paper by Laurel Perkins and Jeffrey Lidz, which found that non-local syntactic dependencies (e.g., the relation between a wh-element and a verb) are represented by 18-month-old infants but not by younger infants.

The difference between the Yang and Piantadosi PNAS paper and the Perkins and Lidz PNAS paper is that only the latter is about human language.

Yang and Piantadosi also follow a rich history. In the language acquisition literature of the 1970s, Universal Grammar theorists and generative syntacticians were often branded the “bad guys.” The bad guys have often worked within more traditional methodologies, such as those used in “good old-fashioned AI,” which drew on formalisms in the tradition of Frege and Russell: mathematical logic, non-monotonic reasoning, and others. These approaches have been almost entirely replaced by what is now simply termed “AI”: probabilistic and statistical models. This was largely due to technical limitations and the urge to pursue useful applications, which put fundamental questions to one side.

Science turned into engineering; theoretical understanding was replaced with practical applications. Yet Jespersen’s “notion of structure” lingers in the background, still uncaptured by models such as the one advanced by Yang and Piantadosi.

In the authors’ recent promotion of the paper, one framing seems clear: anti- vs. pro-Chomskyan linguistics. But what does it mean to be a “Chomskyan” linguist? Is Luigi Rizzi “anti-Chomskyan” or “pro-Chomskyan” when he develops a theory of movement halting that strongly deviates from Chomsky’s, even though its computational primitives are derived from early minimalist syntax? Is Noam Chomsky himself “anti-Chomskyan” when he recoils at some of his earlier theories?

These questions have no meaningful answers. Introducing such a “pro-/anti-” framework serves no serious scientific or expository purpose. It remains a problem that language acquisition modelers offer intentionally provocative negative assessments of theoretical linguistics; the field will continue to make no breakthrough advances so long as this remains the case.

At the same time, exciting work from Charles Yang, Stephen Crain, and others supports an interesting conception of Universal Grammar, highlighting the joint role that learning heuristics and structure-sensitivity seem to play in grammar acquisition, alongside concerns of computational efficiency, in line with ideas dating back decades within generative linguistics. In their review, Robert Berwick, Paul Pietroski, Beracah Yankama, and Noam Chomsky show that much of the statistical learning literature fails to get basic facts about the structure-dependence of rules right.

Sadly for Yang and Piantadosi, the bad guys remain very much at large.


University College Dublin School of Medicine Lecture

A recent talk I gave at UCD School of Medicine: “Why Everything You Know About Language is Wrong”.


Active Inference GuestStream Video

A recent video from the Active Inference GuestStream, presenting a pre-print with Emma Holmes and Karl Friston and discussing the nature of language and computation.


New paper out in Ampersand

Joint work with Evelina Leivada is now out in Ampersand. We discuss 10 ambiguous, misused or polysemous terms in linguistics, including I-/E-language, entrainment, reference, ‘the neural basis of X’, (un)grammaticality, third factor, and labeling.


The primitive brain of early Homo

Science today published a new study on early Homo braincase shape, with some commentary from me included. The full paper is here.


Abralin lecture on YouTube

Part of the Abralin series of linguistics lectures: “A Neurocomputational Perspective on Syntax“.

‘How are basic linguistic computations implemented in the brain? Drawing on recent findings from the biological and cognitive sciences, I will propose a neurocomputational model of language comprehension, with particular reference to syntactic and semantic processing. Reviewing the current state of the art, I will defend a multiplexing model of cross-frequency coupling in language comprehension, viewing this higher cognitive capacity as being grounded in endogenous neural oscillatory behaviour. Recent findings from theoretical syntax and semantics will be consulted in order to more carefully frame the implementation and development of this neurocomputational architecture. Alternative accounts in the literature will also be evaluated.’

For more information about Abralin ao Vivo – Linguists Online, visit here.


New regular column at Psychology Today

Starting this week, Psychology Today are publishing new articles of mine under the column “Language and Its Place in Nature“, so more regular writings can be found there. The first piece is about how aging impacts language processing.


New paper out in Linguistic Research

New theoretical syntax paper with Jae-Young Shim on the status of categorial labeling and copies in Linguistic Research.


“In contrast to dominant views that the labeling algorithm (LA) detects (i) only the structurally highest copy of a moved object, or (ii) detects all copies, we propose and defend a third option: (iii) all copies are invisible to LA. The most immediate consequence of this is that objects formed by Internal Merge cannot serve as labels. We relate this proposal to a particular reinterpretation of LA theory such that LA constructs only categorial labels, barring the construction of and <φ, φ> configurations. We then propose an interface condition, Equal Embedding (EE), under which agreeing features must be equally as embedded in order for interpretation to be licensed. We argue that EE appears to fall out of minimal search requirements. We then propose a principled distinction between Agree and LA, based on their sensitivity to copies and interface relations: Both Agree and LA involve minimal search (Probe-Goal for Agree; categorial feature-detection for LA); however, copies are invisible to LA but not to Agree, and LA involves a CI relation (category-specific interpretation) whereas Agree involves an SM relation (the morpho-phonological process of feature-valuation).”
