Notes on Large Language Models and Linguistic Theory

After a recent conversation with Steven Piantadosi about large language models, I want to comment briefly here on some of the themes of our discussion, before turning to additional critiques that we did not have time to talk about.

When we discussed impossible vs. possible languages, Steven seemed to confuse particular languages with the language faculty. We can make claims about which languages we think are impossible on the basis of theoretical architecture, and the fact that we have not documented the profile of every language in human history does not mean we cannot make sensible inferences about what the human language faculty is. At the same time, generative grammar has actually drawn attention to smaller language families and to endangered languages. Andrew Nevins at UCL has a new book, When Minoritized Languages Change Linguistic Theory, which showcases examples from across syntax, morphology, semantics and phonology, throughout the history of generative grammar, in which minoritized languages have disrupted assumptions and forced theory-modification.

With respect to what we were discussing about language and thought, I would just add briefly that within generative grammar many have maintained (as I did) that language provides a new format for thought. This is not to say that it exhausts what it means to think; it means only that language modifies the pre-existing primate conceptual apparatus in very specific ways.

AI Hype

Here is a very abridged list of what large language models have been claimed to be capable of over the last few months: they have theory of mind, they are masters of all trades, they can do domain-agnostic reasoning, they have egocentric memory and control, they have inner monologues, they can do basic arithmetic, solve computer tasks, spontaneously develop word class representations, perform statutory reasoning, act as versatile decomposers, do causal reasoning over entities and events, think like a lawyer, self-improve, execute moral judgments, self-correct, rate the meaning of various sounds, detect sarcasm, do logical reasoning, be conscious, spontaneously develop autonomous scientific research capabilities, do analogical reasoning, and generate ancient Chinese poetry. Perhaps it's fair to say that the AI hype has gone a bit too far? Some, at least, seem prone to it. Walid Saba was on the Machine Learning Street Talk podcast recently to explain why he changed his mind on LLMs and now believes they have mastered natural language syntax. He said: "I'm a scientist; I see a big result, I say wow". But that's not what being a scientist is – being a scientist is seeing a big result and saying 'hang on a second'. Too much hype, not enough reflection.

Divergences

The goals of science are organized around concerns of parsimony – but the goals of the machine learning enterprise are organized around the opposite, a kind of 'megamony': ever more parameters, ever more data.

Language model states typically carry information both about the 'world' and about language; information of either kind is typically useful for a range of tasks, so at any given moment we do not know what information a language model is using, or what the content of its representations is, even if we know what task it is performing. This is a kind of indeterminacy problem. How can we construct a theory of language from this?

Implementation

In his paper, Steven cites Edelman as saying that neuroscientific evidence for things like traces/copies and movement remains elusive. But theoretical syntacticians are not trying to model traces and indices in order to literally find them somewhere in the brain – that is the job of psycholinguistics: process models at the algorithmic level, informed by these abstract computational models in various intricate ways. Likewise, FocusPhrase or ForcePhrase are not expected to yield meaningfully unique BOLD or ECoG responses.

Theories of Language

The goal of science is theory-building. Empirical evidence is used to support, test and improve theories. Olivia Guest’s new work on theory formation proposes a metatheoretical calculus, some way of choosing between competing theories, and one of the criteria is simply metaphysical commitments, i.e. which aspects of the theory are just assumed, and are not under active interrogation and investigation. So a theory of visual attention will probably not ever entertain the idea that visual attention does not exist, but it might assume some mechanisms and phenomena. In generative grammar, we have some specific metaphysical commitments about the architecture and format of language, but it’s much less clear what a theory of language derived from modern language models can offer here – what are its metaphysical commitments?

Claiming that ‘Modern language models refute Chomsky’s approach to language’ is a category error. Theories are different from frameworks and programs, because many theories can be within the same program. Do MLMs refute Chomsky (2013), or Chomsky (2022), etc? Conversely, we would never say that ‘Chomsky’s approach to language refutes modern language models’. One is a research program, the other is an engineering tool. LLMs do not prove anything about what humans do, so it’s odd to state that they refute a whole enterprise in cognitive science.

Until the basic properties of syntax are captured by MLMs – or even the semantic properties of basic adjectives, which are also currently out of reach according to recent work from Liu et al. (2023) – it's premature to say that they refute Chomsky's approach to language. Infants learn syntax together with semantics, and this semantics updates generative models of the world. If we want to open up the black box of these models, some have argued for probing, or for looking at tensor products or tree projections, and these methods might get you somewhere, but I don't see how such mechanisms can replace linguistic theory.

I also don't see how higher attention scores at the multi-head attention stage of transformer models replace or inform conceptual role theory. Likewise, can neural networks perform symbolic manipulation? If so, can locality-sensitive hashing tables perform symbolic manipulation (and if not, why not)? All the enactive work for ChatGPT is done by humans. The generative AI is not acting, and it is generating content, not beliefs. It works in data space; its purpose is not to 'understand'.

Principles of Language

There are no concrete models of language that we get from MLMs, and no clear principles. Linguistic theory is really unique in this respect, though theories of vision come closer with respect to principles of computation. For example, phase theory in generative syntax, or something like the classical Freezing Principle, assumes, roughly, that material inside a certain constructed phrase becomes inaccessible to further manipulation once some kind of raising has occurred, such that no more material can be extracted out of it.

               a. I think that John never reads [reviews of his books]

               b. Whose books_i do you think that John never reads [reviews of t_i]?

               c. I think that [reviews of his books]_i John never reads t_i

               d. *Whose books_i do you think that [reviews of t_i]_j John never reads t_j?

In some generative circles, this has recently been given a more 'cognitive' than formal treatment, for example a processing-related explanation. Another account of the Freezing Principle appeals to prosody. But the point to focus on here is that the tools of linguistic theory allow us to negotiate the locus of these kinds of effects. How do MLMs improve on this or provide novel insight here?
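
To make the shape of the effect concrete, here is a minimal toy sketch, in Python, of the freezing logic: once a constituent has itself been displaced, extraction out of it is blocked. The names (Phrase, can_extract) and the representation are my own illustrative assumptions, not a parser or anyone's formal proposal.

```python
# A toy sketch of the Freezing Principle: material inside a phrase that has
# itself undergone movement is no longer accessible to further extraction.
# Illustrative only; this is not a real grammar or parser.

from dataclasses import dataclass, field

@dataclass
class Phrase:
    label: str
    children: list = field(default_factory=list)
    moved: bool = False        # has this phrase itself been displaced?

def can_extract(subpart: Phrase, container: Phrase) -> bool:
    """Extraction of `subpart` out of `container` is blocked if the
    container has undergone movement (it is 'frozen')."""
    return not container.moved

# (b) 'Whose books do you think John never reads [reviews of t]?'
reviews_in_situ = Phrase("DP", ["reviews", "of", "whose books"], moved=False)
print(can_extract(Phrase("DP", ["whose books"]), reviews_in_situ))      # True

# (d) '*Whose books do you think [reviews of t] John never reads t?'
reviews_topicalized = Phrase("DP", ["reviews", "of", "whose books"], moved=True)
print(can_extract(Phrase("DP", ["whose books"]), reviews_topicalized))  # False
```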

Linguistic theory offers a clear principle of language here. There’s nothing else like this in cognitive science. David Marr was asked shortly before he died if there was anything like this principle in vision, whereby ‘when a structure undergoes a non-structure-preserving transformation the interior of the structure can’t be analyzed further’. Marr said that you couldn’t tell because all transformations in vision were linear according to him, i.e. they were structure-preserving.

Models and Architectures

A lot of recent research working with transformer models will say things like 'this particular point of the architecture (i.e., everything from tokens, to embeddings, to positional embeddings, to multi-head attention, to modified vectors) looks a lot like binding', or 'looks like filler-role independence', or 'looks like Merge'. But a lot of things can look like binding (and a lot of things look like reading tea leaves too), so how can we draw a more principled connection to the stuff of language without falling prey to redescription of linguistic theory rather than re-explanation? Raphael Milliere, whose work beautifully negotiates between linguistic theory and machine learning, thinks that transformer models can implement a kind of non-classical constituent structure (i.e., something that is not straight-up concatenation), and that they also exhibit a kind of 'fuzzy' variable binding: not strict algebraic variable binding, but graded, probabilistic bindings of fillers to roles. Raphael also thinks we have 'shades' of role-filler independence via overlapping subspaces during the multi-head attention phase. At least for now, these artificial systems may be computing structured representations, but so far it is all sub-symbolic.
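
To give a rough sense of what graded filler-role binding can look like, here is a small numpy sketch of a Smolensky-style tensor-product scheme with overlapping role vectors. It is my own toy illustration of the general idea, not Milliere's proposal and not anything a transformer literally does.

```python
# Toy sketch of graded ('fuzzy') filler-role binding via tensor products.
# Not the actual mechanism of any transformer; purely illustrative.
import numpy as np

rng = np.random.default_rng(0)
dim = 64

# Filler vectors (e.g. lexical items) and role vectors (e.g. subject/object).
fillers = {w: rng.normal(size=dim) for w in ["dogs", "cats"]}
role_subj = rng.normal(size=dim)
# Make the object role partially overlap with the subject role:
role_obj = 0.4 * role_subj + 0.6 * rng.normal(size=dim)

# Bind each filler to its role with an outer product and superpose.
state = np.outer(fillers["dogs"], role_subj) + np.outer(fillers["cats"], role_obj)

# Unbinding: probe the state with a role vector; recovery is only approximate
# because the roles are not orthogonal -- a graded, probabilistic binding.
probe = state @ role_subj
for word, vec in fillers.items():
    sim = np.dot(probe, vec) / (np.linalg.norm(probe) * np.linalg.norm(vec))
    print(word, round(float(sim), 2))   # 'dogs' scores highest, but not cleanly
```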

Some recent work by Shikhar Murty and Christopher Manning and colleagues has argued that transformers might be able to learn to become tree-like when trained on language data. They looked at some sequence transduction tasks. But even Murty and colleagues conclude after showing possible tree-like computations that “our results suggest that making further progress on human-like compositional generalization might require inductive biases that encourage the emergence of latent tree-like structure.” A lot of similar claims about tree-structure computations in transformers and language models are based on ‘extrinsic’ assessments of performance rather than intrinsic assessments that directly estimate how well a parametric tree-structured computation approximates the model’s computation.

Other recent work that takes the insights of syntactic theory seriously shows that linguistic theory helps with model scalability, rather than hindering it. Sartran and colleagues add inductive biases (syntactic priors) to their transformers, and show massive improvement.

Language, Thought and Communication

Anna Ivanova said a few weeks ago, at a talk she gave at NYU during a symposium on deep learning, that there's a fallacy whereby some people assume that if a system is bad at thought, then it must be bad at language. She gave Chomsky as an example, quoting him saying 'what do LLMs tell us about language? Zero'. But there is a problem here: Chomsky wasn't marshalling evidence for LLMs being 'bad at thought' in order to argue that LLMs don't tell us anything about language; in fact, I don't think he ever gave any examples of 'bad thought'. His reason for holding that LLMs tell us nothing about language was a more architectural point about their ability to learn impossible languages.

In other recent work, Ivanova argues that the language network needs to interact with other brain networks in different regions, like social cognition in lateral parietal regions, situation model construction in medial parietal regions, world knowledge in highly distributed cortices, general cognitive tasks in middle frontal cortex and superior parietal cortex, and semantic processes in various frontotemporal sites. But all of this is highly compatible with the minimalist architecture of a core language system interfacing with extra-linguistic systems. It’s also compatible with certain interpretations of the concept of autonomy of syntax.

Chomsky has never claimed that combinatorial rules are blind to the content/meaning of elements; we have selectional requirements, we have intricate relations between feature-checking operations in minimalist syntax that directly determine what kind of Merge operation you can execute, and when you can execute it. All autonomy of syntax means is that there are syntactic mechanisms that aren’t semantic, not that semantics is irrelevant. And sure enough, there are some syntactic mechanisms that are not semantic.

Ivanova, Piantadosi and many others commonly cite work reporting aphasic patients showing no deficits in complex reasoning, and use this to undermine generative grammar. But we expect this under a non-lexicalist framework of generative syntax: meaning, syntax and 'form' are all separate systems, with separate representations in long-term memory, and 'the lexicon' is not a thing but a process of combining these three feature types together. Other, more complex objects can, of course, be stored in lexical memory for efficient retrieval, depending on the language, and indeed the person. So impairments to syntactic features and syntactic structure do not lead to the prediction that conceptual features will be impaired.

Cognitive Plausibility

GPT-4 and its predecessors don't have long-term memory: they don't build up a sense of understanding or of self. LLMs do have a working memory of thousands of items, but much research in generative syntax these days concerns the interfaces mapping structure to distinct workspaces, the order in which certain portions of structure are interpreted, and the important role that memory constraints have in this. Even Christiansen and Chater, no supporters of generative grammar, suggest that humans have a now-or-never bottleneck whereby human memory limitations impose radical constraints on grammaticality and grammaticalization. These memory constraints and considerations are not part of the discourse on MLMs.

So it may be, then, that language models need to be actively impaired and disrupted in some way to more accurately model human performance, as Andrew Lampinen at DeepMind has suggested.

LLMs will very likely form part of some ultimate AGI system, if AGI can even be achieved. LLMs don’t seem likely to radically innovate beyond their training data, nor do they seem capable of carrying out long and subtle chains of reasoning. But they can inform and interface with other artificial systems that could do these things (e.g., the Wolfram Alpha plugin for ChatGPT) – building a modular architecture not unlike the generative framework for the human mind.

Robert Long recently gave a talk in which he listed all the ways in which cognitive science has contributed to AI progress with LLMs, and the list was empty. That is perhaps debatable, but even so, such a result just reinforces the disconnect that people like Chomsky have been trying to highlight. These are two separate fields of study. Even 'attention' in transformer models has nothing to do with human-like attention; it's a post-hoc metaphor. And convolutions come from very behaviorist, black-box kinds of frameworks. Everywhere you look, you see divergence between these fields.

The Syntax of Screenshots

One example Steven uses in his paper, and in some other places, is from Carnie's 2002 syntax textbook. In his textbook (note: not a monograph or polemical piece), Carnie says: Premise 1, syntax is a productive, recursive and infinite system; Premise 2, rule-governed infinite systems are unacquirable; Conclusion: therefore syntax is an unacquirable system. Since speakers nonetheless have such a system, it follows that at least parts of syntax are innate.

Steven screenshots this section of text. But then in the immediately following paragraph, Carnie says: ‘There are parts of this argument that are very controversial. In the challenge problem sets at the end of this chapter you are invited to think very critically about the form of this proof. Problem set X considers the possibility that premise 1 is false (but hopefully you will conclude that despite the argument given in the problem set the idea that language is productive and infinite is correct). Premise 2 is more dubious, and is the topic of problem set Y. You are invited to be skeptical and critical of these premises when you do the problem set.’

Wolfram Beta

Stephen Wolfram's rule 30 showed beautifully that a simple rule can lead to computational complexity. Something similar arises with Chomsky's minimalist program: you have two operations, internal Merge and external Merge (unified in various ways in recent work), and from their interfaces with different systems, each with its own domain-specific conditions on interpretation and externalization, you get the computational complexity of human language.

One might think that there is a potential sympathy developing here. However, Stephen Wolfram said in an essay he wrote in February about ChatGPT that "my strong suspicion is that the success of ChatGPT implicitly reveals an important 'scientific' fact: that there's actually a lot more structure and simplicity to meaningful human language than we ever knew—and that in the end there may be even fairly simple rules that describe how such language can be put together." Wolfram's argument here is very familiar to linguists.

Wolfram says: “ChatGPT provides perhaps the best impetus we’ve had in two thousand years to understand better just what the fundamental character and principles might be of that central feature of the human condition that is human language and the processes of thinking behind it.” But the fundamental character and principles of human language are not obscure. Wolfram skips over the entire tradition of modern post-war linguistics, what’s been discovered since the 1950s.

Still, even Wolfram says that we will never actually be able to figure out what ChatGPT is doing except by tracing each step; it might be computationally irreducible: "it's not clear that there's a way to summarize what it's doing in terms of a clear narrative description".

Wolfram only gives one example of a rule that could be learned by ChatGPT, and that concerns basic logic: “Thus, for example, it’s reasonable to say “All X are Y. This is not Y, so it’s not an X”. And just as one can somewhat whimsically imagine that Aristotle discovered syllogistic logic by going through lots of examples of rhetoric, so too one can imagine that in the training of ChatGPT it will have been able to “discover syllogistic logic” by looking at lots of text on the web, etc.”

Yet, the difference here is that Aristotle explicitly discovered syllogistic logic and rhetorically described some of its apparent properties, but even the slave boys in Greece had the competence for syllogistic reasoning; they just didn’t explicitly formalize it or give it a name. It’s very different for ChatGPT – ChatGPT didn’t know syllogistic reasoning ab initio, and it may just look like it’s executing some basic logical operations, but a few difficult probe questions later and it looks like it can’t do it after all.

Chomsky Hierarchy

A recent paper called 'Neural networks and the Chomsky hierarchy' from Deletang and colleagues shows that "grouping tasks for NNs according to the Chomsky hierarchy allows us to forecast whether certain architectures will be able to generalize to out-of-distribution inputs. Our results show that, for our subset of tasks, RNNs and Transformers fail to generalize on non-regular tasks, LSTMs can solve regular and counter-language tasks, and only networks augmented with structured memory (such as a stack or memory tape) can successfully generalize on context-free and context-sensitive tasks."

Deletang and colleagues discuss how RNNs are not Turing-complete; they lie lower on the Chomsky hierarchy. Previous work has shown that RNNs and LSTMs are capable of learning simple context-sensitive languages, but only in a very limited way, i.e., they generalize only to lengths close to those seen during training (Bodén & Wiles, 2000, 2002; Gers & Schmidhuber, 2001). Transformers are capable of learning complex and highly structured generalization patterns, but they cannot overcome the limitation of not having an extendable memory. This might imply hard limits for scaling laws (Kaplan et al., 2020), because even significantly increasing the amount of training data and the size of a Transformer is insufficient for it to climb the Chomsky hierarchy.
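
The sort of test involved can be pictured with a toy context-free task such as a^n b^n: train on short strings, evaluate on much longer ones. The sketch below only sets up the data split; the function names and length ranges are my own choices, not those of Deletang and colleagues.

```python
# Toy setup for probing length generalization on a context-free language (a^n b^n).
# This only builds the data split; the model and training loop are left out.
import random

def sample(n: int) -> str:
    return "a" * n + "b" * n

def is_member(s: str) -> bool:
    """Ground-truth membership check for a^n b^n."""
    n = len(s) // 2
    return len(s) % 2 == 0 and s == "a" * n + "b" * n

random.seed(0)
train = [sample(random.randint(1, 10)) for _ in range(1000)]      # lengths seen in training
test_ood = [sample(random.randint(50, 100)) for _ in range(100)]  # out-of-distribution lengths

# A network 'generalizes' in the paper's sense if a model fit on `train`
# still handles `test_ood` correctly -- which, per Deletang et al., plain
# RNNs and Transformers fail to do without structured memory such as a stack.
print(all(is_member(s) for s in test_ood))  # sanity check on the generator: True
```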

But at the same time, the Chomsky hierarchy is largely irrelevant to Merge-based systems, which operate over structures, not strings. The Chomsky hierarchy adapts to language Post's general theory of computability, based on "rewriting systems": rules that replace linear strings of symbols with new linear strings. All of the "formal languages" generated at the various levels of this hierarchy involve linear order. But Merge-based systems have (hierarchical) structure and no linear order, the absence of linear order being an essential property of the binary sets formed by Merge. Merge-based systems do not even appear in the Chomsky hierarchy, and anything concluded from its study is irrelevant to the evolution of Merge-based systems.
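
The contrast can be put very plainly: a rewriting rule maps one string to another, whereas Merge maps two syntactic objects to the unordered set containing them. A minimal sketch, using Python frozensets merely as a stand-in for those sets:

```python
# Merge as unordered set formation: hierarchical structure, no linear order.
# Purely illustrative; the strings are stand-ins for lexical items.

def merge(x, y):
    """Merge(X, Y) = {X, Y}: a binary set with no intrinsic order."""
    return frozenset([x, y])

print(merge("read", "books") == merge("books", "read"))  # True: no linear order

vp_right = merge("read", merge("the", "book"))   # {read, {the, book}}
vp_wrong = merge(merge("read", "the"), "book")   # {{read, the}, book}
print(vp_right == vp_wrong)                      # False: hierarchy is preserved
```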

Conclusion

Whatever the technological innovations of ChatGPT, it doesn't seem to be telling us anything about human language. This seems self-evident for at least two reasons: (1) no child is exposed to the data that ChatGPT is trained on; (2) there are impossible grammatical rules, e.g. mirror-image rules, which ChatGPT could easily acquire but humans would not.
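
By a mirror-image rule I mean something like 'form the question by reversing the word order of the declarative': a rule stated purely over linear order, which no human grammar uses but which a string-trained model could pick up without difficulty. A toy illustration, using the familiar auxiliary-fronting example:

```python
# An 'impossible' linear rule: form a question by mirroring the word string.
# Humans never acquire rules stated over linear order like this, but nothing
# in a string-trained model rules it out. Purely illustrative.

def mirror_question(declarative: str) -> str:
    words = declarative.rstrip(".").split()
    return " ".join(reversed(words)) + "?"

print(mirror_question("the boy who is tall is happy"))
# -> 'happy is tall is who boy the?'
# The attested human rule is structure-dependent instead: front the auxiliary
# of the main clause ('is the boy who is tall happy?'), not a string operation.
```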

Chomsky’s main point is that distributional statistics alone will not capture language – and that remains to be refuted. The difference between humans and non-human primates is not simply a matter of scale – there must be some kind of fundamental algorithmic difference to get you human-like compositionality. And that’s the essence of Chomsky’s work, and the generative enterprise.

Even so, it is not impossible that the traditional methods of linguistics may have been exhausted for now, and insights may emerge from other areas, such as research into the neural representation of language.

An ancient alchemical dictum, In Sterquilinis Invenitur, translates as "in filth it will be found". One reading of this is "what you are searching for the most will be found in the place you least want to look". This forms the thematic basis of a number of ancient myths, but it may also be that both sides of the debate Steven and I are engaged in – modern language model research, and traditional linguistic theory – could mutually benefit from searching in the places we least want to look, for answers that, in the end, continue to elude both sides.


Universal Grammar is not dead

In a new study to be published in next week's issue of PNAS, "One model for the learning of language," Yuan Yang and Steven Piantadosi attempt to show that language acquisition is possible without recourse to "innate knowledge of the structures that occur in natural language." The authors claim that a domain-general, rule-learning algorithm can "acquire key pieces of natural language." The paper provides a number of simple and elegant arguments, but ones which may not be as revolutionary as the authors seem to have intended.

One immediate qualification is needed. Generative linguists, being careful with their words, are wont to stress that what is hypothesized to be innate with respect to child language acquisition is not “the structures that occur in natural language,” as Yang and Piantadosi claim. Rather, it is simply what Otto Jespersen presciently termed “a notion of structure.” It is the capacity to build a structure that is thought to be innate, in addition to the arguably more mysterious content of what makes up individual words.

The authors provide a model that can take strings of discrete elements and execute a number of primitive operations. The “assumed primitive functions” make regular reference to linearity: things like “list,” “first character,” “middle of Y,” and “string comparison.” For instance, they discuss what they term “pair” and “first” operations, which they claim “are similar in spirit to ‘Merge’ in minimalist linguistics, except they come with none of the associated machinery that is required in those theories; here, they only concatenate.” 

There are a number of issues to unpack here. The operation Merge is typically not assumed to be a concatenation process; it simply forms sets and does not impose any order. Syntax has been known to require more than just Merge; it also needs a set-categorization or "labeling" operation. Yang and Piantadosi assume some measure of progress in that their model is free from all the cumbersome "associated machinery" of generative models of Merge, but this should not be a cause for celebration, since the model ends up capturing only relations between strings and not relations between structures. As such, it falls short of explaining "key pieces of natural language."

With respect to its architecture, Yang and Piantadosi’s model invokes a version of recursion, one which remembers any stochastic choices made on a previous call with the same arguments. But consider a basic feature of natural language recursion inherent to Merge: It appears to be strictly Markovian.

We could point here to the principle of Resource Restriction, another example of "associated machinery" that appears essential to modelling natural language syntax with any degree of accuracy. Resource Restriction states that when Merge maps workspace n to workspace n+1, the number of computationally accessible elements (syntactic objects) can only increase by one. It transpires that this may account for a peculiar property of natural language recursion that separates it from other forms of recursion (e.g., propositional calculus, proof theory): Merge involves the recursive mapping of workspaces that removes previously manipulated objects.

Hence, Resource Restriction renders natural language derivations strictly Markovian: The present stage is independent of what was generated earlier, unlike standard recursion. Similar observations apply to the idea that when Merge targets objects in a workspace, non-targeted elements remain conserved and intact once the new workspace has been established, continuing the intuition of “no tampering.”
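
One schematic way to see this (my own sketch, not the formalism of any particular paper): each application of Merge removes the objects it targets from the workspace and adds the single set it creates, so the set of accessible terms grows by exactly one while the workspace itself shrinks.

```python
# Sketch of Merge over workspaces with Resource Restriction.
# Workspace members are removed once targeted; accessible terms grow by one.
# Illustrative only -- not a full implementation of the minimalist formalism.

def accessible(obj):
    """All terms of a syntactic object (the object plus its subparts)."""
    terms = {obj}
    if isinstance(obj, frozenset):
        for part in obj:
            terms |= accessible(part)
    return terms

def merge_in_workspace(ws: set, x, y) -> set:
    """Map workspace WS_n to WS_(n+1): remove X and Y, add {X, Y}."""
    assert x in ws and y in ws
    return (ws - {x, y}) | {frozenset([x, y])}

def count_accessible(ws):
    out = set()
    for obj in ws:
        out |= accessible(obj)
    return len(out)

ws0 = {"the", "book", "read"}
ws1 = merge_in_workspace(ws0, "the", "book")
ws2 = merge_in_workspace(ws1, "read", frozenset(["the", "book"]))

# Accessible terms increase by exactly one per Merge (3 -> 4 -> 5),
# while the workspace itself shrinks, keeping derivations Markovian.
print(count_accessible(ws0), count_accessible(ws1), count_accessible(ws2))
```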

Although they provide a technical and innovative model for string-set computations, none of the above core aspects of syntax and the associated obstacles are approximated by Yang and Piantadosi’s model. The authors discuss the fact that children learn many intricate syntactic rules from impoverished data, and they respond to this by pointing to the clear power of their simple domain-general learning model (they note that “any learning system that works over a sufficiently rich space of computations will appear to know about strings that have not been observed”), yet what their model is learning is complex strings, not structures.

Nevertheless, the authors' model successfully learns many types of formal languages that seem relevant for a range of linguistic objects, and its technical sophistication will likely inspire a range of new research into the learnability of some of natural language's less complex constructions. However, Yang and Piantadosi's model does very poorly with the English auxiliary system, which the authors say may be due to the "complexity" of this system. Quite so. Likewise, their model has difficulty learning the simple finite grammar from Braine that mimics phrase structure rules. It has moderate success with a fragment of English involving center-embedding.

The paper is framed throughout as providing evidence against some innate Universal Grammar component. While they do not explicitly claim that Universal Grammar is beyond recovery, they do maintain that some core learnability arguments in favor of it are flawed. The authors appear well-versed in the relevant learning literature, but there is a separate issue of the relevance of theoretical linguistics: namely, getting the facts about linguistic phenomena correct.

A brief aside: There is a peculiar generalization in linguistics, which likely extends beyond into the rest of the natural sciences, but which seems unusually vivid for the study of human language. We can summarise this as follows:

A decreasing understanding of linguistic theory X scales with increasingly vocal objections to X.

Researchers who object most vocally to generative linguistics very often have limited understanding of the technical details. Sometimes this is not even denied. For instance, in his book critiquing the life and thought of Chomsky (Decoding Chomsky: Science and Revolutionary Politics), Chris Knight explains why Chomsky’s theory of language structure and evolution is incorrect—before briefly acknowledging a lack of any technical training in linguistics.  

While Yang and Piantadosi have a competent grasp of the relevant literature, if one carefully considers the arguments from generative linguistics, there are no serious rebuttals on offer in their paper. In addition, as the authors acknowledge, their model currently has no connection to the acquisition of compositional semantics—the strings are simply strings, not tied to any lexical content. Simply put, any learning model that does not link meaning with structure is not a model of human language. The intricate regulation of form/meaning pairs constitutes the stuff of syntactic theory, not simply the organization of strings into one or another final-stage arrangement that overlaps with the linearized output of a Merge-based computational system.

In addition, domain-general learning is entirely compatible with a restricted hypothesis space, as David Adger and others have regularly pointed out. Generative linguistics, especially since the arrival of the minimalist program, has actively sought to reduce the size of Universal Grammar, but in a way that gets the linguistic facts right.

The authors do make an important speculation: that theories of language acquisition should prioritize measures of description length. This indeed seems like a promising direction, in particular given recent papers relating minimum description length measures to core aspects of semantic representation, such as quantifier representations. The authors do a convincing job of re-framing what it may be possible for a domain-general learner to know, and the speed with which such knowledge could be acquired, but the applications may be better suited to non-linguistic learning domains, of which there are many.
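
The arithmetic behind a description-length criterion is easy to sketch: a hypothesis is scored by the bits needed to state the grammar plus the bits needed to encode the data under it, and the learner prefers the shorter total. The numbers below are invented purely for illustration.

```python
# Two-part minimum description length, schematically:
#   DL(H, D) = L(H) + L(D | H)
# All quantities below are made up for illustration only.
import math

def description_length(grammar_bits, data_probs):
    """Grammar cost plus code length of the data under the grammar."""
    return grammar_bits + sum(-math.log2(p) for p in data_probs)

# A small, general grammar assigns moderate probability to each observed string;
# a large, memorizing grammar assigns probability ~1 but costs many bits itself.
compact_grammar = description_length(grammar_bits=50,   data_probs=[0.05] * 100)
lookup_table    = description_length(grammar_bits=2000, data_probs=[0.99] * 100)

print(round(compact_grammar, 1), round(lookup_table, 1))
# The learner prefers whichever hypothesis yields the shorter total code.
```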

While Piantadosi argues that "much of what Chomskyan linguists say about learnability is totally wrong," it's worth noting that much of it is certainly correct. Consider another recent PNAS paper, by Laurel Perkins and Jeffrey Lidz, which found that non-local syntactic dependencies (e.g., the relation between a wh-element and a verb) are represented by 18-month-old infants but not by younger infants.

The difference between the Yang and Piantadosi PNAS paper and the Perkins and Lidz PNAS paper is that only the latter is about human language.

Yang and Piantadosi also follow a rich history. In the language acquisition literature of the 1970s, Universal Grammar theorists and generative syntacticians were often branded the "bad guys". The bad guys have often worked within more traditional methodologies that have been used, for instance, in AI. "Good old-fashioned AI" made use of formalisms within the tradition of Frege or Russell, such as mathematical logic and non-monotonic reasoning, amongst others. These approaches have been almost entirely replaced by what is now simply termed "AI": probabilistic and statistical models. This was largely due to the technical limitations that were reached and the urge to pursue useful applications, which put fundamental questions to one side.

Science turned into engineering; theoretical understanding was replaced with practical applications. Yet Jespersen's "notion of structure" lingers in the background, still waiting to be captured by models such as the one advanced by Yang and Piantadosi.

In terms of how the paper has recently been promoted by its authors, one framing seems clear: anti- vs. pro-Chomskyan linguistics. But what does it mean to be a "Chomskyan" linguist? Is Luigi Rizzi "anti-Chomskyan" or "pro-Chomskyan" when he develops a theory of movement halting that strongly deviates from Chomsky's, even though its computational primitives derive from early minimalist syntax? Is Noam Chomsky himself "anti-Chomskyan" when he recoils from some of his earlier theories?

These questions have no meaningful answers. Introducing such a "pro-/anti-" framework serves no serious scientific or expository purpose. It continues to be problematic for language acquisition modelers to offer intentionally provocative negative assessments of theoretical linguistics, and the field will continue to make no breakthrough advances so long as this remains the case.

At the same time, exciting work from Charles Yang, Stephen Crain, and others supports an interesting conception of Universal Grammar, highlighting the joint role that learning heuristics and structure-sensitivity seem to play in grammar acquisition, alongside concerns of computational efficiency, in line with ideas dating back decades within generative linguistics. In their review of the learning literature, Robert Berwick, Paul Pietroski, Beracah Yankama and Chomsky show how much of the statistical learning literature fails to get basic facts about structure-dependence of rules right.

Sadly for Yang and Piantadosi, the bad guys remain very much at large.


University College Dublin School of Medicine Lecture

A recent talk I gave at UCD School of Medicine: “Why Everything You Know About Language is Wrong”.


Active Inference GuestStream Video

A recent video from the Active Inference GuestStream, presenting a pre-print with Emma Holmes and Karl Friston and discussing the nature of language and computation.


New paper out in Ampersand

Joint work with Evelina Leivada is now out in Ampersand. We discuss 10 ambiguous, misused or polysemous terms in linguistics, including I-/E-language, entrainment, reference, ‘the neural basis of X’, (un)grammaticality, third factor, and labeling.


The primitive brain of early Homo

Science today published a new study on early Homo braincase shape, with some commentary from me included. The full paper is here.


Abralin lecture on YouTube

Part of the Abralin series of linguistics lectures: "A Neurocomputational Perspective on Syntax".

‘How are basic linguistic computations implemented in the brain? Drawing on recent findings from the biological and cognitive sciences, I will propose a neurocomputational model of language comprehension, with particular reference to syntactic and semantic processing. Reviewing the current state of the art, I will defend a multiplexing model of cross-frequency coupling in language comprehension, viewing this higher cognitive capacity as being grounded in endogenous neural oscillatory behaviour. Recent findings from theoretical syntax and semantics will be consulted in order to more carefully frame the implementation and development of this neurocomputational architecture. Alternative accounts in the literature will also be evaluated.’

For more information about Abralin ao Vivo – Linguists Online, visit here.


New regular column at Psychology Today

Starting this week, Psychology Today are publishing new articles of mine under the column "Language and Its Place in Nature", so more regular writings can be found there. The first piece is about how aging impacts language processing.
