Why Do Irregular Nouns and Verbs Exist? An Ancient and Modern Problem

Wolfgang de Melo

When we learn new languages, we inevitably find some of them easier to handle than others. We are then sometimes told that the reason for this is that some languages share a great deal of vocabulary with our native languages, while others don’t. But is that really the only reason? Some languages require us to learn a great deal of morphology, rules that tell us how to create new words and how to inflect them. Others let us get away with less effort. Once we have become proficient in a language, we tend not to think of the effort that went into learning it. But can we honestly say, hand on heart, that as school children we found learning seemingly endless tables for nouns and verbs an edifying business?

For the most part, we put up with these tables because they were useful; once you knew how to inflect auxilium, “help”, you knew how to inflect every noun whose nominative ended in ‑um. But what about all those irregular forms? Nouns like domus, “house”, which refuse to behave and require an entire table to themselves? Why do such irregular words exist? How do they come about and why don’t people get rid of them? Most people don’t dare to ask such questions, thinking that they are at best naive and at worst offensive. However, I firmly believe that these are legitimate questions and that, if we dare to ask, we learn more. I hope that this essay will tell you everything you’ve always wanted to know about irregular forms, but were too afraid to ask. If by the end of this article you have made your peace with such forms in Latin or Greek or other inflecting languages you are learning, I count myself successful. If, like me, you have come to appreciate their beauty and their place in the system, I count myself fortunate.

So, without further ado, let us delve into today’s topic. I will first outline how irregular forms come about, followed by a section on how they often disappear again. Then we will look at what Marcus Terentius Varro (116–27 BC), Rome’s great polymath, made of linguistic irregularity. And finally, I want to conclude with modern, holistic approaches, some of which can explain why irregularity is not quite as bad as it seems to second-language learners and can actually be beneficial to first-language learners.

The Tower of Babel, Pieter Bruegel the Elder, c.1563 (Kunsthistorisches Museum, Vienna, Austria).

I: The Rise of Irregularity

Irregular forms as leftovers of an earlier system: a glance at Old English and Early Latin

Languages like Greek or Latin, with their exceptionally long written record, can show us very clearly how irregularities arise. English, too, has many telling examples. What these languages teach us is that for the most part, irregular forms are leftovers of an earlier system where they fitted in better. Thus, ride – rode – ridden and sing – sang – sung look irregular from a modern perspective because there are relatively few verbs that follow their patterns, such as strive – strove – striven or swim – swam – swum. In the Anglo-Saxon period, on the other hand, these patterns were very common. In Old English, our two sample verbs were rīdan – rād – ridon – ġeriden[1]and singan – sang – sungon – ġesungen.[2]  Note that there are four principal parts rather than three because there is a difference in vocalism between past tense forms of the singular and past tense forms of the plural. The four forms I list are: the infinitive (to ride), the singular preterite (simple past, I rode), the plural preterite (we rode) and the past participle (ridden).

Scholars traditionally call verbs which show such vowel alternations ‘strong’. Rīdan belongs to class 1, singan to class 3a. Verbs of class 3a have an i in the infinitive, followed by a nasal and consonant,[3] and all verbs that fit this pattern inflect accordingly. There are seven main classes of strong verbs in Old English, but not all verbs fit. Those which do not, form their past tenses and participle with suffixes (word endings) containing ‑d‑. It is these ‘weak’ verbs without vowel alternations and with d-containing suffixes which at some point became normal, gradually ousting most of the old ‘strong’ verbs. Those strong verbs which did survive are rare enough to be classified as irregular in present-day English. Occasionally, a weak verb became strong by analogy to similar forms, which happened to wear – wore – worn, irregular nowadays, but a weak verb in Old English (werian, past tense and participle wered); yet most of the time it is the strong verbs that change by analogy to the weak ones, leading to a small pool of irregular leftovers of an earlier system where they were once fully integrated and normal.

Leftovers of an earlier system can be seen in Latin as well. In the Latin second declension, the original genitive plural ended in ‑um. At some point, however, this ending was replaced by ‑ōrum, which was taken from the pronouns (e.g. hōrum, “of these”, illōrum, “of those”). In Classical Latin, filius, “son”, has a genitive plural filiorum, “of the sons”, and monstrum, “(terrible) sign”, has a genitive plural monstrorum, “of the signs”. The old genitive plural forms would have been filium and monstrum, but such forms are already on their way out in early Latin. Plautus uses them in fixed phrases like pro deum atque hominum fidem, “oh the good faith of gods and men”, and elsewhere in contexts where the style is elevated and old-fashioned forms raise the tone. In the Classical period, a number of fixed expressions remain, such as the aedis deum consentium, “temple of the united gods”, and the archaic genitive is also the norm for terms of currency, hence nummus, “coin”, and talentum, “talent”, with the genitives nummum and talentum.

The juxtaposition of Anglo-Saxon Latin (Bede’s Ecclesiastical History, c. 731) and Old English (Cædmon’s Hymn, 7th cent.) in an 11th cent. manuscript (Oxford, Bodleian Library, MS Hatton 43, 129r inf.).

Irregular forms as the result of sound change

Another source for morphological complexity is sound change. Here I do not have space to explain why sound changes happen, although I hope to do so in a future contribution; suffice it to say that sound change is overwhelmingly regular, but it can create morphological irregularity. Let us look at our Old English verbs again: rīdan, rād, ridon, ġeriden; and singan, sang, sungon, ġesungen. Those look like rather different patterns, but at some point in the prehistory of English they were identical. In what follows, I use an asterisk (*) for forms which are not found in actual texts, but which we can reconstruct relatively reliably. At some point, the present contained *e, the past singular had *o, and the past plural and the participle had nothing in place of *e or *o. The pre-forms would have been *reid- / *seng-, *roid- / *song-, *rid- / *sng-. In ‘to ride’, *e / *o / *zero was followed by i, and in “to sing”, by a nasal. Then a few sound changes messed up this neat system: (a) *ei became ī (*reid– > rīd-); (b) *e became i if the syllable end contained n or m (*seng– > sing-); (c) *o became a, and *oi became *ai (*song– > sang– and *roid– > *raid-); *ai became ā (*raid– > rād-); and *n between consonants became un (*sng– > sung-). However regular sound change is, it can result in this kind of morphological mess.

Sound change also leaves us with a big mess in Latin. Think of dăre, “to give”, which isn’t a proper first-conjugation verb, since it has a short ă rather than the long vowel you can see in amāre, “to love”. At some point, the imperative of “to give” must have been *dō, but when the verb began to be integrated into the first conjugation, such a form was replaced by (but note that the plural imperative remains dăte). There were two related imperative forms, singular *ke-dō, “give here”, and plural *ke-dăte, but before the former could be changed to **ke-dā,[4] sound change intervened and created unrecognizable forms. One sound change, iambic shortening (also known as brevis brevians, whereby modō, “by a measure”, came to be pronounced as modŏ, “just now”), shortened the final vowel of the singular imperative, giving us cĕdŏ; and another sound change, syncope, turned the plural into *kedte and then cette. These forms are common in Plautus and function just like other imperatives, but the average Roman would not have recognized any connection with dare, “to give”.

A passage of Plautus’ Menaechmi (early 2nd cent. BC), in which the imperative cedo occurs twice.

Irregular forms through ‘accidental suppletion’

A third reason why morphology can be less than neat is the rise of accidental suppletion. Think of English forms like be, am or was. How can one verb have such different forms? The answer is that originally these forms belonged to different verbs with similar meanings. Over time, some forms became less common, and the separate paradigms coalesced to form a single, heavily irregular one. A similar thing happened in Dutch, but different forms got selected: the Dutch imperative is wees, “be!”, cognate with English was.

Again we have Latin parallels. Next to the present incipio, “I begin”, there is a perfect coepi, “I have begun”. Why do these forms look so different? Well, yet again they belonged to different verbs originally. In Plautus, incipio still has a perfect incepi, and the perfect coepi still has a present coepio. Since these verbs are, to all intents and purposes, synonymous, it was expected that one of them would die out; this expectation was not fulfilled exactly: rather, incepi disappeared alongside coepio, and the two partial paradigms combined to give a new, less regular one (principal parts incipere, incipio, coepi, coeptum). Let us now look how speakers of a language often get rid of such irregular forms.

Comic fresco of the 1st cent. AD (Casa dei Quadretti Teatrali, Pompeii, Italy).

II: The Fall of Irregularity

Levelling

Sound change is regular, but creates irregularity in morphology. But there are also language changes which are clearly meant to restore such regularity. Above we saw that in Anglo-Saxon preterite forms, there was a vowel difference between singular and plural. The ‘weak’ verbs never had such differences, and when these weak forms became the norm, the differences between singular and plural among the strong ones also disappeared. The only instance where present-day English preserves a difference between singular and plural in the simple past is the most frequent verb in the language, to be, where we find was and were. While in German, the strong verbs have remained more productive, here the consonantal difference between was and were has been levelled out, hence ich war, “I was”, and wir waren, “we were”. Such instances of ‘levelling’ stand in an inverse relationship with frequency: the more frequent a verb is, the more likely it is to preserve irregularities. We will come back to this.

Of course levelling also happens in Latin. The sound represented by qu (similar to the initial sound in queen) lost its lip rounding before o or u and became a plain c (similar to the initial sound in king). For coquus, “cook”, we would expect a nominative cocus and a genitive coqui. But normally we find coquus, genitive coqui, while some writers use cocus and coci, but at any rate they are consistent: levelling has taken place. The case of early Latin deivos, “god”, is more complex: an accusative plural deivōs, “gods”, is found in the so-called Duenos inscription (CIL 12.4, early 6th century BC). Then a number of sound changes happened, so that a nominative deivos regularly ended up as Classical deus, whereas an old genitive deivi would end up as classical divi. We would expect levelling to happen, and it did; but the two resulting paradigms, of deus and divus, are more distinct from each other than coquus and cocus, and they subsequently developed divergent meanings, deus becoming a noun, “god”, and divus becoming an adjective, “divine”.

A modern reproduction of Emperor Augustus’ Res Gestae (AD 14), in which he is described as DIVVS (“divine”) (Museum of the Ara Pacis, Rome, Italy).

Recreating predictable forms

Other morphological changes cannot be described as levelling, but they still create more regular paradigms insofar as they make the perfect stem more predictable. Two examples should suffice. parcere, “to spare”, has two inherited perfects; originally, Latin had a perfect and an aorist, two tenses preserved in Greek, but in Latin these merged into a new tense that we call perfect, even though some of these perfects continue old perfects and others continue old aorists morphologically.[5] For most verbs, the merger of perfect and aorist resulted in the elimination of one of the two forms, but parcere preserves an old perfect peperci and an old aorist parsi. Reduplicated forms are a relic category in Latin, they cannot be formed from new verbs. And while s-aorists are one of the more productive categories, parsi feels irregular because it should be *parcsi (and indeed a form *parcsei must have existed, but sound change simplified the consonant cluster). So at some point a new perfect was created, a more predictable form, namely parcui, but it wasn’t successful in the long run and is only attested once, in Naevius (died 201 BC): parcuit, “he spared” (com. 69).

The second example is sumo, “I take”, perfect sumpsi. This verb had a different perfect in early Latin, surēmī,[6] and at some point such a perfect had made sense. Sūmō goes back to *sus-emō, a compound of emō, “I buy” (originally “I take”), and just as emō has a perfect ēmī, the perfect of *sus-emō was *sus-ēmī. But again sound change intervened. The old *sus-emō was syncopated to *susmō, and s was lost before nasal, with compensatory lengthening of the vowel, hence sūmō. Syncopation only affected short vowels, so *susēmī did not lose its internal vowel, but s between vowels became r, and we ended up with surēmī. However, sūmō and surēmī look very different. Hence, a new perfect was created with the productive s-suffix that we saw in parsī, and sūmsī then predictably acquired a p, just as there is variation between the nominative noun hiems, “winter”, and hiemps.

Now that we have seen the rise and fall of irregular forms, let us look at what the ancients made of them.

The sumo wrestlers Takaneyama Yoichiemon and Sendagawa Kichigorō, Katsushika Hokusai, 1790–3 (Metropolitan Museum of Art, New York, USA).

III: Varro on the Question of Regularity and Irregularity in Language

Intellectual context

The question of regularity and irregularity was a major topic for Marcus Terentius Varro (116–27 BC), the greatest scholar of the Roman republic. However, Varro does not stand in an intellectual vacuum; before him, many Greek scholars of the Hellenistic period discussed such issues, and it is from them that Varro took the terms analogia (ἀναλογία) for “analogy, regularity” and anōmalia (ἀνωμαλία ) for “anomaly, irregularity”. Varro presents us with two opposing schools of thought, the Analogists, who believed that morphology was overwhelmingly regular, and the Anomalists, who believed the opposite. It is hard to assess today to what extent there really was such a debate: we have a very limited number of fragments of the Greek scholars in question, and the majority of their statements are preserved in Varro, who may be exaggerating in order to keep our attention. When Varro’s younger contemporary Gaius Julius Caesar (100–44 BC) was not busy conquering Gaul, he was also interested in grammatical questions; he wrote two books De analogia (“On Analogy”, 55/54 BC) but unfortunately, we only have fragments of this important work, which would also have helped us to assess Varro’s stances more objectively.

As far as we can see from the fragments of Varro’s Greek predecessors, the Analogists and Anomalists used quite similar principles and methods, but had different goals, and the opposition between them is somewhat artificial. The Analogists can be identified with the scholars of Alexandria who were in charge of the famous library and engaged in textual criticism. The manuscripts of Homer they were working on already showed unmistakable textual corruptions. But what to do when a line would not scan? Let’s say that it was a bit too short and contained a form ἡλίου, “of the sun”. This is a correct form in the Attic dialect of Greek, and its nominative is ἥλιος. But the Homeric nominative, based on the Ionic dialect, is ἠέλιος; like its Attic counterpart, it belongs to the o-stems, and in Homer many o-stem nouns have a genitive not in -ου, but in -οιο. So we can correct the corrupt form ἡλίου to ἠελίοιο by analogy to these other genitives, and if the line now scans, our job is done. Analogy, then, was largely a tool of textual emendation for the Alexandrians.

The Anomalists, on the other hand, can be identified with the Stoic philosophers of the Hellenistic Period. As far as we can tell, they were not really interested in showing that language is entirely irregular; rather, they took regularity as a given and identified various exceptions.

Julius Caesar dictates De analogia from horseback to his scribes, Jacques de Gheyn II, c.1620 (Ham House, Surrey, UK).

What De lingua Latina is all about

Varro’s De lingua Latina (“On the Latin Language”, 40s BC) was his major work on language. Originally, it comprised 25 volumes; an introduction was followed by six books on etymology, six on morphology and twelve on syntax. The etymological and morphological parts each contained three books on theory and three on practice. Unfortunately, much of this oeuvre has been lost to us. Only six books have come down to us in direct transmission, Books 5 to 10; in other words, the practical etymologies (5–7) and the theory of morphology (8–10). Very little remains of the other books; grammarians and other scholars have the occasional quotation or paraphrase of some small section, but it is not nearly enough to reconstruct what Varro’s theory of etymology or his syntax was like. Most of the fragments are, in fact, from the sections on practical morphology, but again they cannot give us a representative picture.

Today, then, our focus is on Books 8–10, on the theory of morphology. What makes these books hard to digest for a modern reader is that Varro does not simply expound his own views, refuting counter-arguments as he goes along. Instead, Book 8 contains a moderately short introduction outlining Varro’s own beliefs, followed by the arguments against analogy; Book 9 presents the arguments in favour of analogy; and Book 10 shows us what Varro really thinks, giving us a compromise solution leaning towards analogy.

Along the way, many interesting topics are discussed, from grammatical gender to case, from tense to voice. However, the discussion is not straightforward; often, one and the same topic is dealt with in all three books, from each of the different perspectives, and we need to look at all three passages if we want to follow the argument. The reason for this awkward presentation is that all these grammatical topics are not treated for their own sake; rather, they are merely a sideshow used to illustrate the arguments against and in favour of analogy and anomaly.

Title-page to a 16th-cent. edition of Varro’s De Lingua Latina (Gryphius, Lyon, 1563).

So what does Varro really think?

When L.L. Zamenhof invented Esperanto in an attempt to create an international language, he decided that it should be entirely regular. He came up with six participles: there are active and passive forms for past, present and future. For example, a past active participle would mean “having driven”, and a past passive participle would mean “having been driven”, while in the present the active would mean “driving”, and the passive “being driven”. Greek also has a large inventory of participial forms, but Latin gave up most of them, with a few lexicalized leftovers, such as alumnus / alumna, “nursling”, originally a present passive participle (compare the Greek ending ‑μενος). For Varro, the absence of forms is not a problem (9.110): analogy merely promises that paradigms are created in a regular way, not that there should be a fully symmetrical system. Thus analogy can help us to form a participle amans, “loving”, from amare, “to love”, but the fact that next to amare we have amari, “to be loved”, does not entail that next to amans we can expect a corresponding passive participle.

For some of the gaps, there may not be an obvious reason, but for others there is (9.63–4): most nouns have a singular and a plural, but others only have a plural because they consist of several identical parts, and others only have a singular because they are mass nouns. Hence, scalae, “stairs, staircase”, is always plural because it is a collection of identical steps, just as English trousers is a plural noun because of the two trouser legs; Varro notes that we count these nouns which are always plural in a different way: trinae bigae, “three chariots” (lit. “three sets of two-horse teams”), versus tres Musae, “three Muses”. Nouns like oleum and acetum (9.66) are mass nouns, just as their English equivalents oil and vinegar are; they are always singular because we cannot count them – we can only say “more” or “less” oil or vinegar. And some nouns are in between (9.67): vinum, “wine”, can function as a mass noun, but if we are talking about different types of wine, we can pluralise, again just as in English.

Roman amphorae preserved in Pompeii, 1st cent. AD.

For Varro, there is a large conceptual difference between etymology and morphology (8.6). The etymologist needs historia, a solid grasp of earlier texts and the history of the language. The morphologist, on the other hand, requires ars, a concise outline of as few rules as possible. Morphology is meant to be economical; the minimum number of rules and paradigms should give us the maximum number of forms.

In 10.22–3, Varro explains what a paradigm is; it is essentially a table based on two categories, such as case and number. The paradigm then presents all possible forms of a noun or a verb. Such a sample paradigm can subsequently be used to create all forms of another word that is comparable. In order for two words to be comparable, they have to agree in res / materia and vox / figura. When Varro speaks of res or materia, he means the internal features of a word; not its lexical meaning, but categories such as case or tense. For two nouns to be comparable, they have to be in the same combination of case, number, gender and declension class. Vox and figura are synonyms referring to the outward shape of a word, or more specifically, to its endings.

A Varronian example (9.91–4) can perhaps illustrate this nicely. lupus, “wolf”, and lepus, ‘”hare”, are identical in vox or figura because they both end in us. They are also quite similar in res or materia because they are both in the nominative singular and are masculine; however, they disagree in declension class, and so they have different paradigms, witness the genitives lupi and leporis. Using another case like the genitive or the vocative is a legitimate way to find out whether two seemingly comparable words are genuinely comparable, as these additional case forms can disambiguate when there is an underlying distinction that is not visible on the surface in the nominative. Varro uses a lovely image (9.43): the nominative is sometimes like a dark room, making it difficult to see what is inside; but the vocative is like a candle that brings light and helps us to determine the internal features. And with this, we have essentially arrived at word-and-paradigm morphology, the only morphological model in existence in the Greco-Roman world, a model which is also gaining traction again today. A word is presented together with its principal parts that allow us to determine unambiguously its precise class membership and to produce all potential forms correctly once we have a model paradigm.

Astronomer by candlelight, Gerrit Dou, 1650s (J. Paul Getty Museum, Los Angeles, CA, USA).

Varro is perfectly aware of the many irregularities in the system. But how does he cope with them? He draws a distinction that is still considered valid today, even though none of his predecessors, contemporaries or successors in the ancient world does so – the distinction between derivation and inflection. Derivation is the creation of new words, such as greenhouse from green and house (compounding), or electricity from electric and an abstract suffix ‑ity (derivation proper). Inflection does not create new words, but establishes all possible forms for any given word. Varro calls derivation declinatio voluntaria, “voluntary inflection”, while inflection is declinatio naturalis, “natural inflection”. These terms were chosen because we have some power over how we form new words, while these new words are then inflected in accordance with pre-existing rules:

voluntarium est, quo ut cuiusque tulit voluntas declinavit. sic tres cum emerunt Ephesi singulos servos, nonnumquam alius declinat nomen ab eo qui vendit Artemidorus, atque Artemam appellat, alius a regione quod ibi emit, ab Ionia Iona, alius quod Ephesi Ephesium, sic alius ab alia aliqua re, ut visum est. contra naturalem declinationem dico quae non a singulorum oritur voluntate, sed communi consensu. itaque omnes impositis nominibus eorum item declinant casus atque eodem modo dicunt huius Artemae et huius Ionis et huius Ephesi, sic in casibus aliis. (9.21–2)

The voluntary type is the product at which an individual has arrived by inflection as his will has carried him. So, when three people have bought one slave each at Ephesus, sometimes one inflects the name from the man who sold him, Artemidorus, and calls him Artemas; another inflects it from the region because he bought him there, Ion from Ionia; another, because he bought him at Ephesus, calls him Ephesius; in this way everyone names him after some other thing, as seems appropriate. By contrast, I call natural inflection the one which does not arise from the will of individuals, but from general consensus. Thus when the names have been assigned, everyone inflects their cases alike and says in the same way in the genitive Artemae and Ionis and Ephesi, and so in other cases.

Varro notices, entirely correctly, that inflection is overwhelmingly regular, with few exceptions, while derivation allows for a much greater degree of irregularity.

Title-label for a crib of irregular Latin verbs, from which the table standing at the top of this article was taken (J.D. Collis, Bromsgrove, 1854).

This is all well and good on a descriptive level, but why does irregularity exist in the first place? For Varro, there is nothing positive about irregularity. It comes about when words are formed incorrectly to begin with, or when they undergo sound changes that are like gradual decay (5.5–6). But Varro does not believe in a golden age of language where everything was perfect and regular; over time, some words also become more regular, and that is how it should be.

Is there anything we can do to help with this process of regularization? Well, yes and no. In 10.74, Varro notes that those in public office are remarkably powerless in this respect, since they have to follow common usage or risk being laughed at. It is the general populace that determines what is acceptable usage and what isn’t. But the public needs some guidance, and this guidance should ideally come from playwrights, since stage performances are popular and people are bound to imitate what they hear there. Such a statement may make us smile, but Varro is not far off; Australian and American soap operas did not introduce rising intonation (where the pitch goes up) in statements into British English, but their great popularity has been argued to be a factor in spreading the phenomenon. And on this note, let us return to our times.

The erstwhile title sequence of a soap that has much to answer for: Neigbours (1985–2022).

IV: Back to Modern Times

Already in the early Empire, grammarians saw little need to discuss the question of analogy and anomaly any more. Analogy was taken for granted as one of the principles of language, and it seemed that the argument was settled once and for all; Varro’s thoughts on the matter were old hat. Most modern linguists also emphasize regularity in morphology. After all, it would be impossible to learn an inflecting language if every word had a completely different paradigm. And while this is essentially correct, the question remains how we should deal with the irregular forms that do exist.

Insights from first-language acquisition

First-language acquisition occurs in several stages. As far as irregular forms are concerned, it is remarkable how consistently very young children produce them correctly at first, only to unlearn these correct forms and to relearn them later on. Thus, a child may at first use ran as the past tense of run, or drew as the past tense of draw. Some time later, the same child may say runned or drawed. And finally, the child will revert to ran and drew. What this progression pattern shows is that very young children memorize and repeat what they hear, and they do so correctly. But at some point, they begin to understand how past tenses are formed, and that they normally end in ‑ed; once the rule has been grasped, they over-generalize it and apply it to verbs which don’t fit the pattern. But as their exposure to adult language continues, they realize that there are exceptions and memorize them one by one.

Over time, much of the irregular material gets replaced, but some of it remains. What goes and what stays is not simply due to chance, but has to do with frequency patterns. Highly frequent words are more likely to remain irregular than rare ones because there is sufficient exposure to the irregular forms, while this may not be the case for rare ones, which are far more likely to adopt regular and productive patterns. The same also goes for frequent collocations, which can preserve fossilized forms already dead elsewhere; think of in the olden days, which retains a fossilized dative plural olden,[7] opaque to speakers of present-day English, who wonder why this by-form does not exist outside this specific phrase.

Still from the 1933 Mickey Mouse cartoon “Ye Olden Days”.

Insights from psycholinguistics

Ancient grammarians, but also many descriptive linguists of the 21st century work with a grammatical model that leads to great descriptive elegance. We establish rules for word formation and we create model paradigms and search for principal parts that are economical.

A medium-sized dictionary does not have to contain all words that are the result of productive morphology; for example, if it lists the suffix ‑ness along with its function, such a dictionary would merely clutter up unnecessarily if it then listed words like blackness or redness, whose form and meaning are fully predictable.

Principal parts are good if class membership can be determined with a minimum of forms; for instance, a Latin nominative singular in ‑us is ambiguous between second and fourth declension, but if we also learn the genitive with it (‑ī versus ‑ūs), the ambiguity disappears. For a verb, the present infinitive in combination with the first person singular present indicative active, the first person singular perfect indicative active and the perfect passive participle allows us to create every single form in the paradigm. Other forms could have been chosen as principal parts: for example, in Hebrew it is generally the third person singular ‘perfect’ qal that is selected because it has no affixes and no geminate consonants as stem modifications, but it has great predictive power. Principal parts are always a little arbitrary, but we think of them as good if the smallest number possible allows us to predict all forms in a paradigm.

This is descriptive elegance. Descriptive elegance is what we like in a smaller dictionary or a learners grammar. Problems only arise when people believe that descriptive elegance mirrors the much messier situation inside our brains. Linguists are not immune to this; witness slogans such as The lexicon is like a prison, it only contains the misfits. The lexicon here is the mental lexicon, which is said to contain only the irregular forms, the misfits; so not redness, but business, which is derived from busy historically, but no longer means “the state of being busy”, and which is pronounced as two syllables rather than the three one would expect from the rules of word formation.

The author (standing, fifth from right) with the staff of the Thesaurus Linguae Latinae and the students of the lexicographical summer school of July 2022 (Schola Aestiva Anni MMXXII), which produced the entries noctivagus and rotundare.

But is this how our brains really work? Most psycholinguistic studies on the subject beg to differ. When people speak English at moderate speed, neither quickly nor slowly, they typically utter around three words per second. That is a whole lot to take in. If we had to decompose every word into morphemes to figure out its meaning, we would not be able to process speech at the rate we do; our brains therefore store many complex (but well-behaved) words as wholes, even though we could in principle decompose them. However, storing absolutely everything would not be economical either because it would take up too much storage space; and so we also decompose many words. How we handle individual words depends on their frequency: the more frequent a word is, the more likely we are to store it as a whole, because in this case, processing speed is more important than storage space; and the rarer a word is, the more likely we are to decompose it and to divide it into morphemes, because there is little point in storing such rare words.

Let us look at redness and business again. Redness is a rare word; when we hear it, we cut it up into red and ‑ness, and there are no odd features of pronunciation or of meaning – we can predict both of them from the constituent parts. On the other hand, business is a frequent word; we store it as a whole, we pronounce it in an irregular fashion (two syllables rather than three, as one might expect from segmentation), and we do not think of its meaning as “a state of being busy”. In fact, if that is the meaning we want to express, we can say busyness, a new formation that needs to be decomposed, but consists of three syllables and has regular pronunciation. But storing words as complete, unanalysed wholes often has these side-effects; it leads to unusual phonology and to irregular meanings. We can see this in Latin too. From amabilis, “lovely”, we can form an abstract noun amabilitas, “loveliness”, which is fully regular in pronunciation and meaning, as befits a rare word. But from civis, “citizen”, we get civitas, a frequent word, and here we can observe the development of irregularity in both pronunciation and meaning: Italian città and French cité are not quite what one would expect phonologically, and the meaning of these words is “city” rather than “the state of being a citizen”, a meaning already common in later Latin (think of St Augustine’s De civitate Dei, “The City of God”, rather than His citizenship).

Saint Augustine, Philippe de Champaigne, 1640s (Los Angeles County Museum of Art, CA, USA).

Going separate ways

Increasingly, different types of linguists are going separate ways. Those writing smaller grammars of Latin or documenting hitherto unwritten languages tend to focus on descriptive elegance, on rules and paradigms and principle parts. And those involved in large grammars or dictionaries like the Thesaurus Linguae Latinae (1900–) take the bigger picture into account and come much closer to a representation of the mental processes that native speakers would have had. Ultimately, however, we need both. The brief and elegant outline of a system is ideal for learners, as a shorthand description of our mental processes that leaves out the finer details, but the larger documentations enable us to see the larger patterns that matter in language variation and change.

Every competent speaker, whether native or not, will have stored some complete paradigms, but thereafter native learners often encounter only partial paradigms in daily life. That has consequences: analogy plays a huge role in creating the missing forms on the hoof; but analogy can also go wrong when a form in a paradigm is ambiguous.

In derivation, this can lead to doublets. Such doublets normally have slight differences in meaning and may persist. Here is an interesting case of doublets, taken from Ingo Plag:[8] from ethnic we can derive ethnicity, but also the rarer form ethnicness; while the latter may sound bad at first (and should be ‘blocked’ by the prior existence of ethnicity), the two suffixes have slightly different meanings, and under appropriate circumstances ethnicness can be used legitimately:

(1) Her ethnicity was not a factor in the hiring decision. We are an equal opportunity employer.

(2) Her ethnicness was certainly a big factor in the director’s decision. He wanted someone who personified his conception of the prototypical Greek to play the part.

The two words are not interchangeable.

There is an interesting discussion in Gellius (13.3): Gellius argues that suavitas, “sweetness”, and suavitudo are exactly identical in meaning and usage, as are sanctitas, “holiness”, and sanctitudo, or acerbitas, “bitterness”, and acerbitudo, or acritas, “sharpness”, and acritudo. This seems a priori unlikely, and indeed we can establish differences in usage for those doublets which are attested sufficiently frequently. But the main point Gellius is trying to make is that those grammarians who want to see a difference between necessitas and necessitudo are wrong. These grammarians claim that necessitas refers to the unavoidability of a state or a situation, while necessitudo refers to the unavoidability, and hence closeness, of friends and family. Who is right, Gellius or the grammarians? Gellius manages to give examples that go against the grammarians’ doctrine; he finds one example of necessitas for human relationships in Caesar, but admits that the usage is rare, and he presents an example of necessitudo for an external situation from Sempronius Asellio. But even if originally these were genuine semantic doublets, they could only persist if there was some secondary differentiation, and indeed throughout his work Gellius himself uses the two words exactly as the grammarians describe them, the very grammarians he loathes so intensely.

Partial paradigms with ambiguous forms can also lead to diachronic change. Compare olĕre, “to smell (of)”, which is a third-conjugation verb in early Latin, but ends up in the second conjugation, as olēre; or tonĕre, “to thunder”, which ends up as Classical tonare. Among nouns, velum, “sail”, is reflected as a feminine in French (la voile), not unlike folium, “leaf”, which gives us feminine la hoja in Spanish. Such shifts happen when speakers are confronted with isolated, ambiguous forms: olui, “I smelled”, could belong to the third or second conjugation, and tonuit, “there was thunder”, could be part of the third conjugation or the second or even the first. Similarly, the neuter plurals vela and folia look very much like feminine singular forms, hence the reanalysis. In order to understand such diachronic changes fully, we need to understand the basics of psycholinguistics.

The sequential levels of human speech.

V: Final Thoughts: Embracing Irregularity, and Child-friendly versus Adult-friendly Languages

I hope that by now you have made your peace with irregular forms; understanding how they come about and persist may even have made them interesting to you. But perhaps I can even encourage you to embrace their existence, at least partially, because irregular forms can be very useful, at least for first-language learners.

In this connection, we need to make a distinction between categories and their outward realization, a distinction not unlike the Varronian contrast between materia and figura. Among categories we find such abstract concepts as case or tense, and then the subcategories nominative, accusative and so on. The outward realization consists of the actual endings one has to learn. A first-language learner has to learn both categories and outward realizations; a second-language learner, on the other hand, will already have some awareness of the relevant categories, or can be told about them efficiently through a grammar book, and it is the outward realizations that will require effort to master.

This distinction is also made by James Blevins, who states:

This is particularly clear in the case of suppletion, though many deviations from regular patterns will tend to enhance the discriminability of forms and aid communicative efficiency. In English, for example, a regular preterite form such as walked is much less clearly discriminated from the present/base form walk than suppletive went is from go… By highlighting communicative contrasts that are distinctive in a language, irregulars can clarify the oppositions between less well discriminated regulars.[9]

Let us unpack this a little. First-language acquisition is very different from second-language acquisition. First-language acquisition starts from scratch; babies have to learn everything, from distinctive sounds to syntax and vocabulary. In the realm of morphology they have to learn not only how a specific category like tense or gender is realized, but also which categories exist in their particular language in the first place; does this language have tense? Gender? Case? And if so, how may tenses, genders, cases?

The distribution of the world’s language families (a more detailed version can be studied here).

Second-language acquisition is very different. Second-language learners already have a linguistic system in place, so they do not acquire this new language in a vacuum. Rather, they will compare and contrast; they will not need to be told what a German Apfel or a French pomme is ontologically, they simply have to equate it with English apple. Similarly, for most linguistic categories, like tense or gender or case, they will already have a pre-existing framework, and they will try to contrast the new system with the one they already know.

First-language learners benefit from irregular forms because they highlight very clearly what categories they need to acquire; sum, “I am”, is so different from fui, “I have been”, that the contrast between present and perfect cannot go unnoticed, while it is less clear in audio, “I hear”, and audivi, “I have heard”. Once the child has figured out that this is an important categorical contrast, the acquisition of the regular forms is sped up, and given children’s immense memory capacity, learning irregular forms does not slow them down much, while any delay in understanding the system of linguistic categories would be more detrimental. And this, in a nutshell, is why irregular forms are helpful to first-language learners and why I hope you will embrace them.

But, admittedly, it is a different story for second-language learners. Memory capacity has already decreased, while the understanding of linguistic categories normally does not come about through random exposure, but through helpful grammar books. Second-language learners have an easier ride learning linguistic categories; but they now have to struggle with a whole set of irregular forms whose benefit has been lost and which require more effort to memorize than in early childhood.

This has repercussions on language structure. There are no primitive languages; all languages have a great deal of complexity. But some languages have more morphology than others. Partly this depends on language family: the Athabaskan languages in North America (like Navajo and Apache) are incredibly rich in verbal morphology, while the Sinitic languages (like Mandarin and Cantonese) have very little; Indo-European languages are somewhere in the middle. But within one language family, there is a remarkable tendency: languages which are often learned in adulthood, as a lingua franca, have less morphology, and more regular morphology, than those which do not have such functions. Think of Bahasa Indonesia (standard Indonesian), which is an Austronesian language that is extremely simple morphologically, while its immediate neighbours are no less Austronesian, but very complex; or think of the Bantu languages in Africa, almost all of which have complex tonal systems, with the exception of the non-tonal Swahili, which is again a lingua franca. What you can see here is that languages which are predominantly learned as first languages can afford to preserve morphological and other irregularities, which pose no problems in this case and in fact offer some benefits; but languages which are also learned by a large number of adults will simplify their morphology and grammar because of the constraints of second-language acquisition.


Wolfgang de Melo is Professor of Classical Philology at Oxford. He has published on early Latin, especially Plautus and Roman comedy, and on Varro. He teaches linguistics and comparative philology and has a special interest in linguistic typology. He has previous written for Antigone on grammatical gender, Latin spelling, and Latin accents.


Further Reading

My favourite introduction to Varro is David Butterfield’s Varro Varius: The Polymath of the Roman World (Cambridge Philological Society, 2015), with a wide range of contributors covering the many works Varro produced. Those specifically interested in his linguistic output may find my work helpful: Varro’s De lingua Latina: Introduction, Critical Text, Translation, and Commentary (Oxford UP, 2019). Varro’s sources, linguistic and philosophical, are discussed in Federica Lazzerini’s doctoral thesis, The Scholar at Work: Varro’s Study of the Latin Language (Oxford, 2021), which she is currently reworking into a monograph to be published by Oxford University Press.

Caesar’s grammatical doctrines are discussed authoritatively in Alessandro Garcea’s Caesar’s De analogia: Edition, Translation, and Commentary (Oxford UP, 2012).

English word formation is a complex topic. As an accessible introduction, I recommend Ingo Plag’s Word-Formation in English (Cambridge UP, 2003). ‘Word-and-paradigm’ morphology, the successor of ancient models of morphology, had one of its starting points in a lucid essay by Robert Robins, “In defence of WP,” Transactions of the Philological Society 58 (1959) 116–44. An up-to-date treatment can be found in James Blevins’ Word and Paradigm Morphology (Oxford UP, 2016).

For the concept of adult-friendly and child-friendly languages see John McWhorter’s The Language Hoax: Why the World Looks the Same in Any Language (Oxford UP, 2014).

Notes

Notes
1 In this essay I only mark vowel length where it is important for the argument, and sometimes I use slightly anachronistic forms if the argument is not affected; for instance, I may speak of early parsī, “I have spared”, where the original form would have been parsei in spelling and pronunciation.
2 The dot over the g in this last word is a modern convention to help learners of Old English: the letter g could be pronounced as a hard sound (as in guest) or as a soft one (as in year), and the dot indicates that we are dealing with the soft pronunciation.
3 Nasal (m/n) + consonant is not a particularly common pairing in Modern English, but exists in words such as finger.
4 I use a double asterisk here to indicate a form which one might reasonably expect, but which was never actually created.
5 The Greek aorist (ἐπαίδευσα, “I educated”, from παιδεύω, “I educate”) is a past tense that describes an event as a whole (“I educated the children and taught them the entire curriculum”). The perfect (πεπαίδευκα, “I have educated”) describes a past action with a present result (“I have educated the children and now they know their stuff”).
6 Cf. Paul. Fest. p. 299M on Livius Andronicus.
7 In Old English, the preposition in required the dative, as it still does in German.
8 2003: 67; see “Further Reading” for full reference.
9 2016: 201; see “Further Reading” for full reference.