Just now, I had a dumb little idea that I absolutely have to write about: nominal symmetrical voice. What would happen if a language that used an Austronesian Alignment system evolved nominal tense-aspect-mood marking as Coptic has? In truth, this is probably entirely unnaturalistic, but I thought it a fun idea to play with so that’s what we’re going to explore today.

I recently finished up an essay on triconsonantal roots and logoconsonantal writing systems, and within that I covered two topics that are pertinent here: the evolution of nominal TAM in Coptic and the morphological and syntactic differences between clitics and affixes cross-linguistically. The former is more obviously important than the latter, but this distinction between the two will soon be important for what I hope to outline here.

Scouring through some academic articles, I came across Arnold Zwicky and Geoffrey Pullum’s description of a set of differences between clitics and affixes that bears repeating before I explain any further:¹

Clitics tend to “care” less about what they’re attaching to, able to grab onto whichever word happens to fall in the right position. In contrast, affixes tend to only attach to one kind of word, be it a noun or a verb or whatnot.
Affixes tend to have arbitrary gaps in which words they can attach to and the conditions under which they can appear.
Affixes tend to care more about the phonological qualities of the word they’re attaching to while clitics are much less discerning.
Affixes have particular semantic baggage and idiosyncrasies—their meaning can be a lot more variable depending on the particular word they’re attaching to, while clitics tend to be pretty constant in this regard.
Syntactic rules can shuffle words with affixes around much more easily than they can words with clitics. This is kinda related to the first difference, seeing as clitics cling much more to a particular position in a clause while affixes cling much more tightly to a particular word.
And finally, clitics can attach to other clitics, but affixes (usually) cannot attach on the outside of a clitic. That is: [stem]+[clitic]+[clitic] is allowed, but [stem]+[clitic]+[affix] is not. Note, however, that this doesn’t hold up in some languages, though it does seem to be the case most of the time.

These rules will be important momentarily, but first I should run through a brief description of nominal TAM. As it manifests in Coptic, this feature is characterized by the appearance of tense, aspect, and mood clitics on the subject of the sentence rather than on the verb, where we’d usually expect them. As I talk about in my triconsonantal root essay, this seems to have arisen during and after a period which saw the Egyptian language shift from being fusional to being more analytic. Afterwards, Coptic developed a whole hell of a lot of cliticization, but that shift is important as it seems to have set the stage for the appearance of nominal TAM. While VSO was popular in Middle Egyptian, SVO became increasingly common, and while I want to stress that I am not an expert on the topic, this seems to have left some of the TAM markers in that sentence-initial position—possibly related to their having been auxiliaries—where they remained beside the subject. As we’d expect of clitics, if the subject itself becomes a clitic on the verb, these TAM markers simply attach on top of it.² Furthermore, only some TAM markers precede the subject, others always appearing between it and the verb (where one might expect them to appear).

In sketching a little language today, we’ll emulate this: we’ll start with an SVO, analytic language, then have it evolve into an SOV, semi-fusional (mostly agglutinative) language that features clitics on the subject which one would expect to appear on the verb.

The only difference will be that these will be the sort of markers one finds in symmetrical voice systems, or languages with Austronesian Alignment. I’m sure plenty of you already know how this works, but it is worth running through it real quick to clarify what I mean. Essentially, verbs bear some voice marker that defines the semantic role of their subject, in the same way that a passive voice marker defines the verb’s subject as having the semantic role of the patient. Instead of being restricted to the agent and patient roles, though, these markers can also indicate that the subject is the beneficiary of the action, or the location where it occurs, or the reason, etc. In any case, this system is really just a regular voice system on steroids. Causatives, applicatives, passives: these are the bread and butter of the symmetrical voice system. Our language will develop a set of these markers from auxiliary verbs that have worn down into particles and, if all goes well, eventually into clitics. As in Coptic, some of these will follow the verb when it moves to a medial position; others, though, will remain, attaching to the subject.

Alright, now that we’ve laid out the foundation, I can go ahead and describe the phonology and phonotactics of our experimental language; I should warn you, I’m something of a religious adherent to the school of diachronic language creation, so if you want to skip ahead to the development of the grammar, you can head to the second part of this essay without missing out on much.

1 | PHONOLOGY & DIACHRONICA

I’m gonna have a little fun with this. I don’t want to stretch out this section since it’s not meant to be the focus of the essay, but I’ve been itching to make a language with certain phonological similarities to Russian, though I’ve also been digging into the Sami languages recently, so who knows where we’ll end up drawing our influences from. For our modern consonant inventory, we’ll go with the following:

Consonants		Labial		Coronal		Velar
Consonants		soft	hard	soft	hard	soft	hard
Nasal		mʲ	m	nʲ	n
Stop	fortis	pʰʲ	pʰ	tʰʲ	tʰ	cʰ	kʰ
Stop	lenis	pʲ	p	tʲ	t	c	k
Affricate				t͡ɕ	t͡s̠
Sibilant				ɕ	s̠
Fricative		fʲ	f	θʲ	θ	ç	x
Approximant		ʋʲ	ʋ	lʲ	l	j
Tap				ɾʲ	ɾ

Vowels
	Front	Back
High	i	u
Mid	e	o
Low	a

The language will have register tone, having lost its old voicedness distinction, but before we get too far into that we should talk about the proto-language from which we’re going to evolve this one. Upon seeing its phonology, you may expect certain series of consonants to be directly related to some in the modern language, but we’re going to take a bit of a circuitous route in getting there.

Consonants		Labial	Coronal	Palatal	Velar	Glottal
Nasal		m	n
Stop	voiceless	p	t		k
	aspirated	pʰ	tʰ		kʰ
	voiced	b	d		g
Fricative			s			h
Approximant			l	j	w
Tap			ɾ

Vowels
	Front	Back
High	i	u
Mid	e	o
Low	a

The voicedness distinction that the language will lose isn’t that voiced series in the stop section; no, those, along with the aspirated stops, will lenite, yielding a variety of fricatives. Then, prenasalization will yield a new voiced series which will promptly be lost during tonogenesis, along with the voiced fricatives, but not before a couple of those become approximants. There’s little use in elaborating further: I ought to just show you.

The first stage of our language’s evolution will begin with a quick palatalization of alveolar stops in order to get those two affricates that are present in the modern language. I briefly considered having /u/ contract onto alveolar and velar stops, with the former evolving into labio-palatalized affricates, but I figured that was a bit too fancy for our purposes. Instead, we’re just going to use plain old palatalization. To put this in writing:

/tj dj/ → /t͡s d͡z/

Our next two changes will work in tandem to yield the long vowels that characterize the medial stages of the language’s development. These will be lost in the same way that Greek lost its long vowels: yielding diphthongs that then iotacize or otherwise evolve into other sequences. These changes are:

Loss of /h/.
Lengthening of adjacent vowels.

Via this third rule (the lengthening of adjacent vowels), /{ai ei} {au ou}/ become /eː oː/. Similarly, /i e/ and /u o/ preceding another vowel become /j/ and /w/ respectively.
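To make the interplay of the second and third changes a bit more concrete, here is a minimal Python sketch that deletes /h/ and then resolves each vowel pair that results. The function name is hypothetical, and the treatment of pairs I haven’t described (such as /ae/) is purely an assumption: the sketch just leaves them alone.

```python
VOWELS = "aeiou"

# Vowel pairs that merge into a long mid vowel, as described above.
MERGERS = {"ai": "eː", "ei": "eː", "au": "oː", "ou": "oː"}
# High and mid vowels turn into glides before another vowel.
GLIDES = {"i": "j", "e": "j", "u": "w", "o": "w"}


def lose_h_and_lengthen(word: str) -> str:
    """Apply loss of /h/, then resolve vowel pairs from left to right."""
    word = word.replace("h", "")
    out = []
    i = 0
    while i < len(word):
        pair = word[i:i + 2]
        if len(pair) == 2 and pair[0] in VOWELS and pair[1] in VOWELS:
            if pair[0] == pair[1]:
                out.append(pair[0] + "ː")    # identical vowels: one long vowel
                i += 2
            elif pair in MERGERS:
                out.append(MERGERS[pair])    # /ai ei/ and /au ou/ merge
                i += 2
            elif pair[0] in GLIDES:
                out.append(GLIDES[pair[0]])  # second vowel is handled next pass
                i += 1
            else:
                out.append(pair[0])          # assumption: unlisted pairs stay as is
                i += 1
        else:
            out.append(word[i])
            i += 1
    return "".join(out)


print(lose_h_and_lengthen("nahikosta"))
print(lose_h_and_lengthen("kwahatsu"))
```

The two calls at the bottom use the example words that come up later in this essay.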

Now, we ought to go ahead and change some of our consonants so that we can get the fricatives that are present in the modern language. During the evolution from Ancient Greek to Modern Greek, the aspirated and voiced series lenited, so we’ll go ahead and emulate these changes here:

Lenition of voiced and aspirated stops and affricates.
Prenasalization of stops and affricates.

At the end of this phase of the language’s development, we have lost and regained a series of voiced stops, but now we also have our fricatives (though we’ll lose quite a few of them during our later tonogenesis). In any case, we ought to keep chugging along so that we can get to the real meat and potatoes of this essay.

Rapid-fire, here is the next phase of the language’s development:

Loss of prenasalization.
Devoicing of sibilants and affricates.
Contraction of palatalization onto preceding consonants.
/t͡sʲ sʲ kʲ gʲ xʲ ɣʲ/ → /t͡ɕ ɕ c ɟ ç ʝ/
/{w v} ʝ/ → /ʋ j/
Long vowels diphthongize and vowel length is lost.

Most of these are pretty straightforward, but I should note that this eleventh change (the diphthongization of long vowels) depends on the vowels that appear later in the word. In short, /eː/ and /aː/ become /eu/ and /au/ when a back vowel appears later in the word, and similarly /oː/ becomes /oi/ when preceding a front vowel. Otherwise, /eː aː oː/ become /ei ai ou/ respectively. This will lead to some fun stem alternations later when we shed the vowels that triggered this split.
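Here is a rough Python sketch of that conditioning. It rests on two assumptions of mine that the description above doesn’t settle: only the nearest following vowel counts as the trigger, and long /iː uː/ simply shorten. The function name is hypothetical.

```python
VOWELS = "aeiou"
FRONT = "ie"
BACK = "uo"
LONG = "ː"


def diphthongize(word: str) -> str:
    """Break long /eː aː oː/ into diphthongs based on the next vowel."""
    out = []
    i = 0
    while i < len(word):
        ch = word[i]
        is_long = i + 1 < len(word) and word[i + 1] == LONG
        if not is_long:
            out.append(ch)
            i += 1
            continue
        # Assumption: only the nearest following vowel matters.
        following = next((c for c in word[i + 2:] if c in VOWELS), "")
        if ch in "ea":
            out.append(ch + ("u" if following and following in BACK else "i"))
        elif ch == "o":
            out.append(ch + ("i" if following and following in FRONT else "u"))
        else:
            out.append(ch)  # Assumption: long /iː uː/ just lose their length.
        i += 2              # skip the vowel and its length mark
    return "".join(out)


print(diphthongize("nʲeːkosta"))
print(diphthongize("kwaːtsu"))
```

Since the trigger vowel is read off the word before any final vowels drop, the same stem can surface with different diphthongs, which is where the stem alternations come from.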

Anyways, this leads us to the last phase of the language’s development: the phase during which we generate tones and shift towards an analytic structure (due to the loss of certain unstressed syllables). Here are the changes for this period:

Voiceless stops aspirate.
Voicedness distinction lost in stops and fricatives.
/ou au eu/ → /u aʋ eʋ/
Iotacism: /oi ai ei/ → /i e i/
Unstressed word-final vowels reduce to schwa and drop.
Metathesis of coda /t/ with an adjacent onset fricative.
Assimilation of adjacent stops.

Voila! We have our modern language. Via these sound changes, a word from our proto-language such as /nahikosta/ would become /nʲeʋkʰos̠tʰ/. If we take a sample at each stage, its evolution looks something like this: /nahikosta/ → /neːkosta/ → /nʲeukosta/ → /nʲeʋkʰos̠tʰ/.

For the proto-language, the maximally complex syllable will be CJV[ntsr], where J is /j/ or /w/. Furthermore, stress will be universally penultimate. Via the sound changes we just outlined, the maximally complex syllable in non-word-final syllables in the modern language will be CWVW[ntsr] where W can only be /ʋ/. If the coda of one syllable is /t/ and the next syllable begins with a stop, the /t/ assimilates to that following onset. Word-finally, we’ll see some final syllables structured CV[ʋ][ntsr]C. For an example of the maximally complex syllable, we might take the hypothetical word /kwahatsu/ from the proto-language; this would evolve into /kʰʋaʋs̠tʰ/, though I imagine something like this would be quite rare, since it requires a particular arrangement of phonemes in the proto-language.

And lastly, geminate stops are disallowed word-finally; an /e/ will be inserted after a soft geminate or an /a/ after a hard one in order to avoid this, so something like /θakk/ becomes /θakka/ while /θacc/ becomes /θacce/. Similarly, if a clitic would attach to the end of a word ending in a cluster, an epenthetic /e/ or /a/ is inserted first, thus /nʲeʋkʰos̠tʰ/ with the clitic /na/ becomes /nʲeʋkʰos̠tʰana/. For a soft example, /kʰʋas̠tʰʲ/ with the clitic /na/ becomes /kʰʋas̠tʰʲena/. If the clitic is only a consonant, this triggers regardless of the coda of the final syllable. With the clitic /n/, /kʰʋatʰʲ/ becomes /kʰʋatʰʲen/ and /nʲeʋkʰotʰ/ becomes /nʲeʋkʰotʰan/.
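The clitic rule is easy to get wrong by hand, so here is a small Python sketch of it. Everything named here (`segments`, `attach_clitic`, and so on) is hypothetical scaffolding, and I’m assuming that a consonant-only clitic needs no epenthesis after a vowel-final word, which I haven’t actually spelled out above.

```python
VOWELS = set("aeiou")
MODIFIERS = {"ʰ", "ʲ", "\u0320"}  # aspiration, palatalization, retraction mark
TIE = "\u0361"                     # tie bar used in affricates
PALATALS = set("cçɕj")             # inherently soft consonants


def segments(word):
    """Split an IPA string into segments, keeping diacritics with their base."""
    segs = []
    for ch in word:
        if segs and (ch in MODIFIERS or ch == TIE or segs[-1].endswith(TIE)):
            segs[-1] += ch
        else:
            segs.append(ch)
    return segs


def is_consonant(seg):
    return seg[0] not in VOWELS


def is_soft(seg):
    return "ʲ" in seg or any(ch in PALATALS for ch in seg)


def attach_clitic(word, clitic):
    """Attach an enclitic, inserting /e/ (soft) or /a/ (hard) where required."""
    segs = segments(word)
    final = segs[-1]
    ends_in_cluster = (
        len(segs) >= 2 and is_consonant(segs[-2]) and is_consonant(final)
    )
    bare_consonant = all(is_consonant(s) for s in segments(clitic))
    # Assumption: a vowel-final word never needs the epenthetic vowel.
    if ends_in_cluster or (bare_consonant and is_consonant(final)):
        word += "e" if is_soft(final) else "a"
    return word + clitic


print(attach_clitic("nʲeʋkʰos\u0320tʰ", "na"))
print(attach_clitic("kʰʋatʰʲ", "n"))
```

The retracted sibilant is written with an explicit `\u0320` escape in the calls so that the combining mark survives copy and paste.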

In any case, we should talk about tone. Let’s take our example word /nahikosta/ and imagine a similar word /nahinkosta/. Evolving these words would yield /nʲeʋkʰos̠tʰ/ and /nʲeʋkos̠tʰ/ respectively, but the lack of aspiration on that /k/ won’t be the only difference. During the first two steps of the final phase of the language’s evolution, the loss of a voicedness distinction between many consonants will yield a register tone system. In short, syllables that began with a voiced consonant before this change will feature low tone, and those that began with a voiceless consonant will feature high tone. Because of the aspiration of voiceless stops before the loss of voicedness, this means that syllables beginning with aspirated consonants will always feature high tone; similarly, unaspirated stops will imply low tone. Taking our two example words again, if we look at their respective underlying tones before the loss of their final vowels, they’d look something like this: /nʲeʋ^L.kʰos̠^H.tʰa^H/ and /nʲeʋ^L.kos̠^L.tʰa^H/.

The loss of this final vowel will create a floating tone, something like: /nʲeʋ^L.kʰos̠tʰ^H ^H/ and /nʲeʋ^L.kos̠tʰ^L ^H/. This floating tone will apply to any clitic that attaches to the end of this word.
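As a sketch of the tonogenesis just described, the Python below reads each syllable’s tone off its onset and then strands the tone of the deleted final vowel. It expects forms from the stage just before the voicing merger (so the second word still has /g/), it assumes words of at least two syllables, and the `VOICED` set and function names are illustrative rather than part of the language’s description.

```python
VOWELS = set("aeiou")
# Consonants still voiced just before the voicing merger (an illustrative set).
VOICED = set("mnbdgvzlɾjwʋɣʝɟ")


def onset_tone(syllable):
    """Low tone after a voiced onset, high tone after a voiceless one."""
    first = syllable[0]
    if first in VOWELS:
        return None  # onsetless syllables are not covered by the essay
    return "L" if first in VOICED else "H"


def drop_final_vowel(syllables):
    """Delete the word-final vowel; return (syllables, tones, floating tone)."""
    tones = [onset_tone(s) for s in syllables]
    *stem, last = syllables
    floating = tones.pop()   # tone of the syllable that loses its vowel
    stem[-1] += last[:-1]    # its onset joins the previous syllable's coda
    return stem, tones, floating


print(drop_final_vowel(["nʲeʋ", "kʰos\u0320", "tʰa"]))
print(drop_final_vowel(["nʲeʋ", "gos\u0320", "tʰa"]))
```

The third element of each result is the floating tone that a following clitic would pick up.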

As for our romanization, we’ll go ahead and do something like this:

Consonants		Labial	Coronal	Dorsal
Nasal		m	n
Stop	fortis	p	t	k
Stop	lenis	b	d	g
Affricate			c
Sibilant			s
Fricative		f	th	h
Approximant		v	l	y
Tap			r

Palatalization will only be indicated when /a/, /o/, or /u/ follows a soft consonant, in which case a ⟨y⟩ will get inserted between them. Hard consonants never appear before the soft vowels /i/ and /e/, so the reverse situation isn’t something we need to worry about. Thus, /nʲeʋkʰos̠tʰ/ would be written ⟨nevkost⟩. As for the vowels, they’ll be represented as you’d expect: ⟨i u e o a⟩, nothing unusual. A syllable with a high tone will be indicated with an acute on the vowel: ⟨í ú é ó á⟩. Our example, /nʲeʋkʰos̠tʰ/, becomes ⟨nevkóst⟩, while /nʲeʋkos̠tʰ/ is ⟨nevgost⟩. We could leave the fortis-lenis distinction unrepresented in the romanization, as it will be indicated in the tone of the vowel most of the time, but since some clitics lack tone, this may leave certain distinctions unrepresented, so we’ll leave it as is.

I also debated dropping the fortis-lenis distinction entirely, but I’m a sucker for aspirated consonants so I think we’ll hang onto it for now. Other than that, I think this closes our discussion of the language’s phonology, phonotactics, and evolution, which means we can now talk about its grammar.

2 | GRAMMAR

Okay, so we have our phonology and phonotactics. Now, it’s time to outline what the proto-language’s grammar will have looked like and how this’ll evolve into the modern language. From here on out, we’ll refer to the modern language as Nevkóst /nʲeʋkʰos̠tʰ/ and the proto-language as Old Nevkóst.

I really should make this a head-initial language. We’re going to say that Old Nevkóst was an SVO, head-initial language while Modern Nevkóst has become an SOV, mixed but predominantly still head-initial language, much in the vein of German.

Nevkóst will mostly feature prepositions, with a few postpositions peppered in; its adjectives will generally follow the noun they modify; relative clauses will follow their respective nouns; and the possessor will follow the possessee. Interestingly, Modern Nevkóst will mark possession mostly through the use of a set of possessive enclitics, though there will also be a new genitive case clitic that can be used instead. Thus, the average sentence might look something like:

Nevkóst svéthri melenna ya.

/nʲeʋkʰos̠tʰ s̠ʋʲeθɾʲi mʲelʲenna ja/

Nevkóst svéth=ri melen-na ya

Nevkóst language=1PL beautiful-DEF is

Nevkóst is our beautiful language.

As you can see, an adjective follows the noun it modifies and the (cliticized) possessor follows the possessee. You might’ve picked up that the language features a definite article /-a/ that lengthens (geminates) the final consonant of any word it attaches to. If it attaches to a final vowel, an epenthetic /j/ is inserted between them. In addition, this definite article is attached to any adjective modifying a noun that is also considered definite, as those modified by possessive clitics are.

But we’re not really here to talk about word order. We’re here to talk about nominal symmetrical voice, so it is about time I got around to exploring the topic in any respectable level of detail.

Before its shift to an SOV word order, Nevkóst developed a set of auxiliary verbs used to mark voice that eventually were reanalyzed as a symmetrical voice system; instead of having a direct case and indirect case, word order was used to differentiate the two, until eventually there developed a system of differential object marking in which animate nouns featured a particular preposition that served as an accusative case marker. Thus, in Middle Nevkóst you might see a sentence like the following:

Rē anga la swetha we nēkosta.

/ɾeː aᵑga la sweθa we neːkosta/

rē anga la swetha we nēkosta

1PL AV do speak ACC Nevkóst

We speak Nevkóst.

Time passed, and the language shifted to an SOV word order, but these voice auxiliaries remained in that medial position, eventually wearing down into particles and finally clitics which attached to whichever word preceded them. Similarly, the accusative particle came to serve as an indirect case proclitic. In Modern Nevkóst, this sentence becomes:

Riyak venevkóst lasvéth.

/ɾʲijak ʋʲenʲeʋkʰos̠tʰ las̠ʋʲeθ/

ri=yak ve=nevkóst lasvéth

1PL=AV IND=Nevkóst speak

We speak Nevkóst.

If we wanted to change this into the patient voice, it’d be nonsensical, but for illustrative purposes we’ll do so:

Riya venevkóst lasvéth.

/ɾʲija ʋʲenʲeʋkʰos̠tʰ las̠ʋʲeθ/

ri=ya ve=nevkóst lasvéth

1PL=PV IND=Nevkóst speak

Nevkóst speaks us.

The semantic roles of both the subject and the object are encoded on the former, though only in the agentive voice does the indirect argument serve as anything other than the agent of the verb. For example, if we devise an instrumental voice, it’d work like so:

Alsvéthriak veri lasvéth.

/als̠ʋʲeθɾʲiak ʋʲeɾʲi las̠ʋʲeθ/

al=svéth=ri=ak ve=ri lasvéth

IV=language=1PL=AV IND=1PL speak

We speak with our language.

This voice is marked via the agentive voice enclitic and its own proclitic. In Middle Nevkóst, this would’ve looked something like this:

Al swetha e rē anga la swetha we rē.

/al sweθa e ɾeː aᵑga la sweθa we ɾeː/

al swetha e rē anga la swetha we rē

with language of 1PL AV do speak ACC 1PL

We speak with our language.

This one requires a bit more leg-work on the part of the speakers, reanalyzing and analogizing to give rise to this voice. I’m not entirely sure how this would ripple outward into how the language handles other grammatical features, but that’s mostly beyond the purview of this essay.

To the learner of this hypothetical language, it might appear as though there exist two sets of cases: proclitics which mark the indirect case, as well as a genitive and dative and whatever else the language has taken to marking; and then a set of enclitic case markers which serve to topicalize one argument and indicate the semantic structure of the clause. You could have these “cases” carry traditional TAM as well; for example, if we said that the agentive voice came from an old passive voice, then it might make sense for it to imply a past tense. Thus, the present tense would require the patientive voice while the past tense would require the agentive, somewhat akin to how tense- or aspect-based split-ergativity works in some languages like Georgian or Hindustani.

Some of these voices would be born out of compound voices: for example, the causal voice might come from a causative-agentive, where the subject isn’t the one doing the action but is instead the one causing the action to be done by the object. This would, of course, require some reanalysis.

Kának veri lasvéth.

/kʰanak ʋʲeɾʲi las̠ʋʲeθ/

ká=na=k ve=ri lasvéth

3SG=CV=AV IND=1PL speak

We speak because of them.

Similarly, there might be a patientive-causal, derived from a compound causative-patientive, where the subject is the one causing the action to be done unto the object, as in:

Kánaya veri lasvéth.

/kʰanaja ʋʲeɾʲi las̠ʋʲeθ/

ká=na=ya ve=ri lasvéth

3SG=CV=PV IND=1PL speak

We are spoken to because of them.

I don’t really know how useful that idea is, but in any case I figured it’d be fun to test it out.

That about wraps up everything I wanted to explore here. This isn’t really a naturalistic system; at least, I know of no language that both has a symmetrical voice system and marks that system on its subject, but if you’re willing to bend realism a little bit it might be fun to build such a language.

If you wanted to create such a system yourself, it would, in my mind, require some historical explanation for why nominal TAM has arisen. I only have experience with it from Coptic, where the shift in word order and the loss of fusionality seem to have facilitated its development. I wouldn’t be surprised if these are the common conditions under which nominal TAM arises in most languages that feature it, but then again I’m no expert on the topic so this might not be the case. Symmetrical voice doesn’t require as much syntactic manipulation; it seems to arise when speakers reanalyze prepositions, cases, and more common voices.³ I’d expect it to require some analogizing on the part of speakers as well: the way that one puts together the patientive and agentive voices might get applied to things like the causal or instrumental voices even if they previously worked differently.

If you’ve enjoyed this, you might take a look at my other essay on triconsonantal roots (which is significantly longer but equally haphazard). In any case, I intend to write another soon, either about mostly head-marking languages with high degrees of synthesis or about Celtic-inspired languages, so keep your eye out for that if either subject interests you. Anyways, thanks for reading, and happy conlanging.

¹ Arnold M. Zwicky & Geoffrey K. Pullum, “Cliticization vs. Inflection: English N’T,” (Language, Vol. 59, No. 3, Sep. 1983), 502-513.
² That is, both the TAM clitic and the subject clitic attach to the front of the verb.
³ Gašper Beguš, “The Origins of the Voice / Focus System in Austronesian,” (Harvard University, 2016).