
Is it possible to quantify the number of words in a language?




People like counting. People like to compare things. These two tendencies are exaggerated even further on the internet. So it's no wonder that there are websites, like Global Language Monitor, that tout the number of words in English as some precise figure (1,013,913 and growing at 15 per day), as compared to woeful second-place finisher Mandarin.

The problem is that this is complete and utter nonsense, crap, b.s., ridonkulosity, bushwa, bubbe-meises.*
* - Keep this in mind as we move on.

The problem with trying to quantify the number of words in a language is that there is no precise way of defining the two most important things in that sentence - words and language.

What is a word?

What, exactly, counts as a word? We have a general sense - dog is a word, bnick is not, but the challenge with really figuring out what counts as a word is highlighted by some of the examples in the sentence above beginning with nonsense.

1) Morphology

Does nonsense count as a word? Or is it the same as sense? What about dog and dogs? Or dog and hot dog? How many words are flame, flames, inflame, inflammable, flammable? Or grandfather, great-grandfather, great-great-grandfather and so on?

English, like almost every other language, has morphology, which is a system of building words from meaningful word parts. Loosely, morphology can be broken down into inflectional morphology (run -> runs), derivational morphology (run -> runner) and compounding (with varying degrees of coherence, e.g., cab driver, toothpick), with lots of gray area in between.

There is no way of deciding which of these word forms counts as a word in a way that is not completely arbitrary. Lest you think this is a minor factor, these choices could easily change your answer by close to an order of magnitude, as you can see from the flame or grandfather examples. Almost every word is subject to morphology, and there is no principled way of deciding when the result should be counted as another word or not.
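To make the arbitrariness concrete, here is a minimal sketch that counts the same handful of forms from the examples above under three different counting policies. The policies and their function names are invented purely for illustration; they are not anyone's actual methodology, and real morphological analysis is far messier.

```python
# Toy word list taken from the examples in the text.
FORMS = ["flame", "flames", "inflame", "inflammable", "flammable",
         "dog", "dogs", "hot dog"]


def count_every_form(forms):
    """Policy A: every distinct written form counts as a word."""
    return len(set(forms))


def count_folding_plurals(forms):
    """Policy B: fold a form ending in -s into its singular when the
    singular is also in the list (a deliberately crude, made-up rule)."""
    known = set(forms)
    folded = {f[:-1] if f.endswith("s") and f[:-1] in known else f
              for f in known}
    return len(folded)


def count_folding_plurals_no_spaces(forms):
    """Policy C: Policy B, plus drop open compounds written with a space."""
    return count_folding_plurals([f for f in forms if " " not in f])


if __name__ == "__main__":
    print("A:", count_every_form(FORMS))                 # 8
    print("B:", count_folding_plurals(FORMS))            # 6
    print("C:", count_folding_plurals_no_spaces(FORMS))  # 5
```

Eight forms, and already three defensible answers. None of the three policies is more correct than the others, which is the whole point.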
________________________________

2) Synonyms, homonyms and heteronyms, oh my!

Crap is a verb. Crap is a noun. Crap means a lie and crap means feces. I guess you can count that all as one word, but what about same spelling and a more radically different meaning, e.g., bank (river) and bank ($)? Or how about same spelling, different meaning and different pronunciation, e.g., desert (sand) and desert (leave)? Or if spelling is your guide, what about different spelling of the same meaning, e.g., advisor v. adviser?

Indeed, almost every permutation of same v. different meaning, spelling and pronunciation can be found among (amongst, wink wink) words.
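A small sketch shows how much the answer depends on what you treat as a word's identity. The entries come from the examples in this section; the pronunciation strings are rough placeholders I made up for illustration, not real phonetic transcriptions, and `count_by` is a hypothetical helper.

```python
from collections import namedtuple

Entry = namedtuple("Entry", ["spelling", "pronunciation", "meaning"])

# Pronunciations are informal placeholders, assumed for this example only.
ENTRIES = [
    Entry("crap", "krap", "a lie"),
    Entry("crap", "krap", "feces"),
    Entry("bank", "bank", "river edge"),
    Entry("bank", "bank", "financial institution"),
    Entry("desert", "DEZ-ert", "arid land"),
    Entry("desert", "de-ZERT", "abandon"),
    Entry("advisor", "ad-VY-zer", "one who advises"),
    Entry("adviser", "ad-VY-zer", "one who advises"),
]


def count_by(entries, *fields):
    """Count entries that differ on the chosen fields."""
    return len({tuple(getattr(e, f) for f in fields) for e in entries})


if __name__ == "__main__":
    print(count_by(ENTRIES, "spelling"))                              # 5
    print(count_by(ENTRIES, "spelling", "pronunciation"))             # 6
    print(count_by(ENTRIES, "meaning"))                               # 7
    print(count_by(ENTRIES, "spelling", "pronunciation", "meaning"))  # 8
```

The same eight entries give four different totals, depending only on which fields you decide make two words "the same".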

3) Acronyms

Moving on to the next word in our little rant, b.s. Are you counting abbreviations and acronyms in your list and, if so, how? B.S. is pretty conventionalized, but certainly not as much as laser, though more so than POTUS (though that depends on whether you work in politics), to say nothing of the status of EKG, an acronym you certainly hear more than the real word itself. As above, whatever dividing line you select will be completely arbitrary. The number here probably isn't too high, maybe on the order of tens of thousands, but it serves to highlight another, parallel problem, that of:

4) Neologisms

Did you like the word ridonkulosity? I just made it up. Or at least I thought I had just made it up, but it shows up on Google with 4,000 hits. That was after thinking I had sort of created the novel word ridiculosity; spell check says it isn't a word, but Merriam-Webster says it is.


The fact is that there is no definitive way of deciding whether a new word should count as, well, a word. New entries in the OED or Merriam-Webster are decided by a person, or group of people, according to some general guidelines relating to frequency of use, place of use and so on. These are not guidelines handed down from on high, as much as we revere the Oxford English Dictionary, but are, again, arbitrary. They even vary from dictionary to dictionary, resulting in something like a two-fold difference in the size of different dictionaries.

5) Archaisms

Next up, bushwa, a word I didn't even know until I read the article "Keeping It Real on Dictionary Row," where Geoff Nunberg debunks the charlatans at Global Language Monitor, albeit briefly. I didn't know it because the word has been going out of style since about 1950. That's a relatively recent decline compared to other words, like emmet or pismire, both words for ant, which went out of use hundreds of years ago.

So not only do we not have a concrete way of deciding when to add a word, we similarly have no way of deciding when to remove a word from our list, either. Given that languages are in a constant state of flux, that creates a moving target wherein the exit criteria should be linked to the entrance criteria, which itself is arbitrary. So, again, more arbitrariness.
__________________________________

6) Borrowings

Finally, bubbe-meises, my favorite in the list, which is a word in the English dictionary. It is clearly a borrowing, in this case from Yiddish roughly meaning Old Wives' Tale, but with a bit more of a sense of dismissal.  Words are borrowed into English not with a single leap, but gradually, at different rates for each word depending on pronunciation, frequency, semantics and so on. In counting the words of English, you will have to somehow define yet another cut-off point here when figuring out what to count and what not to count.

7) Specialty Words

And last, but not least, indeed perhaps most, in terms of how it would affect your final number, we have the millions upon millions of words associated with different scientific specializations. Not to say that Critical Theory hasn't come up with its own unique vocabulary, but no one quite compares to Chemists and Entomologists in outdoing everyone else in word creation.

There are 350,000 species of beetles on this planet, each of which can be given its own name. And that's just beetles. There are up to 1 billion different species of bacteria. If each species in the class Mammalia gets its own word, then so too with at least some of the prokaryote kingdom, no?

A similar problem exists with chemicals and all the permutations and combinations that lead to a near-infinite number of possibilities wherein the only real limits are those of chemistry and not language. How would that work in your word count?

Oxygen, certainly yes. What about Dihydrogen monoxide? Or its synonyms, Dihydrogen Oxide, Hydrogen Hydroxide, Hydronium Hydroxide and Hydric acid? Get to know these chemicals (Facts About Dihydrogen Monoxide), but good luck in figuring out how to count their names.



Clearly, there are some tough (and by tough, I mean completely arbitrary) choices to be made in terms of counting words. Now what about language?

What is a Language?

Oh, and this is a question that bugs linguists! I speak English. You (probably) speak English. We certainly don't speak precisely the same language in terms of word knowledge. Which one do we use? There are so many different levels at which a language can be defined that it's impossible to settle on a definition of what the limits of any given language are.

First, for a language like English, you have national differences. The language of America will have different words than that spoken in Canada, Australia and the UK, not to mention what people speak in India and Nigeria.
And even within a single country, you have regional dialects that have different lexicons:

[Map: The different ways of saying coke in America. Source: Reddit.]

And on down to the individual person, or Idiolect, where each has his own way of speaking English, with different lists of words in his head. If you want to move away from the individual person and try to define the English language that is spoken in the world, it's not clear what that really means. Is that the sum total of all words across all self-reported English speakers? That'd be a mess, wouldn't it! 

You may try to go for some principled definition, e.g., the words in all books published in English, but that, too, is problematic for whom it excludes and for the pride of place it gives to literacy, the literary and editors. Thus, as with the definition of word, you're stuck with an arbitrary definition of what a language is.

Summary

Without a clear definition of word and without a clear definition of language, you kind of sort of have no practical way of counting anything. And we're not talking about requiring a level of exactitude that is within some reasonable margin of error. We're talking about potentially orders of magnitude of difference depending on how you decide. So, yes, by all means, count the number of words of English and say it's 1,019,430, so long as you're comfortable saying that's +/- 1,000,000 words.


