How Much Vocab Do You Actually Need To Speak A Foreign Language?
- Written by Marta Krzeminska
- Read time: 15 mins
Today we have a guest post by Marta Krzeminska from LinguaLift.
Marta’s a language coach and a language explorer. She’s known around the internet as a speaker of Toki Pona – a language of few words – and in her non-language related pursuits, she experiments with vegan cooking. 🙂
Over to you, Marta!
How many words do you utter in a day?
If you’re a reclusive introvert like me, I wouldn’t be surprised if you said twenty. However, if you are a typical speaker of English, you probably use around 15-16 thousand words a day.
This is an aggregate amount, and we repeat ourselves a lot. If many of the words we use daily are the same, then it got me thinking:
What is the minimal number of words we need for effective communication?
The words we speak
There is a widely quoted statistic that claims that, on average, men use seven thousand words a day and women twenty thousand.
You can imagine the sensational headlines of articles that reported this alleged finding!
In fact, the study that investigated the matter reported that the spread of values was very wide, and that there was no statistically significant difference between the sexes.
“On average, women speak 16,215 words per day and men speak 15,669 words per day.”
– Dr. Matthias Mehl
So, we have this myth out of the way.
The words we’ve got
If you investigate the lexicon of any given language you’ll discover some huge numbers.
Now, if you looked at it and thought that in order to gain any communicative powers you have to memorise even 10% of that value, it would be pretty scary. The Oxford English Dictionary has 171,476 entries for words in current use; 10% of that is roughly 17,148 different words.
If someone had told me that in my first English class, I’d have left never to return.
It would be tempting to conclude that the number of words in a language defines its level of complexity.
But the truth is far from it.
The various online lists that group languages based on their difficulty are created on the basis of the languages’ syntactic, morphological and phonological features. Think about it this way: do you know which English word is said to have the highest number of synonyms?
It’s not the word good.
With a mere 380 it can’t even compete with the record holder: drunk – its 2985 synonyms even warranted a whole book! Even if you memorised all of these words, your language would not become more complex, only more varied.
On the other hand each language has homonyms, words with multiple meanings that depend on the context, or the prepositions, prefixes and suffixes they come with.
German separable verbs are a famous example: a different separable prefix can completely change a verb’s meaning.
For instance, the common verb gehen, “to go”, turns into “to shrink (clothes)” when we add the prefix ein – eingehen.
The connections of verbs with prepositions or different affixes create idioms, slang expressions, and phrasal verbs, and are often composed of the most common words. Think of all the English phrasal verbs with the word get: get over, get on, get on with, get out/in, get across, get around…
No wonder we repeat ourselves so much when we speak!
There are some linguists that say that use of slang is a shame as it could lead to the disappearance of more refined and specific vocabulary.
And perhaps in the future we’ll abandon words like alight, discourage, or discard in favour of get off, put off and do away with.
At the moment, using idioms and phrasal verbs is not seen as a sign of linguistic sophistication on paper. We are advised not to use them in official writing, like academic essays or documents.
But wouldn’t you agree these terms actually add complexity to the language?
Look at the sentences below. Even though they are composed of very simple words, they’re far from being straightforward.
One has to be an upper-intermediate English speaker to understand them.
“I can’t put up with this designer.”
“He put together an outfit I can’t put on.”
“It puts me down.”
The words we hear
Polyglots like Steve Kaufmann, for example, put great emphasis on learning vocabulary, and it’s not without reason.
Grammar of some languages may be riddled with exceptions that would take you a lifetime to master.
If your goal in learning is getting to a level of basic holiday-style communication, it’s better to reduce the time spent looking at verb tables and conjugation patterns.
There is no doubt that learning vocabulary will increase your comprehension of both written and spoken texts. Even if the vocabulary you learn is passive, and you have a hard time employing it in sentences yourself, you will be able to retrieve a meaning from memory once you hear a particular word.
A native German speaker, for instance, will be able to understand the sentence “Ich möchte eine grün Hut kaufen” (I would like to buy a green hat) even though it lacks the correct case endings (the grammatically correct version is “Ich möchte einen grünen Hut kaufen”).
If you knew all the case endings, but didn’t know the words for “want” and “hat”, it would be impossible to communicate your intention.
Of course, it’s always better to know the correct grammar but if your time is scarce and communication is the main goal, there is really no need to sweat over it too much.
As long as you have the words and the basic sentence structure, you’re fine.
The human mind is very good at picking up patterns and putting together pieces of the puzzle, so a native speaker can usually get the meaning of a sentence from the vocabulary alone, even when the grammar is wonky.
The words we learn
Now, I don’t mean to say we should stop learning synonyms and expanding our vocabulary – far from it!
Words are the building blocks of language!
However, it is interesting to consider how many words we would actually need to accomplish some basic communication tasks.
Language bookstores are full of publications like “500 most common [insert language] verbs”.
But I’m sure you know that learning only 500 dry verbs will not help you speak. How about a book like “1000 most frequently used [insert language] words”? Here, I’d argue, you’d have a bigger chance.
Have you heard of the Pareto principle?
Also known as the 80-20 rule, it’s a maxim stating that roughly 80% of results come from 20% of causes.
In economics, for example, it translates to the claim that 20% of individuals in the USA control 80% of the country’s wealth; in technology, that 80% of problems come from 20% of bugs.
Similarly in language learning: focusing on the 20% of most effective methods will bring 80% of results.
Hence it is so important to choose a study method that fits your learning style! 🙂
You would think that perhaps what I’m getting at is that it’s enough to learn 20% of words to understand 80% of the language.
That’s not the case.
The breakdown is actually more like 1.75 to 95. Sounds shocking, I know.
Learning 1.75% of vocabulary will allow you to understand 95% of the language.
It’s so surprising you may want to read it again!
This calculation was made by Lingholic on the basis of English, but it already gives an immense amount of hope, doesn’t it?
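To make the numbers above a little more concrete, here is a minimal Python sketch. The first part just applies the 1.75% figure to the Oxford English Dictionary count quoted earlier. The second part shows one common way such coverage figures can be estimated: count how often each word occurs in a text and check what share of the running words the most frequent ones account for. The `coverage` function and the toy text are my own illustration, not Lingholic's actual method or data.

```python
from collections import Counter

# Figures quoted in the post.
OED_ENTRIES = 171_476  # entries for words in current use
SHARE = 0.0175         # 1.75% of the vocabulary

words_to_learn = round(OED_ENTRIES * SHARE)
print(words_to_learn)  # 3001


def coverage(tokens, top_n):
    """Share of running words covered by the top_n most frequent words."""
    counts = Counter(tokens)
    covered = sum(count for _, count in counts.most_common(top_n))
    return covered / len(tokens)


# Toy text, invented for illustration only (not real corpus data).
text = "i want a sandwich and i want a pizza and i want peace of mind"
tokens = text.split()

for n in (1, 3, 5):
    print(n, round(coverage(tokens, n), 2))
```

Even in this tiny sample, a handful of words accounts for most of what is said. A real estimate would need a large corpus and some care about what counts as "a word".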
Advance of minimalism
In the recent trend towards zen, calm and meditation, people started seeing the value of minimalism in all areas of life.
It mostly manifests itself in a drive to “declutter” our lives from objects: a multitude of items around us is distracting and can contribute to a feeling of chaos.
Possessing a lot of objects also demands that we devote a considerable amount of time to managing them: ordering, cleaning, organising, or just feeling guilty about not using them. The situation is very similar with vocabulary.
There is an ongoing debate in the linguistic community as to what extent our thoughts influence our language, and how much our language impacts our thoughts.
Would a brain with an overflow of words feel more chaotic?
This is one of the presumptions that guided the Canadian translator Sonja Lang in creating the minimalist language Toki Pona.
With only 123 words, Toki Pona allows you to think more clearly with simplified thoughts. At least that’s what Lang says and I can confirm this based on my own experience.
Toki Pona is artificially designed to enable simplification of the language, while maintaining a fair bit of the complexity of expression.
Would we be able to achieve a similar goal with natural languages?
Let’s have a look at some minimalist outlooks on language and vocabulary learning, and check how minimalist we can really get in our approaches.
Don’t get too excited here!
Although we can get pretty far only knowing that much, I don’t mean to suggest that learning a mere ten sentences is enough to speak a language. At least not with much variety or creativity.
The idea of learning whole sentences is one that underpins the Duolingo method. When we learn one simple sentence structure, it should be easy for us to create a similar sentence just by substituting one or two words. For example, once you know the sentence:
I want a sandwich.
You can easily say I want a pizza or I want peace of mind.
You learned a structure of: I want + noun.
By observing the small changes that happen in very similar sentences, we are also able to deduce grammatical rules. After hearing a sentence: He wants a sandwich, you will notice that the third person singular verb is different than in the first person – it has an ‘s’ at the end.
This suggests that there are some essential sentence structures that allow us to grasp the basics of grammar.
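The substitution idea can be sketched in a few lines of Python. This is only an illustration of the "I want + noun" pattern from the examples above; the function names are made up, and the third-person rule is deliberately simplified (it just adds an "s" and ignores irregular verbs such as "has" or "goes").

```python
def third_person(verb):
    # Simplified assumption: regular verbs only, so we just append "s".
    return verb + "s"


def make_sentence(subject, verb, noun_phrase):
    """Fill the pattern: subject + verb + noun phrase."""
    if subject.lower() in {"he", "she", "it"}:
        verb = third_person(verb)
    return f"{subject} {verb} {noun_phrase}."


# One structure, several sentences: only the noun phrase changes.
for noun_phrase in ["a sandwich", "a pizza", "peace of mind"]:
    print(make_sentence("I", "want", noun_phrase))

# Changing the subject exposes the grammar rule.
print(make_sentence("He", "want", "a sandwich"))
```

The point is not the code itself but how little has to change between sentences once the structure is known.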
An approach based on this notion is postulated by the guru of efficiency, Tim Ferriss, who himself has learned a few languages. He may not be a linguist, but he learned enough Tagalog in four days to appear on national television, so perhaps he has something to say.
As one of the first steps to deconstructing a language, Tim Ferriss advocates translating 10 sentences into present, past and future. The purpose of this task is to provide an overview of basic grammar, exposing different conjugations, cases, use of negatives, tenses, and modal verbs.
The apple is red.
It is John’s apple.
I give John the apple.
We give him the apple.
He gives it to John.
She gives it to him.
I must give it to him.
I want to give it to her.
I have eaten the apple.
I can’t eat the apple.
Try translating them into the language you’re currently learning.
At a beginner’s level, memorising them will help you remember grammar points you might have struggled with. It can also provide a basic skeleton of phrases you can play with and expand.
The approach has been criticised as not applicable to all languages.
Potential irregularities in the choice of prepositions and their placement can create confusion, and in many other languages, such as Japanese, a lot more ancillary information would be needed to translate these sentences accurately.
A solution to this would be to ask a native speaker to comment on your translation, and to highlight potential grammatical traps and irregularities.
Linguists are a separate species of language investigators, and one of the common misconceptions about them is that they speak multiple languages.
In fact, many of them don’t speak any foreign language at all.
In their investigations of other languages they come up with various methods of analysis.
One of them is focused on comparing basic vocabulary in order to see relationships between languages, and reconstruct historical changes. In order to do it comprehensively across languages, one must come up with a list of words that exist in every language.
Do you think it’s easy?
You probably came across lists of untranslatable words around the world — they tend to include words referring to very specific concepts, like the Inuit word iktsuarpok meaning “to go outside to check if anyone is coming.”
It would be hard to find equivalents of words like this in many other languages, but how widespread are the words we think are common?
Our conviction of a word’s universality can expose our narrow view of the world’s cultures, and how deeply-rooted we are in our immediate linguistic and cultural environment.
For instance, you may think that every language has a word for ‘we’ but in many languages there is no single word for that concept.
There are many languages that differentiate between inclusive and exclusive ‘we’ and have two separate words for that pronoun: one that includes the addressee (inclusive we), and one that excludes them (exclusive we).
This happens for example in Indonesian, Tagalog, and Maori.
With all these considerations in mind, the linguist Morris Swadesh composed the first list of words to be used for lexical comparisons, words he believed should be shared across different languages and cultures. The list went through a few revisions, and the official version currently comprises 100 terms.
Despite having more than one author it is still referred to as the Swadesh list.
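For a rough feel of how such a comparison works, here is a toy Python sketch. It takes five everyday concepts (water, hand, night, sun, name) in English, German and Spanish and scores how similar the spellings are. Big caveat: real lexicostatistics relies on expert judgments about which words are cognates, not on spelling. The `similarity` function and the use of `difflib.SequenceMatcher` are my own crude stand-in, chosen only because it ships with Python.

```python
from difflib import SequenceMatcher

# Five basic concepts: water, hand, night, sun, name.
WORDS = {
    "english": ["water", "hand", "night", "sun", "name"],
    "german": ["wasser", "hand", "nacht", "sonne", "name"],
    "spanish": ["agua", "mano", "noche", "sol", "nombre"],
}


def similarity(lang_a, lang_b):
    """Average spelling similarity (0 to 1) across the word pairs.

    Crude proxy only: linguists compare cognates, not letters.
    """
    pairs = zip(WORDS[lang_a], WORDS[lang_b])
    scores = [SequenceMatcher(None, a, b).ratio() for a, b in pairs]
    return sum(scores) / len(scores)


print("english vs german: ", round(similarity("english", "german"), 2))
print("english vs spanish:", round(similarity("english", "spanish"), 2))
```

On this tiny sample English comes out closer to German than to Spanish, which matches what we know about the language families, but five words prove nothing on their own.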
Try translating the words from the Swadesh list into the language you’re learning.
In tandem with the 10 sentences of Tim Ferriss, we might just be getting somewhere with our minimalist approach!
Attention: Word numbers in headings are growing exponentially!
The idea of simplifying a language to make it easy to comprehend for learners flourished in the 1930s in the mind of the linguist Charles Ogden.
This prolific writer essentially redesigned English to create a fully working language that only utilises 1000 words. His Basic English (sometimes called Simplish or just Basic), similar to Esperanto, was seen as a means to bring world peace.
That might have been too big of a goal, and we don’t see it used as commonly as Ogden envisioned.
However, Ogden’s Basic is still used as a starting point in teaching English in some language schools abroad.
Simplish has fewer grammatical rules and exceptions. In a sense it mirrors the way we learn foreign languages as beginners: we don’t go straight to lists of exceptions, or memorise all possible suffixes that can be appended to adjectives.
We focus on general rules, and only later explore the intricacies.
Not knowing exact words for concepts we want to talk about, we also have to resort to paraphrasing, defining more complex ideas in simple words.
Basic’s vocabulary actually comprises only 850 words.
In addition, Ogden advised each person to learn an extra 150 words from their chosen specific field. If you don’t think this treatment of vocabulary can produce any viable result, I encourage you to consult the book Thing Explainer.
Using only 1000 words, Randall Munroe (creator of the comic xkcd) explains concepts from cells and quarks to the way a plane’s cockpit works. It shouldn’t surprise you if I say that the book is not aimed at children!
After all, how many of us really understand how quarks work?
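If you want to try writing with a restricted vocabulary yourself, a small checker can flag every word that falls outside your chosen list. The Python sketch below is only an illustration: `words_outside` is a made-up helper, and `ALLOWED` is a tiny toy list, not Ogden's 850 words or the list used in Thing Explainer. It also does no stemming, so "boats" would be flagged even if "boat" is allowed.

```python
import re


def words_outside(text, allowed):
    """Return the words in text that are not in the allowed set, in order."""
    words = re.findall(r"[a-z']+", text.lower())
    flagged = []
    for word in words:
        if word not in allowed and word not in flagged:
            flagged.append(word)
    return flagged


# Toy word list for illustration only (hypothetical, not a real Basic list).
ALLOWED = {"the", "sky", "boat", "goes", "up", "in", "a", "very", "big"}

print(words_outside("The big boat goes up in the stratosphere.", ALLOWED))
# ['stratosphere']
```

Swap in a real word list and your own draft, and you will quickly see which ideas need paraphrasing.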
Simplish can be helpful to facilitate the dissemination of knowledge.
As the majority of research is published in English, it may be hard to access for researchers around the world who are not completely fluent in this language.
Or, for that matter, for beginner students who aren’t yet versed in the jargon used in their academic field.
As much as we can, we try to stick to the simplest possible explanation in our LinguaLift lessons because around half of our students are learning a third language via English, their second language.
Again, if you have doubts whether it works you can test it using an online automatic translation tool like this one.
What if we created equivalent 1000-word lists for every language we learn?
Simplified books and news written in simplified language are attempts at this. Unfortunately these resources don’t exist for every language we learn, and the set of vocabulary they use is by no means standardised.
Can we use Ogden’s Basic to learn another language?
Hard to tell objectively.
How about a challenge: Translate the 850 words into a language you’re just starting to learn and then go out into the real world (or into the online one…) and try to communicate.
See if people understand you and whether or not you really need extra vocabulary.
If you’d like to see me do it, feel free to write, I’d be happy to be a guinea pig!
One parting word
While I agree with Steve Kaufmann that “vocabulary is much more important than grammar”, it’s critical to remember that it doesn’t mean the more the better.
For the purpose of deconstructing the language you don’t need that many words, and talking about even very complex topics can be accomplished with as few as 850 of them.
As for the one parting word, I’d say: Try!
Try one of the methods listed above and see where it takes you.
Did you enjoy this post?
Share it with someone who speaks a lot, perhaps they would benefit from introducing some minimalism to their communication 🙂
”That might have been to big of a goal”
”That might have been TOO big of a goal”
and other typos....
Re: “For instance, the common verb gehen, “to go”, turns into “to shrink (clothes)” when we add the prefix ein – eingehen.”
This is wrong. “eingehen” is generally used to refer to plants dying (Mein Kaktus ist eingegangen). You mean “einlaufen” which means that clothes have shrunk in the wash.
Thanks for pointing it out Marieke! Perhaps my dictionary had some outdated info on the word. Is there another prefix we could use with “gehen” to make this example work?
I love your blog site. And I love learning languages, but I hope I might caution your readers on one of your ‘experts.’
While I realize Kaufman considers himself a leading polyglot and a great teacher, I found his instruction difficult to understand.
So, I contacted him, he was equally difficult to understand, and even more challenging to learn from.
HINT: I am learning Russian - my ninth language to actually study.
While Kaufman’s method will work, it does not allow much in the way of real language acquisition (accent, repetition of key words, or a focus on a frequency list.) I would recommend his method ONLY after reaching intermediate level through other resources.
Further, this article was awesome. I wish the numbers of words used on most frequency lists had been discussed. The 500, 1000, or 2000 most frequent words can be googled for most languages.
But, what does learning 500 words mean for me in most languages? For English, Spanish, and French the word count to reach around 50% of the vocabulary used on a daily basis is 200, 250, 300. You may experience different results, but, those are the numbers I see repeated the most.
For Russian? Multiply by ten. 50% of words used daily are between 1500 and 2500 for Russian. That is not much less than the ENTIRE number of unique words used in the Greek New Testament, or the Hebrew Old Testament. (6,100 and 11,000 rounded).
So, I would recommend learning Russian at the Defense Language Institute. I am too old for that method, so I trudge along.
Could have been an interesting post if you hadn’t used so many words. No pun/joke intended there. This post would have been more effective if it was half as long.
Thanks for the feedback Dan! I suppose that’s just my writing style. The more I write the better it gets, you should’ve seen my university essays! :P
One interesting thing I learnt about the Swadesh list of vocab, in terms of understanding a language’s diversity, is that it usually doesn’t help when comparing sign languages. Many of the concepts on the list are signed using what’s considered iconic or gestural signs, e.g. eye, head and knee are often just pointed at, and numbers are just represented by certain combinations of fingers. In this way, when conversing between sign languages you sometimes get a limited amount of vocab for free. But in terms of performing a language survey and learning the amount of difference between certain sign languages, the Swadesh list is not a very helpful starting point and they’ve had to make some alterations. Because of iconic or gestural signs it can be easier to have basic conversations across sign languages, but gaining any finesse is still a lot of hard work and learning.
Interesting point. I never studied lexicostatistics in sign languages, but you’re right many signs must be indexical and so in comparing sign languages using the same Swadesh list we’d be likely to overestimate the degree of similarity between them.
I did some (admittedly basic) research on this and found a modified Swadesh list used for sign languages. It removes “typically indexic signs” and has only 100 items. I wonder what the term “typically indexic” comprises apart from numbers and body parts... maybe things in the environment like sky and ground?
Your sentence “Ich möchte einen grünen Hut zu kaufen” is not correct. When using “möchte”, one does not use “zu”. Also, the sentence translates as “I would like to buy a green hat”, which has a different meaning from “I want to buy a green hat”.
Hey Kevin! Thanks for your correction, clearly my high-school German has to be reviewed :o I’ll ask Donovan to implement the change.