
Lexical Density

Lexical density is defined as the number of lexical words (or content words) divided by the total number of words [1],[2],[3],[4].
Lexical words give a text its meaning and provide information regarding what the text is about. More precisely, lexical words are simply nouns, adjectives, verbs, and adverbs. Nouns name the subject, adjectives tell us more about the subject, verbs tell us what the subject does, and adverbs tell us how it is done.
Other kinds of words such as articles (a, the), prepositions (on, at, in), conjunctions (and, or, but), and so forth are more grammatical in nature and, by themselves, give little or no information about what a text is about. These non-lexical words are also called function words. Auxiliary verbs, such as "to be" (am, are, is, was, were, being), "do" (did, does, doing), "have" (had, has, having) and so forth, are also considered non-lexical as they do not provide additional meaning.
With the above in mind, lexical density is simply the percentage of words in written (or spoken) language which give us information about what is being communicated. With regard to writing, then, lexical density is a measure of how informative a text is.
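For readers who like to experiment, the calculation itself can be sketched in a few lines of Python. The sketch below assumes the NLTK library and its Penn Treebank part-of-speech tags; it is only an approximation of the idea described above, not the software used by this website.

```python
# A minimal sketch of lexical density, assuming NLTK and Penn Treebank tags.
# Nouns (NN*), adjectives (JJ*), verbs (VB*), and adverbs (RB*) are counted as
# lexical, while forms of "be", "do", and "have" are treated as auxiliaries.
import nltk

# One-time model downloads (uncomment on first run):
# nltk.download("punkt")
# nltk.download("averaged_perceptron_tagger")

AUXILIARIES = {
    "be", "am", "are", "is", "was", "were", "been", "being",
    "do", "does", "did", "doing", "done",
    "have", "has", "had", "having",
}
LEXICAL_TAG_PREFIXES = ("NN", "JJ", "VB", "RB")

def lexical_density(text: str) -> float:
    """Return the percentage of lexical (content) words in `text`."""
    words = [t for t in nltk.word_tokenize(text) if t.isalpha()]
    tagged = nltk.pos_tag(words)
    lexical = [
        word for word, tag in tagged
        if tag.startswith(LEXICAL_TAG_PREFIXES) and word.lower() not in AUXILIARIES
    ]
    return 100 * len(lexical) / len(words) if words else 0.0
```

Because the tagger is statistical, its classification of a given word may occasionally differ from ours, so the figures it produces should be read as estimates.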

Simple Example #1

We shall first determine the lexical density of an ideal example. Consider the following sentence:

The quick brown fox jumped swiftly over the lazy dog.

When this website calculates lexical density, it identifies each word as either a lexical word or not:

The [quick] [brown] [fox] [jumped] [swiftly] over the [lazy] [dog].

The lexical words (nouns, adjectives, verbs, and adverbs) are marked in brackets here; on the website itself they are colored green.
There are precisely 7 lexical words out of 10 total words. The lexical density of the above passage is therefore 70%.
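For the curious, this hand count is easy to verify with a few lines of Python (the labels below follow the classification given above rather than a tagger):

```python
# Hand-labeled check of Example #1: 7 lexical words out of 10.
words = "The quick brown fox jumped swiftly over the lazy dog".split()
lexical = {"quick", "brown", "fox", "jumped", "swiftly", "lazy", "dog"}
density = 100 * sum(w.lower() in lexical for w in words) / len(words)
print(f"{density:.0f}%")  # 70%
```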

Simple Example #2

Now consider another example:

She told him that she loved him.

Again, marking the lexical words in brackets, we have the following:

She [told] him that she [loved] him.

There are 2 lexical words out of 7 total words, for a lexical density of 28.57%.
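Applying the lexical_density() sketch from the definition section to both example sentences should reproduce these figures, although the exact counts depend on how the tagger classifies each word:

```python
# Applying the earlier lexical_density() sketch to Examples #1 and #2.
# Expected results are roughly 70% and 28.57%; a word or two may be tagged
# differently by the statistical tagger.
for sentence in [
    "The quick brown fox jumped swiftly over the lazy dog.",
    "She told him that she loved him.",
]:
    print(f"{lexical_density(sentence):.2f}%  {sentence}")
```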

Comparing Examples #1 and #2

The meaning of the sentence in Example #1 is quite clear. It is not difficult to imagine what happened when "the quick brown fox jumped swiftly over the lazy dog."
On the other hand, it is not so easy to imagine what the sentence in Example #2 means. The reader is sure to agree that, due to the use of vague personal pronouns (she and him), the second sentence has multiple interpretations and is, therefore, quite vague.
Notice that lexical density reflects these observations. The first sentence has a rather high lexical density (70%), whereas the second sentence has a lexical density which is quite low (28.57%).
The reason that the sentence in Example #1 has a high lexical density is that it explicitly names both the subject (fox) and the object (dog), gives us more information about each one (the fox being quick and brown, and the dog being lazy), and tells us how the subject performed the action of jumping (swiftly). The sentence is packed with information and its high lexical density is a reflection of that.
The reason that the sentence in Example #2 has such low lexical density is that it doesn't do any of the things that the first sentence does: we don't know who the subject (she) and the object (him) really are; we don't know how she told him (loudly? softly? lazily?) or how she loved him (intensely? passionately?); we don't even know if the first "she" and "him" refer to the same people as the second "she" and "him." This sentence tells us almost nothing, and its low lexical density is an indicator of that.
By the above examples, we can now see more clearly that lexical density is a measure of how informative a text is.

Lexical Density as a Measure of How Descriptive a Text Is

We now illustrate the above ideas even further by starting with a sentence which is not very descriptive and progressively changing it to make it more and more informative (in this case, descriptive). Sentence 1 contains a vague personal pronoun. When we change the pronoun to an actual name, we have more information, and the lexical density increases, as seen in Sentence 2. Continuing this process, we add or change a single word at a time to make the sentence progressively more descriptive, as seen in the table below. The tendency is for lexical density to increase.
Lexical Density by Sentence
Sentence (lexical words in brackets)  Lexical Density
1  he [loves] [going] to the [cinema] .  50%
2  [john] [loves] [going] to the [cinema] .  66.67%
3  [john] [smith] [loves] [going] to the [cinema] .  71.43%
4  [john] [smith] [loves] [going] to the [cinema] [everyday] .  75%
5  [john] [smith] [intensely] [loves] [going] to the [cinema] [everyday] .  77.78%
6  [john] [smith] [intensely] [loves] [going] to the [huge] [cinema] [everyday] .  80%
Our lexical density calculator on our homepage will separate your text into individual sentences and calculate the lexical density of each one as in the above table.
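A rough sketch of this per-sentence scoring, assuming NLTK's sentence tokenizer and the lexical_density() function sketched earlier (again, an illustration rather than this website's actual implementation):

```python
# Score each sentence of a text separately, as the calculator does.
import nltk

def lexical_density_by_sentence(text: str) -> list[tuple[str, float]]:
    """Split `text` into sentences and compute the lexical density of each."""
    return [(s, lexical_density(s)) for s in nltk.sent_tokenize(text)]

sample = "He loves going to the cinema. John Smith intensely loves going to the huge cinema everyday."
for sentence, score in lexical_density_by_sentence(sample):
    print(f"{score:6.2f}%  {sentence}")
```

The scores should land near the first and last rows of the table above, though, as noted, a statistical tagger may classify the odd word differently.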

Lexical Density as a Measure of How Meaningful a Text Is

As lexical words give meaning to the language being used, reading only the lexical words in a text can give us a "gist" of what the text is about. Let us consider another example. In the passage below, try to pick out only the lexical words and ask yourself whether those words alone tell you what the text is about:
At this moment with a growing economy, shrinking deficits, bustling industry, booming energy production we have risen from recession freer to write our own future than any other nation on Earth . It's now up to us to choose who we want to be over the next 15 years and for decades to come.
How does the "gist" of the passage compare with its full meaning? Certainly, the lexical words do not give every detail, but by themselves they help us to identify the general idea, while the role of the grammatical, non-lexical words is to help us piece them together to form the whole [1].
To calculate the lexical density of the above passage, we count 26 lexical words out of 53 total words which gives a lexical density of 26/53, or, stated as a percentage, 49.06%.
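A rough way to extract such a "gist" automatically is to keep only the words tagged as nouns, adjectives, verbs, and adverbs. The sketch below assumes NLTK and the same list of auxiliaries as before; it will not necessarily agree word for word with the classification used by this website.

```python
# Keep only the content (lexical) words of a passage to form its "gist".
import nltk

AUXILIARIES = {"be", "am", "are", "is", "was", "were", "been", "being",
               "do", "does", "did", "doing", "done",
               "have", "has", "had", "having"}

def gist(text: str) -> str:
    words = [t for t in nltk.word_tokenize(text) if t.isalpha()]
    content = [
        word for word, tag in nltk.pos_tag(words)
        if tag.startswith(("NN", "JJ", "VB", "RB")) and word.lower() not in AUXILIARIES
    ]
    return " ".join(content)

print(gist("She told him that she loved him."))  # likely: "told loved"
```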

Typical Lexical Densities

In the case of written texts we emphasize that lexical density is not a measure of the complexity or readability of a text, but rather, the amount of information the text tries to convey. Thus, expository texts, such as news, journal, technical, and informative articles, tend to have higher lexical densities. A lexically dense text typically scores at around 56% or above.

Note: We should also point out that all of the measurements below were obtained by using this website.
Wikipedia: An example of a very lexically dense text is the Wikipedia article summary (excluding tables, captions, citation marks, and peripheral text) on inflation hedges, which has a lexical density of 64.38%. More generally, however, Wikipedia articles on average (based on randomly sampled articles) tend to score between 55% and 58%.
News Articles: An informal, non-random sample of BBC News and New York Times articles taken by this website yields similar results, with the two publications scoring 56% and 58%, respectively.
Fiction: A random sample of an online collection of short fiction gave an average lexical density between 49% and 51%.
General Prose: More general prose tends to have slightly lower average lexical densities, between 48.5% and 49.5%. This figure was obtained from a random sample of 70 Project Gutenberg e-texts (95% confidence).
Lexical density is generally higher in written language than in spoken language [2],[3],[4]. This is not surprising, as written text is generally more expository in nature and will naturally contain more information-bearing, lexical words, thereby increasing lexical density. Moreover, spoken language relies upon non-verbal cues and can be highly context-dependent, which reduces the number of lexical words required to communicate an idea. The reader is invited to verify this by analyzing celebrity and political interview transcripts. The interview transcripts we analyzed had an average lexical density of about 45%.
Below is a table which very roughly summarizes the above (for lexical density as calculated by this website):
Text Type  Typical (Average) Lexical Density
Expository Writing (Wikipedia and Newspaper Articles): between 55% and 58%
Fiction, General Prose: between 48% and 51%
Interview Transcripts (Spoken Language): near 45%

Using Our Website to Estimate Lexical Density

Our lexical density calculator on our homepage will separate your text into individual sentences and calculate the lexical density of both the entire text and of each individual sentence.
For example, if you copy and paste the following text from Oscar Wilde's "The Happy Prince" into the text box:

High above the city, on a tall column, stood the statue of the Happy Prince. He was gilded all over with thin leaves of fine gold, for eyes he had two bright sapphires, and a large red ruby glowed on his sword-hilt.

you will get the following output.

Lexical Density for Entire Text: 52.38%

Lexical Density by Sentence
Sentence (lexical words are shown in green on the website)  Lexical Density
1 high above the city on a tall column stood the statue of the happy prince . 53.33%
2 he was gilded all over with thin leaves of fine gold for eyes he had two bright sapphires and a large red ruby glowed on his swordhilt . 51.85%
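As a point of comparison, running the per-sentence sketch from earlier on the same passage should produce figures close to the output above, although the exact percentages depend on the part-of-speech tagger used.

```python
# Reproducing the example above with the earlier lexical_density_by_sentence() sketch.
# This website reports 53.33% and 51.85%; a different tagger may classify a word
# or two differently, so expect nearby, not identical, figures.
passage = (
    "High above the city, on a tall column, stood the statue of the Happy Prince. "
    "He was gilded all over with thin leaves of fine gold, for eyes he had two bright "
    "sapphires, and a large red ruby glowed on his sword-hilt."
)
for sentence, score in lexical_density_by_sentence(passage):
    print(f"{score:6.2f}%  {sentence}")
```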

Assumptions and Limitations

We first note that our calculation of lexical density assumes that a text is written in English. Furthermore, it is assumed that a text is properly punctuated and apostrophes are used correctly.
Secondly, since we use a computer algorithm to distinguish between different parts of speech, not every word will be properly classified as lexical or non-lexical; so far, no computer algorithm can do this task perfectly. Thus, any online application which computes lexical density can only offer an approximation, though in most cases it is a good one.
We also point out that how to classify certain words can be a point of debate [1],[3]. For example, an aeroplane takes off. Do we classify "take off" as a verb and a separate preposition, or as a single phrasal verb? What is more, do we count the word "he's" as two words, "he is" (a pronoun and an auxiliary verb), or as a single word? The above illustrates some of the ambiguities which can arise and the resulting assumptions which must be made in order to perform a calculation. The software used by this site treats contractions as single words and phrasal verbs as two.
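To illustrate how much the choice of tokenizer alone can matter, compare a tokenizer that splits contractions with a simple whitespace split (the example below assumes NLTK; as noted, this site keeps contractions whole):

```python
# Two tokenizations of the same sentence give different word counts.
import nltk

sentence = "He's taking off soon."
print(nltk.word_tokenize(sentence))  # ['He', "'s", 'taking', 'off', 'soon', '.'] -- "He's" split in two
print(sentence.split())              # ["He's", 'taking', 'off', 'soon.'] -- contraction kept as one word
```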
Despite such ambiguities, the reader will see that, for the most part, computers can do a decent job of distinguishing lexical words from non-lexical words, and we again encourage the reader to try our lexical density calculator to better understand how this website calculates lexical density.

Applying Lexical Density to Your Own Writing and Beyond

The following article by David Didau, Black space: improving writing by increasing lexical density, does a fine job of illustrating how to apply the concept of lexical density to improve a writing sample. In particular, by identifying lexical and non-lexical words, a writer can rid their text of "extraneous grammatical garbage" as well as change the grammatical structure so as to increase content-bearing (lexical) words. By doing so, Didau shows with clear examples how a text can become more concise and meaningful.
The reader may also be interested in an informal article which examines how the lexical density of song lyrics[5] of a well-known band varies from album to album.

Conclusions

We have made our best attempt to explain the concept of lexical density and how this website calculates lexical density. We have also tried to illustrate how lexical density can be interpreted as "how informative a text is" by using examples ranging from high to low lexical density. Moreover, we have attempted to make the reader aware of the limitations and assumptions that must be made in order to estimate lexical density. Finally, we have tried to point the reader to resources which can help them apply the concept of lexical density to analyze and improve their own writing.
If questions remain, we encourage the reader to delve deeper into the topic by consulting the references and links at the bottom of the page. There, the reader will find an abundance of other links and references from which they can begin their own investigation of the topic if they are so inclined. And again, we encourage the reader to experiment with the software themselves in order to gain insights about different forms of writing as well as their own. And, of course, if the reader has any suggestions for how to improve this article, we welcome them.

Links and References

[1] Didau, David (2013), Black space: improving writing by increasing lexical density, from The Learning Spy: Brain Food for the Thinking Teacher
[2] Johansson, V. (2008), Lexical diversity and lexical density in speech and writing: a developmental perspective, Working Papers 53, 61-79.
[3] To, Vinh, Lexical Density and Readability: A Case Study of English Textbooks, presentation given at the University of Tasmania.
[4] Ure, J. (1971), Lexical density and register differentiation. In G. Perren and J.L.M. Trim (eds), Applications of Linguistics, London: Cambridge University Press. 443-452.
[5] Everything in Its Right Place: Visualization and Content Analysis of Radiohead Lyrics
