Lexical Density and Spoken Word

Spoken language presents a different set of patterns and considerations than written text. One well-documented phenomenon is that the lexical density of spoken language tends to be lower than that of written language (Johansson 2008; Ure 1971). To test this observation, we analyzed six interview transcripts: three celebrity interviews and three interviews with political figures. We also measured the complexity (readability) of each transcript and looked at the most common words in each interview.

Three Celebrities. Three Politicians.

Methodology and Analysis

For every transcript we analyzed, we removed everything that was not actually spoken. This includes the names of the speakers and notations of non-verbal events such as laughter and applause.
We did not remove the interviewer's speech, since it is also natural speech, so the interview questions are included in each transcript.
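The cleaning step described above could be sketched roughly as follows. The transcript conventions here are assumptions rather than anything stated in the article: speaker names are taken to appear as a "NAME:" label at the start of a line, and non-verbal notations are taken to sit in square brackets or parentheses. The function name clean_transcript is likewise hypothetical.

```python
import re

# Assumed conventions (not taken from the original transcripts):
#   - a speaker label looks like "NAME:" at the start of a line
#   - non-verbal notations sit in [brackets] or (parentheses)
# Caveat: a spoken line that itself opens with a short capitalized phrase
# followed by a colon would also lose that phrase.
SPEAKER_LABEL = re.compile(r"^\s*[A-Z][A-Za-z .'\-]{0,40}:\s*")
NOTATION = re.compile(r"\[[^\]]*\]|\([^)]*\)")

def clean_transcript(raw: str) -> str:
    """Return only the spoken words of a transcript, one turn per line."""
    cleaned = []
    for line in raw.splitlines():
        line = SPEAKER_LABEL.sub("", line)  # drop the speaker's name
        line = NOTATION.sub(" ", line)      # drop [laughter], (applause), ...
        line = " ".join(line.split())       # collapse leftover whitespace
        if line:
            cleaned.append(line)
    return "\n".join(cleaned)

sample = """INTERVIEWER: So, how did it feel? [laughter]
GUEST: Honestly, it was like a dream. (applause)"""
print(clean_transcript(sample))
```

Both the interviewer's and the guest's turns are kept, which matches the choice to treat interview questions as natural speech.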
Our observations are well in line with those of Johansson (2008) and Ure (1971): the lexical density of spoken language tends to be lower than that of written texts. The average lexical density of the sample was 45.02%, which is lower than the values observed for expository texts, fiction, and newspaper articles.
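Lexical density is commonly computed as the share of lexical (content) words among all words in a text. The article does not say which word list or tagger was used, so the following is only a minimal sketch: it treats every word that is not in a small, hand-written list of function words as lexical. The list FUNCTION_WORDS and the function names are assumptions made for illustration, and a real analysis would more likely rely on a part-of-speech tagger or a much larger list.

```python
import re

# A deliberately small, illustrative list of function words (an assumption;
# the article does not say which list or tagger was used).
FUNCTION_WORDS = {
    "a", "an", "the", "and", "or", "but", "if", "so", "of", "in", "on",
    "at", "to", "for", "with", "from", "by", "as", "i", "you", "he", "she",
    "it", "we", "they", "me", "him", "her", "us", "them", "my", "your",
    "his", "its", "our", "their", "this", "that", "these", "those", "is",
    "am", "are", "was", "were", "be", "been", "being", "do", "does", "did",
    "have", "has", "had", "will", "would", "can", "could", "shall",
    "should", "may", "might", "must", "not", "no",
}

def tokenize(text: str) -> list[str]:
    """Lowercase the text and split it into words (apostrophes allowed inside)."""
    return re.findall(r"[a-z]+(?:'[a-z]+)?", text.lower())

def lexical_density(text: str) -> float:
    """Percentage of words that are not in FUNCTION_WORDS."""
    words = tokenize(text)
    if not words:
        return 0.0
    lexical = [w for w in words if w not in FUNCTION_WORDS]
    return 100.0 * len(lexical) / len(words)

# "cat", "sat", and "mat" are lexical; "the", "on", "the" are not.
print(f"{lexical_density('The cat sat on the mat'):.2f}%")  # prints 50.00%
```

Because the result depends heavily on which words count as function words, figures from this sketch should not be expected to match the percentages reported here.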
Perhaps not surprisingly, both the complexity (readability) and the lexical density of the language used were lower in our celebrity interviews. With such a small sample size, however, the results are hardly conclusive. The results are summarized below.
                     Average Lexical Density    Average Complexity (Readability)
Celebrities          44.43%                     6th Grade
Political Figures    45.98%                     8th Grade
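The article reports readability as a school grade level but does not name the formula behind it. One widely used grade-level measure is the Flesch-Kincaid Grade Level, and the sketch below assumes that formula purely for illustration. The syllable counter is a rough vowel-group heuristic, also an assumption, so the scores it produces are approximate.

```python
import re

def count_syllables(word: str) -> int:
    """Rough heuristic: count groups of consecutive vowels, at least one."""
    word = word.lower()
    count = len(re.findall(r"[aeiouy]+", word))
    # Treat a trailing "e" as silent, except in endings such as "-le".
    if word.endswith("e") and not word.endswith("le") and count > 1:
        count -= 1
    return max(count, 1)

def flesch_kincaid_grade(text: str) -> float:
    """Flesch-Kincaid Grade Level:
    0.39 * (words / sentences) + 11.8 * (syllables / words) - 15.59
    """
    sentences = [s for s in re.split(r"[.!?]+", text) if s.strip()]
    words = re.findall(r"[A-Za-z]+(?:'[A-Za-z]+)?", text)
    if not sentences or not words:
        return 0.0
    syllables = sum(count_syllables(w) for w in words)
    return (0.39 * len(words) / len(sentences)
            + 11.8 * syllables / len(words)
            - 15.59)

text = "We analyzed six interview transcripts. Three were celebrity interviews."
print(round(flesch_kincaid_grade(text), 1))
```

Transcripts often lack reliable sentence punctuation, so the sentence count, and with it the grade, depends on how the transcriber punctuated the speech.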

Concluding Remarks

We note that our choice of interview transcripts was mostly limited to what we could find in an online search, so we cannot say that our sample is representative of modern speech.
In fairness, it should be noted that the differences observed between celebrities and political figures are likely not a reflection of the intelligence of the interviewees, but rather of the complexity of the subject matter being discussed and the amount of exposition required to adequately address either difficult or not-so-difficult questions.
We also noticed that the celebrity interviews contained many instances of the words "like" and "just." Perhaps this observation is again not very surprising, but it is nonetheless interesting as a reflection of modern speech and the constant evolution of language.
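For readers who want to check the frequency of words such as "like" and "just" in their own transcripts, a word count along these lines may be a reasonable starting point. This is a minimal sketch using Python's standard library; the function name word_counts and the sample sentence are made up for illustration.

```python
import re
from collections import Counter

def word_counts(text: str) -> Counter:
    """Count how often each lowercase word occurs in the text."""
    words = re.findall(r"[a-z]+(?:'[a-z]+)?", text.lower())
    return Counter(words)

# Invented sample sentence, not taken from any of the transcripts.
sample = "It was like, just, like a dream. I just like it."
counts = word_counts(sample)

print(counts.most_common(3))           # the three most frequent words
print(counts["like"], counts["just"])  # prints: 3 2
```

On a full transcript, the most frequent words will mostly be function words such as "the" and "and", so it can help to filter those out before comparing interviews.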
What kind of results does the reader get when they try something similar?

Links and References

Johansson, V. (2008), Lexical diversity and lexical density in speech and writing: a developmental perspective, Working Papers 53, 61-79.
Ure, J. (1971), Lexical density and register differentiation. In G. Perren and J.L.M. Trim (eds), Applications of Linguistics, London: Cambridge University Press. 443-452.
