Reader Comments

There are significant problems with this study

Posted by Joe_McVeigh on 26 Mar 2019 at 19:14 GMT

While I appreciate the study of politicians’ use of language and feel that it is an interesting area of research, this article suffers from several fundamental and critical flaws in its methodology. I will summarize these errors here and then sketch them out below. Briefly, the article:
1. Confuses written language with spoken language
2. Uses an ineffectual test for written language on spoken language
3. Does not take into account how transcriptions and punctuation affect the data
4. Cites almost no linguistic sources in a study about language
5. Uses a test developed for English on other languages

First, the article confuses written language with spoken language and seems to assume that there is not a great difference between the two, i.e. that the written versions of the speeches in their corpus are accurate representations of the spoken utterances. Nothing could be further from the truth. Spoken language is fundamentally different than written language and the two should never be confused (see Linell 2005; Bright n.d.). Two of the most relevant ways that speech differs from writing are 1) speech contains fewer words per sentence (although we shouldn’t really talk about speech as having “sentences”), and 2) speech has fewer syllables per word relative to writing. It’s probable that some of the speeches in the authors’ corpus were prewritten and then performed (and so they were more like written language), while others were given more or less spontaneously (and so they were more like spoken language), but the authors do not make any distinctions here and indeed do not raise the subject. In addition, the authors claim that the speeches in their corpus were transcribed verbatim, but this cannot be the case unless the speeches also include false starts, ums and ers, mispronunciations, repetition, and other features common to spoken language. These features are sure to have occurred when the speeches were given, but they were presumably not transcribed (or the F-K scores would be much higher).

Second, Schoonvelde et al. do not realize the limits of the Flesch-Kincaid readability test (hereafter, the F-K test), the formula which is the foundation of their study. The F-K test has been shown to be too simple a measure to accurately represent the complexity of a piece of text (Redish 2000). Part of this has to do with how word length and sentence length do not correlate well with linguistic complexity, but another reason that the F-K test is useless in determining the complexity of a text has to do with its dependence on punctuation.
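
For reference, the grade-level form of the test is usually written as below. This assumes the grade-level variant is the one at issue; the thread itself does not reproduce the formula. Sentence count enters only through the first term, and in practice a "sentence" is whatever the punctuation delimits.

```latex
\[
\text{F-K grade} = 0.39\,\frac{\text{total words}}{\text{total sentences}}
  + 11.8\,\frac{\text{total syllables}}{\text{total words}} - 15.59
\]
```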

Third, in a related problem, the authors make no arguments about how punctuation and transcription methods affect their study. The F-K test crucially depends on punctuation as it uses sentence length to measure linguistic complexity. However, spoken language does not have any punctuation. It is obvious that the punctuation was inserted by transcribers, but no discussion is given about their motivations for why they inserted punctuation symbols where they did. We can assume the transcribers followed the norms of standard English orthography, but norms in standard English are not set in stone. The transcribers – presumably a disparate group of people who were working independently – could just as easily have used more commas and fewer periods, thereby making the speeches seem more complex. Or they could have used fewer commas and more periods to make the speeches seem less complex when judged using the F-K test. Punctuation in written language is largely arbitrary (at least for joining clauses) and yet the authors of the study make no mention of how critical this is for their research. (For more on how altering punctuation can drastically change the F-K score of a text while still adhering to standard written English norms, see Liberman 2016.)
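
The dependence on punctuation can be illustrated with a minimal sketch. It assumes the grade-level form of the test (0.39 × words per sentence + 11.8 × syllables per word − 15.59) and a crude vowel-group syllable counter. The two example strings are hypothetical and not taken from the authors' corpus, and the function names are illustrative only.

```python
import re


def count_syllables(word: str) -> int:
    """Rough heuristic (an assumption): one syllable per run of vowels."""
    groups = re.findall(r"[aeiouy]+", word.lower())
    return max(1, len(groups))


def fk_grade(text: str) -> float:
    """Flesch-Kincaid grade level computed from punctuation-delimited sentences."""
    # A "sentence" here is simply whatever falls between ., ! or ? marks.
    sentences = [s for s in re.split(r"[.!?]+", text) if s.strip()]
    words = re.findall(r"[A-Za-z']+", text)
    syllables = sum(count_syllables(w) for w in words)
    return 0.39 * len(words) / len(sentences) + 11.8 * syllables / len(words) - 15.59


# Hypothetical example: the same spoken words, punctuated two acceptable ways.
with_periods = "We cut taxes. And we created jobs. But the work is not done. So we will go on."
with_commas = "We cut taxes, and we created jobs, but the work is not done, so we will go on."

print(round(fk_grade(with_periods), 2))
print(round(fk_grade(with_commas), 2))
```

Both strings contain the same 18 words, so the syllable term is identical. Only the sentence count changes (4 versus 1), which shifts the score by 0.39 × (18 − 4.5), or roughly 5.3 grade levels.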

Fourth, the authors cite almost no linguistic sources which use the F-K test to measure complexity. They cite two sources which both appeared in linguistic journals and used the F-K test, but neither of these sources bases its research solely on the F-K test. This is because the field of linguistics has a higher threshold for analyzing textual complexity. Although this article did not appear in a linguistic journal, it seems that the same threshold should apply, especially since the F-K test has been shown to be a poor predictor of linguistic complexity.

Finally, the F-K test was developed over 50 years ago to measure complexity in written English, but the authors applied it to other languages. This is highly problematic and they make no argument as to why this should be allowed. Indeed, the F-K test has been shown to give similar ratings to written German and gibberish (see Liberman 2014). In addition, the authors do not discuss why decades-old research on readability levels of children’s books in English (the basis for the F-K test) should be used to rate the complexity of modern day spoken Spanish, French, Dutch, German, etc. (Redish 2000). Even the grammars and writing systems of related languages differ enough that expecting a test developed for one of them to work on all of them needs to come with extraordinary evidence.

I would like to again stress that I think the study of the language used by politicians is an interesting topic of research and I encourage political scientists to engage with the linguistic literature to support their studies. I also feel that the errors in this article were not made intentionally, but nevertheless, they still need to be addressed. I realize that the flaws outlined above also bear on other studies using the F-K test to measure linguistic complexity. There is nothing I can do about that. There are ways to study linguistic complexity, but unfortunately for those studies and the one under discussion here, the F-K test is not a reliable method for such research. Due to the methodological flaws in this study, the conclusions drawn from the research cannot be supported.

Joe McVeigh
University of Jyväskylä
Department of Language and Communication Studies

Bright, William. n.d. What’s the difference between speech and writing? Linguistic Society of America. https://www.linguisticsoc... (Accessed March 20, 2019)

Linell, Per. 2005. The Written Language Bias in Linguistics: Its nature, origins and transformations. Oxon: Routledge.

Liberman, Mark. 2016. The shape of things to come? Language Log. http://languagelog.ldc.up...

Liberman, Mark. 2014. Another dumb Flesch-Kincaid exercise. Language Log. http://languagelog.ldc.up...

Redish, Janice. 2000. Readability formulas have even more limitations than Klare discusses. ACM Journal of Computer Documentation 24(3): 132-137. doi:10.1145/344599.344637

No competing interests declared.

RE: There are significant problems with this study

mschoonvelde replied to Joe_McVeigh on 28 Mar 2019 at 16:56 GMT

In the above comment on our paper Joe McVeigh raises a number of concerns about our use of a Flesch-Kincaid readability test to score the complexity of language used by politicians. We appreciate these comments. However, we disagree with much of what’s said and this comment details our responses point-by-point.

We hope that this discussion could push forward a constructive dialogue between political scientists (like us) and linguists interested in assessing the linguistic styles of politicians because there are interesting questions to be explored. It would be great to see if our results can be replicated using other measures of linguistic complexity. We will do our bit to reach out.

With best wishes,

Martijn Schoonvelde
Anna Brosius
Gijs Schumacher
Bert N. Bakker

1. Confuses written language with spoken language and 3. Does not take into account how transcriptions and punctuation affect the data

We use three datasets. Two of those contain only pre-written speeches (the prime minister speeches that are part of EUSpeech, as well as the party congress speeches), and one consists of transcribed speeches which may have been pre-written (Parlspeech data). McVeigh seems to suggest that the use of interpunction in these transcriptions is pretty much arbitrary. We don’t agree with this. If. We. Wrote. A. Sentence. Like. This. People. Would. Think. We. Are. Nuts. Of course there can be some room for maneuver for the transcribers in the use of interpunction (which would have some impact on F-K scores) but we believe it is very unlikely that this produces systematic biases in the F-K score for an entire speech. As such we don’t think this is as much as a problem as McVeigh suggests. Moreover, we were not unaware of the differences between these datasets; rather, we see it as a strength that the systematic difference between liberal and conservative politicians replicates across datasets with different sorts of speeches (in terms of who gave them, where, and how).

2. Uses an ineffectual test for written language on spoken language

McVeigh cites an article by Janice Redish (2000) that is critical of readability measures and proposes usability testing with typical readers instead. We of course agree that every reader is different, and that linguistic complexity as measured by the F-K test is not the only factor determining a reader’s comprehension of a text (nor do we claim this in the paper), but we do argue that it is an important component that correlates with reading comprehension (e.g. Benoit, Munger and Spirling 2019 report a high correlation between F-K and complexity reports by readers). The following quote by Redish (p. 136) is in line with our thinking on this: “If you do use a readability formula and your document gets a very poor score, that probably indicates that people will have problems with it. It probably has overly long sentences and long or unfamiliar words—and documents with overly long sentences often have many other problems. The poor score is a red flag that the document was probably developed without a focus on users, without any consideration of who the users are, what they know, how it should be organized for them, etc., etc. But the poor score doesn’t tell you what else is wrong with the document (besides long sentences and long or unfamiliar words); nor does it give you any hints on how to fix it.”

At this point we also want to emphasize that in the paper we do not interpret the scores on the F-K test as absolute scores on a grade scale. Rather we interpret differences in average scores between speakers as more or less complex. In our view, Redish supports using F-K for the purposes we have in the paper. However, we also realize that for different research goals these measures might not be suitable.

4. Cites almost no linguistic sources in a study about language

Our paper addresses the discussion in political science and psychology regarding the relationship between personality, ideology and linguistic habits. We mostly try to contribute to the literature in these fields, which is reflected in our bibliography. For the construction of our dependent variable we acknowledge that citing more sources reflecting the discussion on F-K would have been appropriate.

5. Uses a test developed for English on other languages

McVeigh seems to suggest that we make comparisons of F-K scores across languages. We don’t do that. We only make within-country comparisons and no cross-country comparisons. That is, we compare politicians within the UK with each other, we compare politicians within the Netherlands with each other; and we compare politicians within Spain with each other. The argument McVeigh makes with regard to German and gibberish is a straw man: a thermometer would not be able to distinguish between a human being and soup that is 37 degrees warm. That doesn’t mean that a thermometer is a completely ineffective instrument for measuring a person’s temperature.

Furthermore, our paper does discuss the applicability of F-K across languages. We report that the correlation between, for example, a German-specific measure and F-K for the German speeches in our corpus is very high at r=0.99. To maintain comparability we therefore opted to use F-K in all within-country analyses.

In sum, by no means do we think that F-K is a perfect measure of language complexity but we do think that it is a useful tool for the purposes we had with this paper. We hope that these comments clarify our reasoning for the analyses in the paper.

No competing interests declared.

RE: RE: There are significant problems with this study

Joe_McVeigh replied to mschoonvelde on 04 Apr 2019 at 04:03 GMT

Schoonvelde et al. have attempted to respond to the problems in their study without accurately addressing the flaws I pointed out in my original comment. In this comment I will point out how their replies to my comments show again that they do not understand the methodological problems in their paper or the linguistic knowledge needed to conduct a study of complexity in language. While I agree that it would be nice to see the results of their paper replicated using other measures of linguistic complexity, I’m afraid such replication will not come from their methodology. If their results are replicated, it will be a matter of coincidence rather than confirmation.

Schoonvelde et al. claim that interpunction is not arbitrary. They also claim that they “believe” that it is “very unlikely” that punctuation “produces systematic biases in the F-K score for an entire speech”. And finally, they claim that they “don’t think this is as much as a problem” as I claim it to be. These claims, therefore, stand against decades of linguistic study. As Carter et al. (2001) note “conventions for the transcription of speech vary considerably”. Likewise, Brazil (1995) says that “We think we know where a sentence begins and ends; it has a special kind of completeness that makes it seem to be in some way separable, both from any other speech that may precede or follow it and from the unique background of speaker-hearer understanding against which every sample of purposeful speech operates. To a very large extent, the sentence which has no special context, either of a linguistic or of a ‘real world’ kind, has been taken to be the proper object of the grammarian’s attentions. But it is not characteristic of language users to produce such free-standing objects; they produce pieces of language in the performance of some communicative activity which is meaningful in the situation they presently occupy.”

Schoonvelde et al. note that 2/3 of their data comes from written speeches. Do they know if the speeches were performed exactly as they were written? Their article makes claims about how politicians communicate, but what they are really studying with this section of their dataset is how political speechwriters write. Carter et al. (2001) note that “just because speeches go through a process of careful planning and construction at the pre-delivery stage does not mean that, in delivery, they will seem consciously crafted and calculating. Indeed, the best speeches will appear to express, as if spontaneously, exactly what the speaker intends, in an honest and heartfelt way.” The Schoonvelde et al. study seems to suggest that the authors are using “communication” as a synonym for “speech”. For the other 1/3 of the dataset, the authors say it consists of speeches which may have been pre-written. If we assume that the speeches were given spontaneously, then it would make this section of their dataset representative of what they claim to be studying: political speech (“communication”). However, the problem of punctuation is still there. How do we know that the transcribers were systematic in their decisions to join and break clauses? The method used by Schoonvelde et al. (the Flesch-Kincaid test) crucially depends on punctuation, but it is inherently obvious that the decision of where to break clauses separated by coordinating conjunctions is a matter of choice, not a matter of grammar (in English). How many of the sentences in the conservatives’ speeches start with And or But? How many of the sentences in the liberal speeches start with Because or So? Schoonvelde et al. do not tell us this crucial piece of evidence, and indeed such evidence would say more about transcribers’ choices about punctuation than it would about politicians’ speech.
Although Schoonvelde et al. “don’t think” this is a problem, the study of language is not built on thoughts. The authors claim that the replication of a difference between liberal and conservative political speeches reveals the strength of their methodology. In fact, it may be a convenient coincidence. They offer the punctuation of The Simpsons’ Comic Book Guy as an answer to the point that where to place punctuation is arbitrary. This example both supports my case (they placed a period after every word and the message was still understood, although the linguistic complexity and implicatures of such a style of punctuating a clause would be lost on the F-K test) and suggests they did not consider all the linguistic research I presented (they engaged with only one of the sources I cited).

The authors then reply to my point that their methodology is faulty because it uses a test for written language on spoken language (point 2 of theirs above). This response shows that the authors still do not understand the differences between written and spoken language. They cite Redish (2000), but fail to notice that Redish’s article describes only written language, and they again miss the fact that they are applying a test based on written language to spoken language. The authors point to other studies which claim that F-K scores correlate with reading comprehension, but they say nothing about spoken language. They provide no citations for studies which show that F-K scores correlate well with listening comprehension, even though people listen to speeches, rather than read them. As Carter et al. (2001: 243) point out “It’s important to realise that spoken discourse should not be judged using the rules of written English: terms such as ‘word’, ‘sentence’ and ‘paragraph’ above, all come from the study of writing. The written form is not an appropriate medium for oral language”. The failure to understand this fundamental difference between spoken and written language threatens to make any constructive dialogue impossible. Indeed the authors do not engage with any of the sources I cited which point out the basic differences between speech and writing. This is a problem with the Schoonvelde et al. study that cannot be resolved.

Schoonvelde et al. then reply to my point that they cite almost no linguistic sources (point 4 of theirs above) by claiming that their paper is about political science and psychology and that they “try to contribute to the literature in these fields.” They do, however, “acknowledge that citing more sources reflecting the discussion on F-K would have been appropriate.” I therefore encourage the authors to find sources from the last 30 years by linguists or language scholars who agree with them that the F-K test is a reliable way to measure the complexity of language.

After this, Schoonvelde et al. claim that my comment “seems to suggest that we make comparisons of F-K scores across languages” (point 5 above). I did not suggest that and I’m not sure how they found that suggestion in what I wrote. Instead, I claimed that the authors use the F-K test on other languages even though it was (1) developed on English writing from half a century ago and (2) developed to measure English writing. I also pointed out that the F-K test gives similar scores to German and nonsense. The analogy to a thermometer suggests that any test developed to measure one language can be used to measure other languages. The use of the analogy suggests that Schoonvelde et al. do not understand that languages differ on lexical and orthographical levels. If we were to follow their reasoning in the field of linguistics, we would use the Penn or CLAWS part-of-speech taggers (which were developed on English) for French and German and Dutch, simply because the former are languages and English is also a language. These part-of-speech taggers would give some kind of output when applied to French, German and Dutch, but like the F-K test they would be completely ineffective at giving a useful output. Analogies to soup cannot solve the major problem of applying the F-K test to languages other than English.

Following this, the authors claim that a “very high” correlation between the F-K test and a German-specific measure motivated them to apply the F-K test across languages in order to “maintain comparability.” This is curious. The German-specific test that correlated so highly with the F-K test is the Flesch-Douma test. The Flesch-Douma test was developed in the 1960s and is adapted from the Flesch-Kincaid test. So of course it will give similar scores. But this also means that the Flesch-Douma test suffers from similar problems as the Flesch-Kincaid test. Or are we to presume that languages other than English do not differ in their spoken and written varieties? Or that languages other than English are not able to start sentences with coordinating conjunctions? The same goes for the German Lesbarkeitsindex test, which they also cite in their article. And although they have a high correlation with an English-specific test and a German-specific test, the authors do not state what motivated them to apply the F-K test to other languages besides English and German. These are further failures in their methodology.
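
The point that two formulas built on the same inputs will agree with each other can be sketched as follows. This is an illustration under stated assumptions, not a reanalysis of the authors' data. The Flesch Reading Ease formula stands in for the adapted test, since the Flesch-Douma coefficients are not given in this thread, and the per-text statistics are randomly generated and hypothetical.

```python
import random
from math import sqrt


def fk_grade(words_per_sentence: float, syllables_per_word: float) -> float:
    """Flesch-Kincaid grade level."""
    return 0.39 * words_per_sentence + 11.8 * syllables_per_word - 15.59


def reading_ease(words_per_sentence: float, syllables_per_word: float) -> float:
    """Flesch Reading Ease: same two inputs, different weights, reversed scale."""
    return 206.835 - 1.015 * words_per_sentence - 84.6 * syllables_per_word


def pearson(xs: list, ys: list) -> float:
    """Plain Pearson correlation coefficient."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    var_x = sum((x - mx) ** 2 for x in xs)
    var_y = sum((y - my) ** 2 for y in ys)
    return cov / sqrt(var_x * var_y)


# Hypothetical texts: random (words per sentence, syllables per word) pairs.
rng = random.Random(0)
texts = [(rng.uniform(10, 30), rng.uniform(1.3, 1.9)) for _ in range(2000)]

grades = [fk_grade(wps, spw) for wps, spw in texts]
eases = [reading_ease(wps, spw) for wps, spw in texts]
r = pearson(grades, eases)
print(f"r = {r:.2f}")
```

The correlation comes out strongly negative because the two scales run in opposite directions. It is strong only because both scores are weighted sums of the same two counts, so it says nothing about whether either score measures complexity.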

Finally, the authors sum up their reply to me by stating that they “think” that the F-K test is “a useful tool”. But thinking the F-K test is a useful tool is not enough to make it a useful tool, especially not for the claims they make in their paper. As I have pointed out, linguists and language scholars have proven that the F-K test is not a valid way to measure linguistic complexity. It’s nice that political scientists want to analyze language, and I welcome them into the field, but they need to get up to speed with the methods used in linguistics, even if they are speaking to other political scientists. I imagine that the same would be expected of linguists doing studies in political science.

Joe McVeigh


Brazil, David. 1995. A grammar of speech. Oxford: OUP.

Carter, Ronald, Angela Goddard, Maggie Bowring, Danuta Reah and Keith Sanger. 2001. Working with texts: A core introduction to language analysis (2nd ed.). Oxon: Routledge.

No competing interests declared.

RE: RE: RE: There are significant problems with this study

mschoonvelde replied to Joe_McVeigh on 05 Apr 2019 at 09:57 GMT

Thanks to Joe McVeigh for taking the time to explain his arguments in more detail. Let us reply again to his comments:

The datasets: We expect the linguistic differences between liberals and conservatives to emerge in different forms of communication, e.g. written speech, actual speech, but also on social media. The fact that there are differences between our datasets in the medium of the speech is, again, in our view a strength not a weakness. Of course, the written speeches may be different from the actual speeches. We discuss this speechwriter problem elsewhere: and find there is no conclusive evidence of massive speechwriter bias. Still, if this were the case, then our findings can still be explained by deliberate political strategy, as we discuss in the conclusion. Finally, we do not quite follow the point about the definition of communication or speech that we supposedly make implicitly.

Punctuation: We think it is highly unlikely that transcribers’ punctuation policies differ between liberal and conservative speakers. We already acknowledged that there are certainly punctuation errors in the transcribed speeches. In the aggregate, however, we believe this is random noise which washes out with having analyzed a large corpus of speeches. We disagree with McVeigh that there is no place in science for beliefs. Every scientific study is built on beliefs, although we typically call them assumptions, or even hypotheses.
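
The distinction at stake in this exchange, between random punctuation noise and a systematic difference in punctuation habits, can be sketched with a small simulation. All numbers here are hypothetical (a shared "true" sentence length of 20 words, per-speech noise with a standard deviation of 2, and an optional constant shift for one group), and the sketch makes no claim about which case holds in the actual corpora.

```python
import random
from statistics import mean


def simulate(n_speeches: int, bias: float, seed: int = 0) -> float:
    """Return mean words-per-sentence of group B minus that of group A.

    Both groups share the same hypothetical true sentence length. Every
    speech gets zero-mean noise; group B also gets a constant shift `bias`,
    standing in for a systematic punctuation habit.
    """
    rng = random.Random(seed)
    true_len = 20.0  # hypothetical words per sentence
    noise_sd = 2.0   # hypothetical per-speech punctuation noise
    group_a = [true_len + rng.gauss(0, noise_sd) for _ in range(n_speeches)]
    group_b = [true_len + bias + rng.gauss(0, noise_sd) for _ in range(n_speeches)]
    return mean(group_b) - mean(group_a)


noise_only = simulate(5000, bias=0.0)
with_bias = simulate(5000, bias=1.5)
print(f"noise only:      {noise_only:+.2f}")
print(f"systematic bias: {with_bias:+.2f}")
```

With only zero-mean noise, the group difference shrinks toward zero as the number of speeches grows. A constant shift in one group does not average away, however large the corpus.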

Difference between spoken and written language: We are interested in how politicians construct their communication to their audience. In the article we emphasize two possibilities: one emphasizing personality, the other strategy. We are more interested in the words that politicians choose, and less so in how these words are received (at least we are not interested in this here).

In sum, we again welcome McVeigh's points and we have laid out our responses. We wish to thank McVeigh for these comments. We will certainly take some of the things we learned from this exchange into future work.

Martijn Schoonvelde
Anna Brosius
Gijs Schumacher
Bert Bakker

No competing interests declared.

RE: RE: RE: RE: There are significant problems with this study

Joe_McVeigh replied to mschoonvelde on 09 Apr 2019 at 18:09 GMT

I would like to thank the authors for once again replying. I’m afraid that we are no further along in our understanding of the problems with the study. In their reply to my most recent comment, Schoonvelde et al. make a note about a “speechwriter problem” which they discuss elsewhere. But I didn’t raise the problem of speechwriter bias. Instead, I asked whether the speeches were performed exactly as they are written in the corpus. If they were not, then the study by Schoonvelde et al. does not research politicians’ “speech complexity” (as is claimed in the article), but rather how political speeches are written. The Schoonvelde et al. study then makes overarching claims about how liberal and conservative politicians speak. But as Milroy & Milroy (1991: 60) point out, there are “very considerable differences that exist between modes of sentence construction in speech and writing” and that “devices available to speakers for organizing their linguistic presentation are quite different from those available to writers” (p. 142; italics theirs). In that way Schoonvelde et al. confuse “communication” and “speech”. This confusion has a bearing on the authors’ whole dataset too. Crystal (2003: 181) says that “[the differences in speech and writing] are much greater than people usually think. The contrast is greatest when written texts are compared with informal conversation; but even in fairly formal and prepared speech setting, such as a teacher addressing a class, the structure of the language that is spoken bears very little similarity to that found in writing.”

Schoonvelde et al. claim to be studying political speeches and politicians’ speech, but they are really studying one restricted form of communication: the punctuation habits of political speechwriters. The authors claim in their reply to my comment that “the written speeches may be different from the actual speeches.” I will have to assume that by “actual speeches,” they mean spontaneous speeches. But their claim that the written (i.e. pre-planned) speeches and spontaneous speeches “may be different” shows that the authors do not understand the differences between speech and writing. It’s not that the written and spontaneous speeches may be different: they are different. Milroy & Milroy are again useful here to understand this difference. They note (1991: 64) that “although spoken language is diverse in its forms and functions, the norms of written grammar, spelling and vocabulary are much more uniform” and they emphasize that “the forms and functions of spoken language are very largely different from those of writing.” Crystal (2003: 181) agrees with this notion and says “the differences of structure and use between spoken and written language are inevitable, because they are the product of radically different kinds of communicative situation.” This uniformity goes beyond just spelling and vocabulary and into the level of sentence-construction (Milroy & Milroy 1991: 67). What this means is that written and spoken forms of English are fundamentally different, both in how “words” are represented and how “sentences” are (or can be) formed.

In the paragraph marked “Punctuation”, Schoonvelde et al. claim to disagree with me that “there is no place in science for beliefs”. I would encourage the authors to reread my comment so that they can notice it does not include anything about the place in science for beliefs. However, they are correct that hypotheses and assumptions are kinds of scientific beliefs. Hypotheses and assumptions, however, come before an analysis. When hypotheses are confirmed by evidence, they are called facts. It is therefore good that Schoonvelde et al. raised the idea of hypotheses in science. Their claim that they “think it is highly unlikely that transcribers’ punctuation policies differ between liberal and conservative speakers” is a hypothesis, not a fact that is backed up by research or evidence.

To answer the larger point made in the authors’ paragraph marked “Punctuation”, however, it is important to again state that the authors have misunderstood my criticism. I did not point to “punctuation errors”, which of course will be washed out when there is a large enough amount of data. I did, however, point to the fact that transcribers can place periods or commas in front of conjunctions and still follow the norms of standard English. But their choice of which punctuation mark to use has the potential to greatly affect the results of the Schoonvelde et al. study because the measurement they use (the Flesch-Kincaid test) so crucially depends on punctuation. In addition to this, we know from linguistic research that “formal (and planned) speech events are in fact characterized by an absence of reliance upon immediate context for their interpretation, and by conjunctions such as because, therefore, since which express explicitly temporal and causal relationships between clauses” (Milroy & Milroy 1991: 147). In addition, Crystal (2003: 181) says that “writing avoids words where the meaning relies on the situation (deictic expressions, such as this one, over there).” This means that 2/3 of the dataset in Schoonvelde et al. (the part of the data which the authors claim is made up of pre-written speeches) will include more opportunities for periods to be placed in front of conjunctions (as well as fewer opportunities to use one-syllable deictic words in English). In order to be sure that the conservative or liberal political speechwriters do not affect the results of the study by using periods instead of commas (or no punctuation marks) before conjunctions (and vice versa), the authors should have probably looked at how punctuation was used before conjunctions. Note that I am talking here about the pre-written speeches in the data, i.e. the data that was not written by transcribers. 
In their response to my criticism above, the authors have mistaken which part of their dataset is more likely to contain this problem, as they refer to how they “think it is highly unlikely that transcribers’ punctuation policies differ between liberal and conservative speakers.” The 1/3 of the dataset that is made up of transcribed speeches will also have this problem, but probably to a lesser extent than the prewritten speeches because of the fundamental differences between spontaneous spoken language and pre-planned written language (Milroy & Milroy 1991: 142; Crystal 2003: 181; Rowe & Devine 2006: 293).

In their most recent reply to me, Schoonvelde et al. claim that they “are more interested in the words that politicians choose, and less so in how these words are received”. This comment shows another lack of understanding of the data. Schoonvelde et al. are clearly studying how the words are received because at least 1/3 of the data (according to the authors) was transcribed by someone other than the speaker, who was the person that “chose” the words. While we cannot talk about spoken language having sentences, the transcribers of this section of the data have clearly made choices about where to place the punctuation. In other words, their reception of where each “sentence” ends has motivated their placement of punctuation, and their placement of punctuation has a crucial bearing on the data. So again the question is about the transcribers’ actions: did they place periods in between clauses at a higher rate for conservative or liberal politicians? The speakers did not use any punctuation, so the essential aspect of the dataset falls on the transcribers. This means that what Schoonvelde et al. are studying (with this subsection of their dataset) is not whether conservative or liberal politicians use complex language, nor really which words the politicians choose, but whether the transcribers make the language seem more complex or simpler by their use of punctuation. As Schiffrin (1994: 25) notes, intonation is what breaks speech up into chunks, but the breaks in intonation do not fit with punctuation and they do not “always correspond to syntactic boundaries,” which are marked by punctuation in writing. Instead, the use of punctuation in transcriptions of speech “forces us to think of such chunks as sentences, rather than as providing an accurate representation of how speakers themselves produce language” (Schiffrin 1994: 25). And this is why the F-K test is not sufficient for studying complexity in language. The liberal politicians may use more complex language, but the F-K test can’t tell us that.
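To make the point about punctuation concrete, here is a minimal sketch in Python. It assumes the standard Flesch-Kincaid grade level formula, 0.39 × (words per sentence) + 11.8 × (syllables per word) − 15.59; the authors' actual implementation may differ. The syllable counter is a crude vowel-group heuristic of my own, the function names are hypothetical, and the example utterance is invented for illustration and is not taken from their corpus.

```python
import re

def count_syllables(word: str) -> int:
    """Crude assumption: one syllable per run of vowels, minimum one per word."""
    return max(1, len(re.findall(r"[aeiouy]+", word.lower())))

def fk_grade(text: str) -> float:
    """Flesch-Kincaid grade level computed from sentence, word and syllable counts."""
    # A "sentence" here is whatever the transcriber closed off with . ! or ?
    sentences = [s for s in re.split(r"[.!?]+", text) if s.strip()]
    words = re.findall(r"[A-Za-z']+", text)
    syllables = sum(count_syllables(w) for w in words)
    return 0.39 * len(words) / len(sentences) + 11.8 * syllables / len(words) - 15.59

# The same 15 spoken words, punctuated by two hypothetical transcribers.
version_a = "We cut taxes and we created jobs and we will keep going because it works."
version_b = "We cut taxes. And we created jobs. And we will keep going. Because it works."

grade_a = fk_grade(version_a)  # one "sentence" of 15 words
grade_b = fk_grade(version_b)  # four "sentences" averaging 3.75 words
gap = grade_a - grade_b        # depends only on where the periods were placed
```

Because the words are identical, the syllables-per-word term cancels, and the gap is 0.39 × (15 − 3.75), or roughly 4.4 grade levels, produced by the placement of periods alone.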

The authors do not address my criticism of their use of a test for written language on spoken language. As I pointed out in my previous comment, Carter et al. (2001: 243) say that “spoken discourse should not be judged using the rules of written English: terms such as ‘word’, ‘sentence’ and ‘paragraph’ above, all come from the study of writing. The written form is not an appropriate medium for oral language”. Crystal (2003: 181) illustrates how writing and speech differ by saying “units of discourse [in writing], such as sentences and paragraphs, are clearly identified through layout and punctuation. By contrast, the spontaneity and rapidity of speech minimizes the chance of complex preplanning, and promotes features that assist speakers to ‘think standing up’ – looser construction, repetition, rephrasing, filler phrases and the use of intonation and pause to divide utterances into manageable chunks.” These linguistic scholars are not alone. In this thread, I have already listed Brazil (1995), Bright (n.d.), Carter et al. (2001), Linell (2005), Liberman (2014, 2016), and Milroy & Milroy (1991). There is also van Kemenade and Los (2013: 218), who say that “one of the hallmarks of oral versus written styles is the way clauses are connected. The development of a written style tends to involve a tighter syntactic organization: instead of the loosely organized string of main clauses (“parataxis”) characteristic of oral styles, written styles tend to have complex sentences, with embedding (“hypotaxis”) of subclauses that function as subjects, objects, or adverbials of a higher clause.” To this we could add English and Marr (2015: 141), who show that prepared writing uses “frequent clause embedding” rather than chaining clauses in an additive way, which is what spoken language does. They also point out “another feature which characterizes writtenness, particularly in formal writing […], is nominalization. […] It is this that leads to the lexical density that Halliday (1989) refers to and partly to the organizational complexity. Nominalization avoids the need for lots of simple clauses and fewer verbs and promotes instead the embeddedness and indirectness of clause complexity.” I encouraged Schoonvelde et al. to find linguistic studies which agree with their methodology, but they have yet to cite any. Linguists, however, are in agreement, which means that if Schoonvelde et al.’s methodology of studying spoken language with a written language test is permissible, then decades of linguistic research are wrong.

Another one of my criticisms which Schoonvelde et al. did not address was about how they took a test which was developed for English and used it to study other languages. Roach (1998: 151) has noted that comparing different languages involves a “serious problem” because “some languages (e.g. German, Hungarian) have some very long words, while others (e.g. Chinese) have very few words of more than one or two syllables.” But Schoonvelde et al. used the F-K test, which was developed for one language (English), and applied it to other languages (Spanish, Danish, Dutch and Swedish). There is no discussion about why this should be permitted beyond the claim that the F-K test correlated highly with some other German readability tests, which were themselves adapted from the F-K test.

The authors are welcome to reply to this comment. The fundamental problems that I brought up in my original comment remain. But I hope that the authors do indeed take what they have learned from this exchange into their future work.

Joe McVeigh

Crystal, David. 2003. The Cambridge Encyclopedia of Language (2nd ed.). Cambridge: CUP.

English, Fiona and Tim Marr. 2015. Why do Linguistics? Reflective Linguistics and the Study of Language. London: Bloomsbury.

Milroy, James and Lesley Milroy. 1991. Authority in Language: Investigating Language Prescription & Standardization (2nd ed.). London: Routledge.

Roach, Peter. 1998. “Some languages are spoken more quickly than others.” In Language Myths, ed. by Laurie Bauer and Peter Trudgill, 150-158.

Rowe, Bruce and Diane P. Devine. 2006. A Concise Introduction to Linguistics. London: Pearson.

Schiffrin, Deborah. 1994. Approaches to Discourse. Oxford: Blackwell.

van Kemenade, Ans and Bettelou Los. 2013. “Using historical texts.” In Research Methods in Linguistics, ed. by Robert J. Podesva and Devyani Sharma, 216-232.

No competing interests declared.