Writers’ uncertainty in scientific and popular biomedical articles. A comparative analysis of the British Medical Journal and Discover Magazine

Distinguishing certain and uncertain information is of crucial importance both in the scientific field in the strict sense and in the popular scientific domain. In this paper, by adopting an epistemic stance perspective on certainty and uncertainty, and a mixed procedure of analysis, which combines a bottom-up and a top-down approach, we perform a comparative study (both qualitative and quantitative) of the linguistic markers of uncertainty (verbs, non-verbs, modal verbs, conditional clauses, uncertain questions, epistemic future) and their scope in three different corpora: a historical corpus of 80 biomedical articles from the British Medical Journal (BMJ) 1840–2007; a corpus of 12 biomedical articles from BMJ 2013; and a contemporary corpus of 12 scientific popular articles from Discover 2013. The variables under observation are time, structure (IMRaD vs no-IMRaD) and genre (scientific vs popular articles). We apply the Generalized Linear Models analysis in order to test whether there are statistically significant differences (1) in the amount of uncertainty among the different corpora, and (2) in the categories of uncertainty markers used by writers. The results of our analysis reveal that (1) in all corpora, the percentage of uncertainty is always much lower than that of certainty; (2) uncertainty progressively diminishes over time in biomedical articles (in conjunction with their structural change to IMRaD and with the increase in the BMJ Impact Factor); and (3) uncertainty is slightly higher in scientific popular articles (Discover 2013) than in the contemporary corpus of scientific articles (BMJ 2013). Nevertheless, in all corpora, modal verbs are the most used uncertainty markers.
These results suggest that not only do scientific writers prefer to communicate their uncertainty with markers of possibility rather than those of subjectivity but also that science journalists prefer using a third-person subject followed by modal verbs rather than a first-person subject followed by mental verbs such as think or believe.

Introduction
Distinguishing certain and uncertain information (i.e. factual vs. speculative, hedged, mitigated information) is of crucial importance both in the scientific field in the strict sense and in the popular scientific domain. In the first case, the communication of a piece of information in a certain or uncertain manner determines opposite outcomes (health policies, clinical practice, etc.). In the second case, popular scientific communication (magazines, TV, web, etc.) plays a significant role in spreading scientific knowledge, in making people aware, and in shaping the attitudes and behaviours they subsequently assume.
Given the importance of this topic in determining practical decision-making, the study of hedging, uncertainty, mitigation and the like in scientific writing has received increasing attention from scholars since the 1990s [1][2][3][4][5][6][7][8][9]. More recently, researchers in the Natural Language Processing community have also focused their attention on the detection of certainty and uncertainty markers and their linguistic scope (e.g. [10][11][12][13][14][15][16][17][18]). However, these studies tend: 1. to be small in their number of full-text scientific articles (for example, Bioscope [10], one of the premier corpora annotated for uncertainty, comprises only nine full-text articles); 2. to lack a historical perspective to evaluate how uncertainty has evolved over time; 3. to use a top-down analysis procedure for detecting certainty and uncertainty markers (i.e., a predetermined list of such markers taken from grammars, dictionaries, previous relevant studies, etc.); 4. to lack a linguistic theory concerning the communication of certainty and uncertainty.
As for the study of popular scientific texts, it has often produced controversial results concerning the use of hedging [19][20]. As Varttala [21][22] states, hedging was prevalently investigated in specialist-to-specialist research articles. His study was the first that compared scientific and popular medicine articles for a number of selected lexical hedging devices. He found that hedges often occur in professional medical articles, but are also typical of popular scientific articles dealing with similar topics.
Choi et al. [23] compared scientific and popular articles regarding the GMO debate and suggested that hedges occur less frequently in scientific discourse than in popular texts. Their findings are consistent with those of Schmied [24], who compared scientific and popular medicine articles, observing that more than double the number of lexical hedges occurred in the latter. These results do not support previous findings [25][26][27][28][29] according to which hedges are more abundant in research articles than in their popularized versions.

Aims
In a previous study [30], we aimed to fill the four above-mentioned gaps in the existing literature. By adopting an epistemic stance perspective on certainty and uncertainty (see section Theoretical Framework), and a mixed procedure of analysis, which combines a bottom-up and a top-down approach (see Introduction, and Method in [30]), we identified the uncertainty markers and their linguistic scope (i.e., the rate of uncertainty) in 80 scientific articles from the British Medical Journal (available at PubMed Central, http://www.ncbi.nlm.nih.gov/pmc/journals/3/, last access February 2012), randomly selected from 1840 to 2007 (henceforth, this corpus will be abbreviated as BMJ 1840-2007), in order to test whether the rate of certainty and uncertainty had changed or remained stable over time. On the basis of the results of this previous study (see next sections), in the present one, we undertake the following: 1. Within BMJ 1840-2007, we compare 22 articles with an IMRaD structure (Introduction, Method, Results, and Discussion) [31][32][33] with 58 articles with a no-IMRaD structure, in order to verify whether the IMRaD variable affects the percentage of certainty and uncertainty (see section Study 1
From this perspective, a piece of information is communicated as certain when, in the here and now of communication, the speaker/writer's commitment to its truth is at the maximum or a high level:
(1) There is a relationship between smoking and lung cancer
(2) There is certainly a relationship between smoking and lung cancer
Declarative sentences in the indicative mood without (example 1) or with (example 2) a certainty marker, like the adverb certainly, are the most used to communicate certainty. In both examples, the authors communicate that they are certain that the piece of information p they are conveying is true, i.e., they are saying that they evaluate p as being true.
Vice versa, a piece of information is communicated as uncertain when, in the here and now of communication, the speaker/writer's commitment to its truth is at the minimum or a low level:
(3) There can be a relationship between smoking and lung cancer
(4) The results of our study tell us that perhaps there is a relationship between smoking and lung cancer
In examples 3 and 4, verbs like can and adverbs like perhaps convey the writers' uncertain stance towards the information p. In the here and now of communication, they say that they do not know whether p is true or false; therefore, they communicate p as uncertain, i.e., they tell the readers that they are not certain about the truth of p.
In written texts such as BMJ articles, uncertainty markers (from here on UMs) can refer either to the author's uncertainty or to somebody else's uncertainty. Both types of uncertainty can refer to the present, past, or future.
As stated above, an essential point in our study on BMJ is the adoption of an epistemic stance perspective: we specifically aimed at identifying the UMs referring to the writer (= the author of the article) in the here and now of his/her communication, i.e., at the time the article was being written.
We excluded from our analysis both the UMs referring to the writer in the past or future
(5) It seemed to me that there was a relationship between smoking and lung cancer
and the UMs referring to somebody else apart from the author of the article
(6) Doctor Adler supposes that there is a relationship between smoking and lung cancer
In example (5), in the here and now of communication the author remembers, i.e., knows, that there and then (i.e., in the past) he was uncertain about the relationship between smoking and lung cancer. In other words, in the here and now, the author is communicating as certain a piece of information concerning his past uncertainty: it is a certainty communication of a past uncertainty.
In example (6) the author, in the here and now of communication, is communicating as certain a piece of information referring to the uncertainty of someone different from himself.
As a consequence, our analysis only detected "uncertainty under the first case (the author's uncertainty in the present) and not the other two (the author's uncertainty in the past or future and somebody else's uncertainty in the present, past, or future)" (see Linguistic Background in [30]).
To the best of our knowledge, no previous study has applied such a distinction in the detection of UMs in the biomedical field. Applying this distinction or not means studying two different types of issues and leads to different quantitative results. When adopting our differentiated approach, only examples 3 and 4 would be considered as uncertain. On the contrary, when adopting an undifferentiated approach, examples 5 and 6 would also be considered as uncertain. The former approach is specific, the latter generic, i.e., it considers any UM indiscriminately. The choice of one or the other approach affects the quantitative results concerning both the UMs and their linguistic scope, i.e. "the semantic 'influence' which such words have on neighbouring parts of a sentence" ([57]: 85). Indeed, in the latter case, the quantitative results would be larger, since the undifferentiated approach considers not only the author's uncertainty in the present but also in the past and future, as well as the present, past and future uncertainty of somebody else mentioned in the article (for instance, Doctor Adler in example 6).

Uncertainty markers
In the corpus BMJ 1840-2007, we identified seven categories of UMs, both lexical and morphosyntactic: verbs, non-verbs, modal verbs in the simple present, modal verbs in the conditional mood, if, uncertain questions, and epistemic future (see Table 1).
For further examples and details concerning each category of UMs, see [30]. Here, we only highlight the new categories of UMs found using our theoretical framework and mixed approach. In fact, while the categories verbs, non-verbs, modal verbs, epistemic future and the sub-category if-clauses within the if-category are usually present in the standard lists of UMs used by the authors mentioned in the Introduction, the sub-categories if-less clauses, as if/as though, if/whether introducing indirect uncertain questions, and the category uncertain questions are new UMs.
As for the uncertain questions category, as described in detail in [62], all yes/no questions (polar interrogatives, alternative, tag and declarative questions) are considered uncertain in that they convey a not-knowing-whether epistemic stance of the questioner. They present, explicitly or implicitly, two (or more) possible alternatives that the questioner is uncertain about [56; 62-63]. For instance, if the direct question in Table 1 "Is there a relationship between smoking and any other cause of death?" is transformed into its corresponding indirect form (using the introducing verb to know [64]), we have I do not know whether (or not) there is a relationship between smoking and any other cause of death.
The uncertain questions category includes the direct uncertain questions found in the corpus (42 polar interrogatives, see Table 2). The indirect uncertain questions are included in the if-category, specifically in the sub-category if/whether introducing indirect uncertain questions.
The sub-category if-less clauses includes the conditional constructions that have, instead of the explicit if, only the subject-verb inversion in the protasis. In the example in Table 1, the initial expression with the subject-verb inversion "Had I regarded. . ." is equivalent to If I had regarded. . .
The sub-category as if/as though includes comparative constructions introduced by as if/as though. In a statement of the form p as if q, a comparison between the main clause p and an if-clause q with understood apodosis is established [65][66][67]: The extracts behaved as if they contained noradrenaline = The extracts behaved as (they would) if they contained noradrenaline = If the extracts contained noradrenaline, they would behave as they did.

Categories of UMs; Sub-categories of UMs; Examples
Epistemic verbs: I believe, we think, I suppose, it seems. . .
Nouns: doubt, impression. . .
Personal attributions

In conclusion, as shown by the examples in Table 1, our notion of uncertainty includes, in addition to the narrow sense of uncertainty (I do not know whether p; I'm not certain that p; I'm uncertain about p; etc.), also possibility (as expressed, for example, by the epistemic use of the modal verbs and expressions such as it is possible/probable, etc.) and subjectivity (i.e., the communication of the writers' points of view, through expressions such as in my opinion, according to my view, I think, etc.). "Since these three concepts partially overlap, we prefer using the more generic term 'uncertainty', which encompasses them all" ([34]: 58).
As shown in Table 2, modal verbs in the simple present and in the conditional mood are the most used UMs; non-verbs are more numerous than verbs.

Scope of uncertainty markers and percentage of uncertainty
The linguistic scope of a UM extends either over a whole sentence (whether including coordinate and subordinate clauses or not) or over a part of it. For instance, example 3, in the section Theoretical Framework, was entirely tagged as uncertain (= 10 words). In example 4, instead, only the subordinate clause (". . .perhaps there is a relationship between smoking and lung cancer") was tagged as uncertain (= 10 words). The preceding clause ("The results of our study tell us that") was tagged as certain (= 8 words). Following the same criterion, examples 1 and 2 were also tagged as certain (= 9 and 10 words respectively). This means that, in principle, what was not tagged as uncertain was tagged as certain, since the notions of epistemicity and epistemic stance include only two dimensions: certainty and uncertainty.
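The word counts given for these examples can be reproduced with a short sketch. The Python snippet below is illustrative only: the whitespace tokenization, the (text, tag) data layout, and the function names are assumptions made for this example and are not taken from the original study. It computes the share of tokens falling under uncertain scope for example 4.

```python
def count_tokens(segments):
    """Count whitespace-separated words per tag.

    `segments` is a list of (text, tag) pairs, where tag is either
    "certain" or "uncertain" (a hypothetical layout for this sketch).
    """
    counts = {"certain": 0, "uncertain": 0}
    for text, tag in segments:
        counts[tag] += len(text.split())
    return counts


def uncertainty_percentage(segments):
    """Percentage of tokens tagged as uncertain (UMs plus their scope)."""
    counts = count_tokens(segments)
    total = counts["certain"] + counts["uncertain"]
    if total == 0:
        return 0.0
    return 100.0 * counts["uncertain"] / total


# Example 4 from the section Theoretical Framework, split at the scope boundary.
example_4 = [
    ("The results of our study tell us that", "certain"),
    ("perhaps there is a relationship between smoking and lung cancer", "uncertain"),
]

counts = count_tokens(example_4)
share = uncertainty_percentage(example_4)
```

Taken alone, example 4 has 10 of its 18 words under uncertain scope; the percentages reported for a corpus would be obtained in the same way over all tagged segments of all articles.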
As shown in Table 3, the percentage of uncertainty (UMs + their scope) is always much lower than that of certainty both in each period and in the whole corpus. Specifically, in the whole corpus, the uncertainty is 20% and certainty is 80%.

Statistical analysis
As described in detail previously [30], in order to test whether there were significant differences in the amount of certainty and uncertainty tokens across the four periods, the Generalized Linear Models (GLM) [68] and Wald χ2 tests on GLM [69] were applied.
As shown in Table 3, the percentage of uncertainty in the four periods ranges from 16% to 23%, a variation that is not statistically significant. The analysis did not reveal any significant variation with regard to the amount of certainty either. This means that the percentage of certainty (80%) and uncertainty (20%) is the same over the 167-year span. Scientific writers have been using uncertainty in an unaltered manner and always in a smaller percentage as compared to certainty.

The five new studies
As stated in the section Aims, in the following five studies, we aim to ascertain whether there are significant variations in the percentages of certainty and uncertainty over time, between different structures of the articles (IMRaD vs no-IMRaD), and between different genres (scientific vs popular). In other words, we take into consideration only three main variables: time, structure, and genre.
Other possible variables, such as the specific topic of each article (cancer, small-pox, etc.) and the methods used by the writers (experimental, meta-analysis, etc.) fall beyond the aims of the present paper.

Corpus, aims, procedures
The statistical analysis of the temporal variable in BMJ 1840-2007 (see section Statistical Analysis) revealed no significant differences in the percentage of certainty and uncertainty over time. However, the BMJ 1840-2007 corpus consists of scientific articles with different structures. Of the 80 articles, 22 have an IMRaD structure, while 58 do not. Out of the 22 IMRaD articles, 11 have been identified in the third period and 11 in the fourth. In Study 1, we compare the sub-corpus of 22 IMRaD articles with the sub-corpus of 58 no-IMRaD articles in order to verify whether this structural variable can determine significant differences in the rate of uncertainty.

Results
Uncertainty markers. The most used UMs (modal verbs in the simple present and in the conditional mood) are the same in the two sub-corpora as well as in the whole corpus, independently from the structural variable (IMRaD vs. no-IMRaD). Non-verbs are more numerous than verbs (see Table 4).
Scope. The percentage of certainty and uncertainty in the 22 IMRaD articles is 82% and 18% respectively, while in the 58 no-IMRaD articles, it is 80% and 20%, as with the whole corpus.
This means that the uncertainty in IMRaD articles is 2 percentage points lower than that in no-IMRaD articles.
Statistical analysis. In all statistical analyses, the responses were first analysed by applying the Generalized Linear Models (GLM), using the proportion of uncertainty tokens as the dependent variable. Subsequently, GLM was applied using the proportion of UMs as the dependent variable. Because both dependent variables are proportions, we used GLM with the logit link function and binomial family. Specifically, we computed ANOVA tables (Type 3) via Wald χ2 tests implemented in the R-software "car" package [69]. Bonferroni corrections were applied to post-hoc comparisons. In the first analysis, the independent factor is structure in BMJ 1840-2007: the difference between the IMRaD structure and the no-IMRaD structure is significant. There are no significant effects due to the interaction between structure (IMRaD vs no-IMRaD) in BMJ 1840-2007 and UMs. See Fig 2. This means that in both sub-corpora the writers use the same categories of UMs in similar proportion.
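The logic of a Wald χ2 test on a binomial GLM with a logit link can be illustrated for the simplest case of one two-level factor. The Python sketch below is not the authors' R analysis: it pools tokens within each group (ignoring article-level variation), uses the closed-form estimate that such a model yields for grouped counts, and runs on hypothetical token counts chosen only for illustration. The function names are likewise assumptions of this sketch.

```python
import math


def wald_chi2_two_groups(unc_a, total_a, unc_b, total_b):
    """Wald chi-square (1 df) for the group effect in a logit model.

    With a single two-level factor and grouped binomial counts, the
    group coefficient equals the log odds ratio, and its estimated
    variance is the sum of the reciprocals of the four cell counts.
    """
    cert_a = total_a - unc_a
    cert_b = total_b - unc_b
    if min(unc_a, cert_a, unc_b, cert_b) <= 0:
        raise ValueError("every cell count must be positive")
    beta = math.log((unc_b / cert_b) / (unc_a / cert_a))
    variance = 1 / unc_a + 1 / cert_a + 1 / unc_b + 1 / cert_b
    chi2 = beta ** 2 / variance
    # Upper tail of a chi-square with 1 df, via the normal distribution.
    p_value = math.erfc(math.sqrt(chi2 / 2))
    return beta, chi2, p_value


def bonferroni(p_value, n_comparisons):
    """Bonferroni-adjusted p-value for a family of post-hoc comparisons."""
    return min(1.0, p_value * n_comparisons)


# Hypothetical counts (NOT from the paper): uncertain tokens / total tokens.
beta, chi2, p_value = wald_chi2_two_groups(180, 1000, 200, 1000)
adjusted = bonferroni(p_value, n_comparisons=3)
```

A model fitted on per-article proportions, as the reported N suggests the original analysis was, would generally give different test statistics from this pooled sketch.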
Summary of the main results. In IMRaD articles, the uncertainty is 18% and certainty 82%. In no-IMRaD articles, the uncertainty is 20% and certainty 80%. This means that IMRaD articles are less uncertain than no-IMRaD ones and such a difference is statistically significant. The difference concerning UMs in IMRaD and no-IMRaD articles is statistically not significant.

Study 2. Uncertainty in BMJ 2013
Corpus
Twelve research articles (one for each month of 2013, which was the last year available when this study started) with the IMRaD structure were randomly selected from the "full online Archive" (http://www.bmj.com/archive) of the British Medical Journal (BMJ), section "Research". Total tokens: 55,198; average number of tokens per article: 4,599.

Aims
1. To identify which and how many lexical and morphosyntactic UMs are used by writers in order to communicate their own uncertainty.
2. To identify how much uncertainty (UMs + their scope) is present in each article and in the whole corpus.

Procedures of analysis
The articles were first manually edited and converted from .pdf source files into plain text files (.txt). The corpus included titles and main texts. Reference lists, figures, tables, authors' names and affiliations, etc. were excluded to facilitate the processing of the data set. The qualitative analysis was performed by three independent judges: the K coefficient was calculated between two of them and was 0.93 for the UMs identification and 100 for the scope.
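An agreement coefficient of this kind can be computed as in the sketch below, which assumes that the K coefficient is Cohen's kappa for two judges (the text does not name the statistic explicitly). The label sequences are hypothetical and serve only to show the calculation; they are not data from the study.

```python
from collections import Counter


def cohens_kappa(labels_a, labels_b):
    """Cohen's kappa for two judges labelling the same items."""
    if len(labels_a) != len(labels_b) or not labels_a:
        raise ValueError("both judges must label the same non-empty item list")
    n = len(labels_a)
    # Observed agreement: share of items given the same label by both judges.
    observed = sum(a == b for a, b in zip(labels_a, labels_b)) / n
    # Chance agreement: product of the judges' marginal label frequencies.
    freq_a = Counter(labels_a)
    freq_b = Counter(labels_b)
    categories = set(freq_a) | set(freq_b)
    expected = sum(freq_a[c] * freq_b[c] for c in categories) / (n * n)
    if expected == 1.0:
        return 1.0  # both judges used one identical label throughout
    return (observed - expected) / (1 - expected)


# Hypothetical judgements on ten candidate expressions ("UM" or "none").
judge_1 = ["UM", "UM", "none", "none", "UM", "none", "none", "none", "UM", "none"]
judge_2 = ["UM", "UM", "none", "none", "UM", "none", "none", "none", "none", "none"]

kappa = cohens_kappa(judge_1, judge_2)
```

Here the judges disagree on one item out of ten, so the observed agreement is 0.9 while the chance agreement is 0.54; kappa corrects the former for the latter.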

Results
Uncertainty markers. Modal verbs in the conditional mood are the most used UMs (Table 5).
Among modal verbs in the conditional mood, could and would are the most used (Table 6). Among non-verbs, likely, potentially, potential, possible and probably are the most used (Table 7).
Among modal verbs in the simple present, may and can are the most used (Table 8).
Among verbs, suggest/s and seem/s are the most used (Table 9).
Within the if category, if/whether is the most used sub-category (Table 10).
Scope. The percentage of uncertainty is much lower than that of certainty, both in each article and in the whole corpus (Fig 3 and Table 11).
As shown in Fig 4, of the different sections that form the IMRaD structure (Introduction, Method, Results, and Discussion, to which Box and Abstract have been recently added in BMJ articles), the uncertainty is communicated primarily in the Discussion section (69%) and secondarily in the Introduction section (11%).
Summary of the main results. The uncertainty communicated in the BMJ 2013 is about 9% of the total, and it is mainly conveyed through modal verbs, both in the conditional

Corpus, aims, procedures
In Study 3, we compare the 22 IMRaD articles from BMJ 1840-2007 with the 12 IMRaD articles from BMJ 2013 in order to determine if there are significant differences in the percentage of uncertainty. In this study, the variable under observation is temporal.

Results
Uncertainty markers. Table 12 shows that the most used UMs are always modal verbs. In the 12 IMRaD articles from BMJ 2013, modal verbs in the conditional mood occur more than  This means that in both corpora the writers use the same categories of UMs in similar proportion.
Summary of the main results. In IMRaD articles from BMJ 1840-2007, the uncertainty is 18% and certainty 82%. In IMRaD articles from BMJ 2013, uncertainty is 9% and certainty 91%. This means that IMRaD articles from BMJ 2013 are less uncertain than IMRaD articles from BMJ 1840-2007 and this difference is statistically significant. With regard to the UMs, the analysis does not reveal any significant differences.
Total tokens: 36,559; average number of tokens per article 3,046. These popular articles, by definition, have an unconstrained structure, different from the scientific ones (IMRaD).
Aims (identifying UMs + their scope), preliminary procedures and procedures of analysis are the same as those of BMJ 2013. Only the K coefficient was slightly different: 0.89 for the UMs identification and 100 for the scope.

Results
Uncertainty markers. Modal verbs in the conditional mood and in the simple present are the most used UMs (Table 13).  Among modal verbs in the conditional mood, would and could are the most used (Table 14).
Among modal verbs in the simple present, can and may are the most used (Table 15).
Within the if category, if clauses is the most used sub-category (Table 16). Among non-verbs, likely, probably and perhaps are the most used (Table 17). Among verbs, seem/s and suggest/s are the most used (Table 18). As for uncertain questions, they amount to about 7% of the total UMs, as shown in Table 13.
Scope. The percentage of uncertainty is much lower than that of certainty both in each article and in the whole corpus (Fig 7 and Table 19).
Summary of the main results. The uncertainty communicated in Discover 2013 is about 12% of the total and is mainly conveyed through modal verbs, both in the conditional (33.15%) and in the simple present (26.84%). If considered together, they represent about 60% of all UMs.

Corpus, aims, procedures
In Study 5, we compare the 12 BMJ 2013 articles with the 12 Discover articles in order to verify whether the genre variable (i.e., scientific vs. popular) determines significant differences in the percentage of uncertainty. In this study, the variable under observation is genre.

Results
Uncertainty markers. Table 20 shows that the most used UMs are again modal verbs in the conditional mood and in the simple present, both in the scientific and in the popular corpus.
Scope. The percentage of certainty and uncertainty in the 12 articles from BMJ 2013 is 91% and 9% respectively, while in the 12 articles from Discover 2013, it is 88% and 12%. This means that the uncertainty in the former corpus (BMJ 2013) is 3 percentage points lower than that in the latter (Discover 2013).
Statistical analysis. In the following analysis, the independent factor is corpus 2013. The difference between Discover 2013 and BMJ 2013 is significant (χ2 (1, N = 24) = 194.12, p < 0.001, Cohen's d = 2.85). See Fig 8. There is only one significant effect due to the interaction between corpus (Discover 2013 vs BMJ 2013) and UMs (non-verbs-BMJ 2013 > non-verbs-Discover 2013; z ratio = 3.997, p = 0.004, Cohen's d = 0.81). See Fig 9.
Summary of the main results. The percentage of certainty and uncertainty in BMJ 2013 is 91% and 9% respectively, while in Discover 2013, it is 88% and 12%. This means that the uncertainty communicated in Discover 2013 is 3 percentage points more than that in BMJ 2013.
In both corpora, the uncertainty is mainly conveyed through modal verbs, both in the conditional and the simple present. If considered together, they represent about 60% of all UMs. Also, the percentages of verbs in the two corpora are almost the same (about 9%).
In Discover 2013, the uncertainty is also communicated through uncertain questions, unlike in BMJ 2013, which has no occurrences of such UMs.
Non-verbs decrease notably in Discover 2013; the difference between them and the non-verbs in BMJ 2013 is statistically significant.

Conclusion and discussion
Do time, structure, and genre affect the proportion of certainty and uncertainty in biomedical scientific and popular articles?
Which and how many markers are used in order to communicate uncertainty? These were the main research questions we tried to answer in the present paper. The main novelties concern the following:
1. Theory: the adoption of the epistemic stance perspective (the UMs detected were only those referring to the writer's uncertainty in the here and now of writing the article).
2. Methodology: the adoption of a mixed procedure for the UMs detection, which combines a bottom-up and a top-down approach.
With regard to the first research question, Table 21 shows that in all corpora (three scientific and one popular), the percentage of certainty is always much higher than that of uncertainty, ranging from 80% (BMJ 1840-2007) to 91% (BMJ 2013). Inversely, the uncertainty ranges from 9% (BMJ 2013) to 20% (BMJ 1840-2007).
As for the second research question, in all corpora, both scientific and popular, the most used UMs are modal verbs, both in the simple present and the conditional mood. These results suggest that not only do scientific writers prefer to communicate their uncertainty with markers of possibility rather than with markers of subjectivity [5; 29] but science journalists also prefer using a third-person subject followed by modal verbs, such as may or could, rather than using a first-person subject followed by verbs such as think or believe [72].
In both contexts (scientific and popular), a cautious way (using possibility markers) of communicating a piece of information indeed seems more appropriate than an explicit personal way (using subjectivity markers).

Scientific corpora
Within the same scientific genre, we took time and structure as the two main variables under observation. When only the first variable (time) is studied, the percentages of uncertainty (20%) and certainty (80%) remain stable over 167 years, as shown by the results from the corpus BMJ 1840-2007. However, when the second variable (structure) is introduced, the percentage of uncertainty significantly decreases. In particular, in Study 1, we compared IMRaD and no-IMRaD articles in BMJ 1840-2007. In IMRaD articles, the uncertainty is 18% and certainty 82%. In no-IMRaD articles, the uncertainty is 20% and certainty 80%. This means that IMRaD articles are less uncertain than no-IMRaD ones and this difference is statistically significant.
In Study 3, we compared IMRaD articles from BMJ 1840-2007 and IMRaD articles from BMJ 2013. In the former corpus, the uncertainty is 18% and certainty 82%, while in the latter, uncertainty is 9% and certainty 91%. This means that IMRaD articles from BMJ 2013 are less uncertain than IMRaD articles from BMJ 1840-2007 and this difference is statistically significant.
In other terms, within the three scientific corpora, the percentage of certainty progressively increases (from 80% to 91%) and, conversely, the percentage of uncertainty progressively decreases (from 20% to 9%).  (1) use UMs significantly less than they did in the past and (2) they place UMs primarily in the Discussion and the Introduction sections (see for example [21; 70-71]).
In other words, the results of our studies suggest that the decreasing uncertainty over time is also related to the different structures of the articles: IMRaD vs. no-IMRaD.
The variables affecting the decrease in uncertainty are of course multiple, and most of them are often out of experimental control.
Broader explanations may be mainly related to the following:
1. Medical progress, also linked to the development of new technologies.
2. The different content of the IMRaD articles: commonly, high levels of confidence are assigned to Randomized Control Trials [73] and to meta-analyses [74].
3. Peer review system (used in order to ensure reliability). Being a control system [75], it may favour the publication of those scientific articles in which the uncertainty is limited (Ernest Hart, editor of the BMJ, was one of the first editors to implement a peer-review system). Biomedical articles require an imbalance between certainty and uncertainty in favour of the

Scientific and popular corpora
Within the same time period (2013), we considered genre as the main variable under observation. On the basis of the results from Study 2 (Uncertainty in BMJ 2013) and Study 4 (Uncertainty in Discover 2013), in Study 5, we compared the two corpora. The percentage of certainty and uncertainty in BMJ 2013 is 91% and 9% respectively, while in Discover 2013, it is 88% and 12%. This means that the uncertainty communicated in Discover 2013 is 3 percentage points more than that in BMJ 2013. This difference, consistent with the findings of [23][24], is statistically significant. In other terms, the results of our studies confirm the research hypothesis that genre is, among others, a variable affecting the proportion of certainty and uncertainty.
Broader explanations may be mainly related to the following:
1. Different writers for different readers: scientific articles are written by specialist writers (scientists and scholars who usually are the authors of the studies presented in the articles) for specialist readers (scientists and scholars), i.e., for the scientific community (peer-to-peer communication). Popular articles are, on the other hand, written by specialist writers (science journalists) for non-specialist readers [27,29,72].
2. Different aims: scientific articles aim to share and discuss new findings within the scientific community. Such new findings can determine contrary outcomes in terms of health policies, clinical practice, etc. On the other hand, popular articles aim to spread scientific knowledge within the non-expert community in order to make people aware and responsible in assuming subsequent attitudes and behaviours.
3. Different structures of the articles [76][77]: scientific articles have a more rigorous, fixed structure (IMRaD) in which the author has to present relevant literature on the topic, experimental design, methodology, quantitative results, etc. Popular articles do not have a fixed structure as scientific articles do, since the main purpose of the science journalists is to summarize, simplify, and compare different studies on a specific topic in order to render them understandable to laypeople. Furthermore, scientific writers often present statistical data, meta-analyses, etc. to support their own assumptions, while science journalists largely use direct or indirect quotations of different researchers without adopting a personal position towards them. In other words, science journalists remain uncertain and neutral about different scientific perspectives, leaving the choice to their readers.
The major limitation of the present paper concerns the size of the two corpora, BMJ 2013 and Discover 2013, since each of them consists of only 12 articles. In the future, we intend to enlarge these two corpora in order to further test our results.