Trends in Citations to Books on Epidemiological and Statistical Methods in the Biomedical Literature

Background
There are no analyses of citations to books on epidemiological and statistical methods in the biomedical literature. Such analyses may shed light on how concepts and methods changed as biomedical research evolved. Our aim was to analyze the number and time trends of citations received from biomedical articles by books on epidemiological and statistical methods and related disciplines.

Methods and Findings
The data source was the Web of Science. The study books were published between 1957 and 2010. The first year of publication of the citing articles was 1945. We identified 125 books that received at least 25 citations. Books first published in 1980–1989 had the highest total and median number of citations per year. Nine of the 10 most cited texts focused on statistical methods. Hosmer & Lemeshow's Applied logistic regression received the highest number of citations and the highest average annual rate. It was followed by books by Fleiss, Armitage et al., Rothman et al., and Kalbfleisch and Prentice. Sackett et al.'s Evidence-based medicine ranked fifth in citations per year. The rise of multivariate methods, clinical epidemiology and nutritional epidemiology was reflected in the citation trends. Educational textbooks, practice-oriented books, books on substantive epidemiological knowledge, and books on theory and health policies were much less cited. None of the 25 top-cited books had the theoretical or sociopolitical scope of works by Cochrane, McKeown, Rose, or Morris.

Conclusions
Books were mainly cited to reference methods. Books first published in the 1980s continue to be the most influential. Older books on theory and policies were rooted in societal and general medical concerns, while the most modern books are almost purely on methods.


Introduction
If one considers the book as the macro unit of thought and the periodical article the micro unit of thought, then…
Eugene Garfield (1955) [1]

Academic and professional books are both actors in and witnesses of their disciplines. While some books push back the frontiers of ignorance, others harvest and synthesize knowledge that seems established [2][3][4][5]. In principle, analyses of citations to books may shed light on how concepts and methods changed over time as a discipline evolved as a field of practice and an academic subject; such analyses are also relevant for exploring the influence of a discipline on other fields [4][5][6][7][8].
Analyses of citations to scientific books are rare and, to our knowledge, there are no comprehensive analyses of citations to books on epidemiology and biostatistics, nor to sets of books in any other field of the health and life sciences [6][7][8][9][10]. The vast majority of citation analyses involve citations to articles [11]. Studies of citations made by books (to other books or to articles) are also uncommon. In this study we analyze citations to books made by scientific biomedical articles. Analyzing the bibliometric impact of books is also relevant at a time when the very nature of books and the whole publishing endeavour are undergoing enormous change.
Analyses of citations to epidemiologic books may complement other analyses of the nature, evolution and endeavours of epidemiology, a particularly integrative science [12,13]; for decades, epidemiology has helped to integrate knowledge, methods, reasoning and cultural referents from multiple health and social sciences, including medicine and public health. Hence, uses of epidemiologic and statistical reasoning, knowledge and methods (foremost, uses in biomedical research) are of broad scientific interest. Indeed, concepts and methods with strong epidemiological roots and properties now seem to be applied fruitfully both 'within' and 'outside' epidemiology [12][13][14][15].
As a consequence, there cannot be an exhaustive and fixed list of books on epidemiology and biostatistics. This does not preclude the analysis of an intellectually coherent set of books. Furthermore, since the present study is the first of its kind in epidemiology, a broad perspective is warranted. Therefore, rather than using narrow lists or definitions of what is a book on epidemiology or biostatistics, we applied a wide, inclusive approach: we aimed to include books that have a clear biostatistical or epidemiological component or dimension relevant to epidemiology. As we shall see, while all books included are unequivocally on epidemiology or biostatistics, many focus exclusively on biostatistical methods and techniques, and a few others are fundamentally on public health, preventive medicine, clinical epidemiology or other epidemiologic specialties. Appendix S1 also includes a few texts (mostly essays on health policy and philosophy of medicine) in which epidemiology has a secondary role but which are nevertheless key reference works.
The objective of this study was to analyze the number and time trends of citations received from scientific biomedical articles by selected books of epidemiology, biostatistics and related disciplines, including public health and preventive medicine.

Selection of books
Based on academic lists of textbooks [3,4,16], books selected by The James Lind Library [17] and the People's Epidemiology Library [18], publishers' catalogues, books cited in other books, and our own teaching and research references, we first searched for citations to over 200 books on epidemiological and statistical methods and concepts. The books initially included were published from 1957 until 2010, and this is the main period of publication covered by the present study; nevertheless, we occasionally extended the timeframe backwards to assess books published before 1957 that we deemed important reference texts.
The primary aim was to include books that had a clear biostatistical or epidemiological component or dimension, relevant to epidemiology and, foremost, to actual uses of epidemiologic methods in research and practice. We accepted books that focus exclusively on biostatistical methods and techniques, but did not consider books on bioinformatics or books that focus on mathematical models without emphasising their potential biostatistical application. We also included books on public health, preventive medicine, clinical epidemiology and other epidemiologic specialties. We thus selected books that are pertinent to applied biostatistical and epidemiological methods and techniques; epidemiologic concepts and theory; health services and policies; and substantive epidemiology (e.g., nutritional epidemiology). Books were selected regardless of whether they were aimed at researchers, practitioners, postgraduate students or undergraduates.
Books that received fewer than 25 citations were not included in the final study set; examples of exclusions are given in Table S1. Included books are listed in alphabetical order in the References section, except those already cited in earlier paragraphs [12,14]. If a book had many editions, only a selection is cited in the References, but we aimed to count citations to all editions (see Appendix S1).

Bibliometric analyses
The data source for the study was the Web of Science, produced historically by the Institute for Scientific Information, Inc. (ISI), and in recent years by Thomson Reuters [305]. To measure the overall bibliometric impact of each book, we aimed to retrieve all citations received by all editions of the book from the year of publication until 31 December 2011. Our primary measure was thus the total number of citations received by each book, unadjusted for the number of editions or time since first publication. A second indicator was the average number of citations received by the book per year since publication of the first edition. We also focused on books that received over 1000 citations since publication and on books that received an average of more than 40 citations per year since publication. Because the totals were not adjusted for time, books published long ago had more time to accumulate citations; conversely, books published in recent decades were a priori more likely to receive more than 40 citations per year, since the base of citing articles was larger. Using random samples of papers retrieved through author searches with random trigrams, we estimated that the number of papers indexed per year in the Web of Science (and, by extrapolation, the number of citations) increased approximately 3-fold over the last 3 decades. Finally, we computed an annual citation rate: the number of citations received in each year by the selected books, divided by the corresponding number of selected books. To allow for delays in the publication of citing articles, our last search was conducted on February 9, 2012 (always aiming to retrieve citations made by articles published until December 31, 2011). The present study was developed from a small pilot study conducted in 2006 [16].
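The three indicators described above (total citations, average citations per year since the first edition, and the annual citation rate) can be sketched in Python. This is a minimal illustration under stated assumptions, not the procedure actually run on the Web of Science: the two books and their yearly counts are hypothetical; years since publication are taken as 2011 minus the year of the first edition (consistent with the 98- and 7-year extremes reported for books from 1913 and 2004 in the General trends section); and the "corresponding number of selected books" is read as the number of books whose first edition had appeared by that year.

```python
from collections import defaultdict

END_YEAR = 2011  # last publication year of citing articles counted in the study

# Hypothetical toy data (not from the study): year of the first edition and
# citations received per citing-publication year, summed over all editions.
books = {
    "Book A": {"first_year": 1989, "cites": {1990: 10, 1995: 40, 2011: 50}},
    "Book B": {"first_year": 2004, "cites": {2005: 7, 2011: 21}},
}


def total_citations(book):
    """Primary measure: all citations up to END_YEAR, unadjusted."""
    return sum(c for year, c in book["cites"].items() if year <= END_YEAR)


def citations_per_year(book):
    """Second indicator: total citations / years since the first edition."""
    return total_citations(book) / (END_YEAR - book["first_year"])


def annual_citation_rate(books):
    """Citations received in each year / books already published that year."""
    per_year = defaultdict(int)
    for book in books.values():
        for year, c in book["cites"].items():
            if year <= END_YEAR:
                per_year[year] += c
    rate = {}
    for year in sorted(per_year):
        # Assumption: the denominator counts books whose first edition had
        # appeared by that year (always >= 1 for a year with citations).
        n_books = sum(1 for b in books.values() if b["first_year"] <= year)
        rate[year] = per_year[year] / n_books
    return rate


for name, book in books.items():
    total = total_citations(book)
    avg = citations_per_year(book)
    # Flags mirroring the study's thresholds: >1000 citations, >40 per year.
    print(f"{name}: total={total}, per year={avg:.2f}, "
          f"over 1000={total > 1000}, over 40/yr={avg > 40}")
print(annual_citation_rate(books))
```

With these toy numbers, the annual rate for 2011 is (50 + 21) / 2 = 35.5, because both books had been published by then; for 1995 only Book A is in the denominator.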
Through a 'cited reference search', we used combinations of 'cited author' and 'cited work', the latter referring to the title of a book or to the name of a journal, not to the title of an article. All citations were checked for accuracy as explained in Appendix S1. For each book, several possible citation variants were searched to allow for different abbreviations and for citation errors. Bibliographic information on every book was verified against reliable information on all its editions.
For a given book we included all directly related editions and printings. Citations to each book were searched independently by name of author or editor and by title; all authors or editors (not just the first one) were used in separate searches. The data source does not allow chapters written by authors other than the book editors to be identified reliably [306], unless the title of the book is unique or unambiguous (e.g., Modern Epidemiology [250][251][252], Oxford Textbook of Public Health [133][134][135][136][137]). Thus, if the title was unambiguous, the number of citations includes citations to individual chapters not authored by the book editors; if the title of the book was common, we could not include citations registered under the name of the first author of each chapter (when other than the editors), and some citations to chapters not written by the editors had to be excluded (Appendix S1 and Figure S1). Citing articles were thus identified, and the year of publication of each article was analyzed (through the Web of Science option 'Analyze Results') to assess time trends in the number of citations to each book.
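Merging several cited-reference searches for one book into a single count of citing articles per year can be illustrated with a small Python sketch. It works on hypothetical, locally stored search results (the article identifiers and years are invented) and does not call any Web of Science interface; it simply assumes that each citing article has a unique identifier, so that an article retrieved by two search variants (e.g., by first and by second author) is counted once, and that articles published after 2011 are dropped.

```python
from collections import Counter

LAST_YEAR = 2011  # citing articles published after this year are ignored

# Hypothetical results of three search variants for one book (e.g. first
# author + full title, second author + abbreviated title, ...). Each citing
# article is a (unique identifier, publication year) pair; IDs are invented.
search_results = [
    {("ART-001", 1992), ("ART-002", 1995)},
    {("ART-002", 1995), ("ART-003", 2001)},  # ART-002 found twice
    {("ART-004", 2011), ("ART-005", 2012)},  # ART-005 is past the cut-off
]


def citing_articles_by_year(result_sets, last_year=LAST_YEAR):
    """Merge search variants, count each citing article once, tally by year."""
    unique = {}
    for results in result_sets:
        for article_id, year in results:
            unique[article_id] = year
    counts = Counter(year for year in unique.values() if year <= last_year)
    return dict(sorted(counts.items()))


print(citing_articles_by_year(search_results))
# → {1992: 1, 1995: 1, 2001: 1, 2011: 1}
```

Deduplicating on the article identifier, rather than summing the hits of each query, is what prevents a book cited under several author or title variants from being over-counted.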

General trends
The 125 books that received 25 citations or more were published between 1913 and 2004; therefore, the maximum number of years since publication was 98 and the minimum 7 (median, 24 years); 100 books (80%) had been published for over 15 years (Tables 1 and 2).
All 10 top-cited books had an edition (not necessarily the first) during 1980-1989. This decade also stands out when we analyze the proportion of study books that received over 1000 citations: this proportion peaked at nearly 15% in 1980-1989, as did the proportion of texts that received more than 40 citations per year since publication (Table 1).
The total number of citations received by the 125 books was 183 401, of which 52% were to books first published in 1980-1989. Books first published in this decade also had the highest mean and median number of citations per year since publication (Table 1). The crude number of citations to the selected books increased substantially from 1990 to the mid-1990s and remained stable thereafter: there were 160 …

Among the 50 top-cited books, 15 are entirely or largely on applied biostatistical methods, and at least three more focus heavily on quantitative methods (18 out of 50 = 36%). At least 33 of these 50 books (two-thirds) have strong methodological content.
Nine of the top 10 books are general works on statistical or epidemiological methods: 7 of the 10, including the top three, are strongly focused on statistics, while two also address epidemiological methods (4th and 8th places, also with strong statistical content). Texts on health measurement ranked 10th and 15th (Table 2).

Influential individual titles
Hosmer & Lemeshow's Applied logistic regression [139] is the title that received the highest number of citations and the highest average annual rate since publication. It is followed by the books by Fleiss, Armitage et al., Rothman et al., and Kalbfleisch and Prentice.

Texts on clinical epidemiology and evidence-based medicine ranked 11th, 12th, 20th, 24th, 28th and 35th by total citations (Table 2). Earlier texts on clinical epidemiology were seldom cited. The bloom of clinical epidemiology (and, later, of evidence-based medicine) in the 1980s is apparent in Figure 4; the Figure suggests …

Willett's book on nutritional epidemiology, also strongly focused on methods, ranked 13th; it is the first book on that specialty.

Figure 6 reflects the greater weight, in the citing literature, of research-methods-oriented books over textbooks primarily used in the classroom.

… Table 2 permit an assessment of the evolution of citations received by different kinds of books on a similar subject. For instance, Figure 7 shows this evolution for the book by Breslow [70], which stemmed from a series of articles in a medical journal; the books by Hill [128], Bailar & Mosteller [40] and, as already mentioned, Gore & Altman [111] also originated in series of articles published in medical journals.

Other analyses and comparisons
Assessing the scholarly performance or influence of individual authors was not among the study aims, so a brief illustration of how citations to books may complement other analyses of such performance will suffice. For instance, Figure S5 may add to existing views on the influence of the two books by H. Blalock selected for the study: citations peaked in the 1970s and were still substantial in subsequent decades. Another example: a priori both A. … (Figures S6, S7, S8, S9, S10).

Discussion
The most prominent pattern in our findings was that the books were mainly cited to reference methods and statistical techniques. The number of citations was larger for books that provide unique sources for (general or specific) methods used in biomedical research papers than for books that innovated in theory, discuss concepts or are mainly used in teaching. Nine of the 10 most cited texts focused on widely used statistical methods. Some books that have likely been influential (scientifically, theoretically, educationally) in biomedical research or health policy were not highly cited; notable examples include texts by Hill [128], Susser [283], Rose [233], Morris [199], Cochrane [69], and McKeown [193]. Their lower citation figures may partly be due to earlier publication, when all works received fewer citations because fewer articles were published than later on. For instance, Hill's books ranked 17th [128][129][130] and 88th [131] (Table 2); the first of them [128][129][130] was first published in 1937, decades before any other text in the top 40 (Figure S6). Older books on theory and policies were rooted in societal and general medical concerns, while the most modern books are almost purely on methods, particularly statistical methods.
By study design, 1945 was the first year of publication of the citing articles; yet plenty of books on statistics and epidemiology existed before 1945: several books by Major Greenwood (1880–1949) (e.g., his major opus, Epidemics and Crowd Diseases) [118]; Woods and Russell's book on teaching statistics, which preceded A.B. Hill's; the Epidemiologic Essays by Crookshank, from 1930; Epidemiology Old and New, by Hamer, from 1928; or The Natural History of Disease, by Ryle, from 1948, which inspired countless public health professionals [14,15]. Thus, although we had to keep to a relatively limited time frame, we could also make some comparisons with a few selected classics, including the 3 books published between 1913 and 1949 (Table 1) [118,128,236].
The lower citation figures of books on concepts and policies compared with books on methods do not necessarily mean that the citing literature pays less attention to theoretical than to methodological issues: they may reflect that authors of articles tend to take basic ideas and concepts for granted, as part of the core or background knowledge of most readers, and therefore not in need of a reference [2,4,8,13]. Citation frequency is only one, modest proxy for influence [1]. A main reason why books on biostatistics top the list may be that they provide convenient citations for statistical techniques widely used in biomedical research, and most biomedical journals request such citations. Another reason is that findings must be replicable and, therefore, authors cite the methods they used in detail.
It was beyond the scope of the study to record: a) whether citations were to a single technique explained in a specific part of the cited book or were generic; b) who was citing (e.g., which epidemiologic or clinical specialties, substantive areas or academic groups); or c) the appropriateness of the citation. As with citations to articles [1,13,312–315], citations to books may be shaped by convenience and habit rather than by a genuine need for scientific reference. Many citations to articles are non-specific, unjustified, or wrong [316]. We are not aware of similar studies on book citations, but we suspect that these problems may also be common with citations to books.
The total citing base (over 20 million articles with over half a billion citations) [317][318][319] certainly includes many more empirical studies (which require methodological and technical citations) than articles on theoretical and conceptual issues. However, a key question is whether current referencing practices do justice to the needs of science. For instance, logistic regression has many variants and implementations; it is important that a research paper explains exactly what was done in model selection, building, analysis and validation in the specific application. If instead the paper provides no information on these issues, but just a citation to a book, then science is not well served. Future studies could assess how often the citation also referred to specific pages or chapters of the book, and whether the citation was specific or just to the whole book.
Future research could use a random sample of the citing articles and their citations to the relevant books to examine the three factors mentioned above, e.g., the appropriateness and specificity of the citations, and whether they act as an alibi for not mentioning important methodological details. Such a study would require retrieving the articles and extracting data from them; it would therefore have to be limited to a few hundred references, so that the sentence and context in which each reference appears can be examined. A future study could also identify influential books that we may have overlooked, use other reliable data sources to measure citations received by books, or obtain data on annual book sales, which could be compared with citation data.
Books primarily meant for students, some of which have probably sold widely, been effective in the classroom and been most influential in coalescing epidemiologic thinking, received relatively few citations. Important examples include books by Barker et al. [43], MacMahon et al. [180], the Lilienfelds [176], Gordis [108], and Szklo & Nieto [286]. Being used in research is not the primary purpose of such textbooks [2,3]. Knowledge sifted into a textbook has become normal, background, foundational knowledge [4]; there is no need to reference the material, even though it may be of great importance. A rather different case is a book addressed not just to students but also to practicing physicians: Sackett et al.'s Evidence-based medicine [258,259], which gathered more than 4000 citations. An analysis of the development of professional and scientific movements such as evidence-based medicine could use and expand some of the study findings [12,14,320,321]. It would in turn provide new insights into the actual influence of some books, e.g., Cochrane's [69] and Sackett's [258,259] roles in the creation of the Cochrane Collaboration and the UK National Institute for Health and Clinical Excellence (NICE) [8].
Analyzing citations received by books of epidemiology published over the last 50 years is also a way of tracing the history of the discipline and, to a lesser extent, of biomedical research. Citations help to analyze the evolution of the corpus of methods and concepts used by epidemiology and by the disciplines with which epidemiology interacts [2,8,12–15,322,323]. The rise of multivariate methods is a relevant example; it clearly explains a large portion of the citations made by papers published in the 1980s. Such methods were then not yet 'normal science' [4] in epidemiological and clinical research, and hence needed to be referenced.
The decade 1980-1989 stood out as the period in which the most highly cited books were published. Although we do not know the total number of books on epidemiology and biostatistics published in each period, any books we missed are unlikely to be influential enough to change this finding. Overall, other than a few dozen highly cited books, most books had limited or minimal citation impact. Books published in recent years do not seem to have the influence of textbooks first published in the 1980s. Moreover, total annual citations to the books have been rather steady over the last 15 years, or even declining for many specific texts. Given the large increase in the annual number of citations registered in the Web of Science, the relative citation influence of the study books is probably decreasing sharply when seen as a proportion of total citations. The total number of citations to the study books (some 183 000 over half a century) probably represents a small fraction of the total citations to the epidemiological and statistical literature, which we estimate to be several million among the over half a billion citations indexed by the Web of Science. Even the few highly cited books seem to have lost relative impact in the last 2 decades, because between 1980 and 2011 the number of citing articles (and, correspondingly, citations) increased severalfold. A next step would be to adjust the citation counts for the total number of citations indexed each year; a comparison with books from other disciplines would then also be meaningful. In spite of its limitations (e.g., the choice of citing journals and the focus on research-oriented, largely Anglo-Saxon journals), the study data source has long enabled relevant analyses [1,6,312–315,324].
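The adjustment for the total number of citations indexed each year could take the simple form of dividing a book's citations in a given year by all citations indexed that year. The Python sketch below is a hedged illustration of that idea, with hypothetical figures for both quantities; the only number taken from the study is the approximately 3-fold growth of indexed papers over 3 decades, which corresponds to a constant annual growth of about 3.7% and means that a book cited at a constant rate keeps only one third of its relative share.

```python
def implied_annual_growth(fold_increase, years):
    """Constant yearly growth rate that yields `fold_increase` over `years`."""
    return fold_increase ** (1 / years) - 1


def relative_share(book_citations, total_indexed_citations):
    """Book citations in each year as a fraction of all citations indexed that year."""
    return {year: book_citations[year] / total_indexed_citations[year]
            for year in book_citations}


# Approximately 3-fold increase over 3 decades (the study's own estimate).
growth = implied_annual_growth(3, 30)
print(f"implied annual growth: {growth:.1%}")  # → implied annual growth: 3.7%

# Hypothetical counts: a book cited 8000 times in both years while the
# indexed citation base triples.
book = {1981: 8_000, 2011: 8_000}
base = {1981: 1_000_000, 2011: 3_000_000}
shares = relative_share(book, base)
print(shares[2011] / shares[1981])  # ratio of relative shares, about 1/3
```

A ratio-to-total adjustment like this is only one option; any real adjustment would need the actual yearly totals of indexed citations, which are not reported here.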
We are not aware of any typology of books that could be useful for the present study. Although it would have been possible to use narrower definitions of what is a book on epidemiology and biostatistics, we chose a wide and inclusive approach, which yields a broad perspective and enables a richer set of comparisons; it also allows readers to conduct other, more focused comparisons. The inclusive approach seems particularly appropriate at a time when no other analyses are yet available.
In spite of the study limitations, we think that simply comparing the number and trends of citations to the books we identified is of interest. The uses of books and their true impact on research and professional practices are questions that often seem to escape analysis. Yet books comprise an important part of scientific production, and evaluations of such production would benefit from considering books alongside scientific articles. Several projects are under development to register citations to books [11,325,326]; however, none currently has the time span and qualitative features of the present study. Analyses of citations to books are just one approach to analyzing the uses of books. An additional task for future research would be to analyze the relative importance of books and articles in the exercise of scientific leadership and social influence by selected authors. For example, Figure S10 shows that the increase in the number of citations to the book on logistic regression by Hosmer & Lemeshow did not come at the expense of the highly cited article by Mantel & Haenszel [327].
Several influential authors wrote more than one or two books. In the case of M. Susser, for instance, his Sociology in medicine [328] has accumulated over 300 citations. To value Susser's oeuvre it is thus necessary to assess several other books of his [283,308,329–331]; see also [3,4,5,14,15,323,332,333]. Some books on research methods were less cited than articles on the same methods; for instance, papers by Hill [334] and Miettinen [335] have accumulated over 2400 and 1500 citations, respectively [6]. However, seminal papers published long ago received comparatively few citations; for instance, a paper by Jerome Cornfield [336], published over 60 years ago, received fewer than 450 citations. The review on the causes of cancer by Doll & Peto [307] provides a telling contrast between citations received by the book [73] (about 900 citations) and the identical journal article [337] (some 2300 citations); the case illustrates the need for caution when dealing with different formats and vehicles of scientific communication, an issue that is becoming ever more complex with digital formats and the Internet [11,338]. Several other articles by Doll & Peto have each received over 1500 citations. The most cited paper by Willett [339] has received some 2000 citations, about half as many as his book [302]. But some sets of papers by him and other authors are, together, more cited than the reference book by the same author(s). The most cited papers by Cochrane [340,341] received some 500 citations, fewer than half as many as his book [69]. The top-cited paper by Rose [342] received some 1000 citations, similar to his book [233]; and again, several other papers of his have been highly cited. All articles coauthored by K. J. Rothman received over 12400 citations, a number similar to that of Modern epidemiology [250][251][252]. Just two of several books by David G. Kleinbaum combined (Applied regression analysis [163][164][165][166] and Epidemiologic research [167]) received over 17000 citations (Table 2), whereas his articles have some 3000 citations.
To conclude, the results may reflect a considerable positive influence of epidemiologic and statistical science on biomedical research [139,140,199–201]. On the other hand, the smaller number of citations received by theoretical and policy-oriented books does not allow us to discern whether the influence was weaker on the ideas and policies that so much affect the health and wellbeing of citizens.

Supporting Information
Appendix S1 Details on Materials and Methods.