This paper examines the proximity of authors to those they cite using degrees of separation in a co-author network, essentially using collaboration networks to expand on the notion of self-citations. While the proportion of direct self-citations (including co-authors of both citing and cited papers) is relatively constant in time and across specialties in the natural sciences (10% of references) and the social sciences (20%), the same cannot be said for citations to authors who are members of the co-author network. Differences between fields and trends over time lie not only in the degree of co-authorship which defines the large-scale topology of the collaboration network, but also in the referencing practices within a given discipline, computed by defining a propensity to cite at a given distance within the collaboration network. Overall, there is little tendency to cite those nearby in the collaboration network, excluding direct self-citations. These results are interpreted in terms of small-scale structure, field-specific citation practices, and the value of local co-author networks for the production of knowledge and for the accumulation of symbolic capital. Given the various levels of integration between co-authors, our findings shed light on the question of the availability of ‘arm's length’ expert reviewers of grant applications and manuscripts.
Citation: Wallace ML, Larivière V, Gingras Y (2012) A Small World of Citations? The Influence of Collaboration Networks on Citation Practices. PLoS ONE 7(3): e33339. https://doi.org/10.1371/journal.pone.0033339
Editor: Petter Holme, Umeå University, Sweden
Received: July 27, 2011; Accepted: February 14, 2012; Published: March 7, 2012
Copyright: © 2012 Wallace et al. This is an open-access article distributed under the terms of the Creative Commons Attribution License, which permits unrestricted use, distribution, and reproduction in any medium, provided the original author and source are credited.
Funding: Funding for this study was provided by the Social Sciences and Humanities Research Council of Canada. The funders had no role in study design, data collection and analysis, decision to publish, or preparation of the manuscript.
Competing interests: The authors have declared that no competing interests exist.
Scientific collaborations and citation practices have been an important focus of interest among sociologists of science, seeking to provide insight into science as an inherently social and team-based endeavour. While co-authorship networks are relevant for understanding the network structure of scientific fields , , citation practices are central to the distribution of symbolic capital and its accumulation by scientists  and provide insight into the hierarchies within a field and among fields . Though everything suggests that some relationship must exist between co-authorship and citation practices, these two elements are generally treated separately and few papers have addressed that question. For instance, White et al.  combined, for a small group of researchers, bibliometrics with survey data to see whether citations were influenced by the social structure of the group. Introducing the notion of ‘inter-citation’ as a measure of citations between members of a given group, they aimed to correlate citations with social, socio-cognitive and intellectual ties. Their conclusions, based on only 16 individuals, are nuanced: there is some correlation, as one might expect, between collaboration and citation patterns but, overall, there is no strong or reliable link between social ties and citations (see also ,  for related studies).
Most recently, a relatively large dataset has been used to systematically explore a similar issue, that of the proximity of authors (in terms of collaboration) within and between various research topics in the field of information retrieval . They apply a similar treatment to citation networks, which reveals much about the intellectual cohesion of certain sub-fields, but does not reveal the degree to which social networks affect the referencing system. In addition, other recent studies such as that of Roth and Cointet  have successfully combined social and semantic networks as a means to understand the production of knowledge within ‘epistemic communities’. Moody  has examined the overall cohesion of an entire discipline through co-author networks. While this approach is important for considering macroscopic social characteristics of a discipline (e.g., how paradigms or methodologies co-exist), our work, by design, focuses only on the local structures within a co-author network. While there are obvious links to examining large-scale cohesion, the citation practices we are exploring (see Methods section below) are related to small-scale social structures (at the level of research groups).
This overall framework underlies the primary objective of the present work: to characterize several different scientific specialties and distinguish them in terms how and to what degree referencing practices—including self-citations—are linked to co-author networks. More specifically, this paper combines and expands on previous methods for analyzing co-author networks and for measuring self-citations, using a very large dataset (over 2,6M papers and 50M references) over more than 50 years. It poses the all-important question of whether the social network of researchers has an impact on the selection of references found in a given article. In contrast to White et al. , we restrict ourselves to co-authorship as an indicator of their social proximity. Collaboration networks can be considered as a subset of the complete social network of a scientist. Though one usually knows more scientists than the ones with whom one writes scientific papers, it seems natural to consider co-authors as part of that social network even in the case where no face-to-face interactions have occurred. Moreover, the ties with co-authors are probably stronger than with non co-authors and thus their effect on citation should be larger than with non co-authors even if the latter are part of the larger social network of the citing scientist.
In this paper we thus analyze the references of each article in terms of four levels of proximity, defined as co-authors or co-authors of co-authors in analogy with the concept of Erdös numbers (see Methods section below). In order to distinguish between a variety of citation practices within the natural and medical sciences (NMS) and social sciences and humanities (SSH), eight specialities were chosen for detailed analysis. After a detailed description of the methods and database used, we present the main results with a focus characterizing several of the scientific specialties explored here. Our discussion expands on this, providing insight on citation practices in terms of the social structure of scientific fields. Finally, the conclusion highlights the major findings of this study and some of its implications for science policy.
The data for this analysis comes from Thomson Scientific's Web of Science, which include the Science Citation Index Expanded (SCIE), Social Science Citation Index (SSCI), and Arts and Humanities Citation Index (AHCI) for the 1945–2008 period. Data is presented for 8 specialities (5 from the Natural and medical sciences, 3 from the Social sciences and humanities) based on the U.S. National Science Foundation (NSF) field classification : astronomy and astrophysics, atmospheric science and meteorology, biochemistry and molecular biology, economics, history, neurology and neurosurgery, organic chemistry and sociology. Only research articles, notes and reviews are included in the set.
There are two main methodological challenges to measuring how socially close citing authors are to those they cite. First, we need to conduct a large-scale analysis to measure the social proximity of cited authors to citing authors for many different scientific disciplines in order to capture the diversity of practices. Second, the analysis needs to be focused on the individual authors, in order to gain insight into their referencing practices and individual social networks.
In order to investigate the citation practices of a given scientific specialty in relation to its co-authorship network, we form a set of references R contained in the set of papers S published in a given year within a given specialty. Given that Thomson Reuters' Web of Science only indexes the names of co-authors of cited papers that are also source items, we restrict this set of references to those who can be identified within the database as source items , and which were published within the previous 10 years. Naturally, fewer source items within the SSH group will be located among the references (which include a greater number of monographs or book chapters). However, these are generally also older references . Therefore, given this bias in the SSH group when examining the proportion of references made to members of a (small) co-author network, we would, if anything be overestimating the degree to which authors (proportionally) cite other authors from their local network. In essence this process of selecting only recent and ‘citable’ items ensures that we are focussing on references to peers.
We generate a list of authors having contributed to each article , yielding a total set of authors A for the specialty as a whole. Similarly, we generate a second set of authors (and for the entire specialty) who collaborated within 2 years of a given publication year with authors in (restricted to the specialty in question in order to limit false positives due to the presence of homonyms). It should be noted that because of this ±2 year interval, the data presented is for the 1947–2006 period, though data is collected for the 1945–2008 period. Like in other bibliometrics analyses performed at the level of individual authors, name disambiguation is an issue, since references to authors with names like “Smith, J” could be erroneously matched to a different “Smith, J” who is a co-author. However, this does not play a major role in our large-scale study. Given that no treatment was performed in order to distinguish authors having the same surname and initial(s) (homonyms), our data can only overestimate self-citations for specialties with high levels of co-authorship. Nevertheless, based on small random samples, we estimate the number of homonyms at less than 5% of total positives, not enough to affect our results.
Thus, constitutes the unweighted and undirected co-author network. Finally we generate a third group of authors who collaborated with during the same time period. It should be noted that excludes all authors contained in , so in general, for networks which are relatively sparse, or which contain few co-authors, (while the opposite is true for cases when collaboration rates are high).
For each source article, we examine its set of references and classify them in the following way:
- If any of the authors of the referenced paper is contained in , then this is a self-citation;
- If any of the authors of the referenced paper is contained in , then this is a level-1 co-author citation;
- If any of the authors of the referenced paper is contained in , then this is a level-2 co-author citation;
- If none of the authors in the referenced paper are contained in , , or then this is called distant citation.
These categories are defined as mutually exclusive: if a referenced paper can be placed in more than one category, then it is assigned the one closest to a self-citation. References falling into categories A are obvious self-citations while those falling under B and C will hereafter be referred to as co-authorship citations. Level 1 and 2 co-author citations can also be seen as a measure of social proximity of the co-authors, with level 1 being closer to the author than level 2.
Many will recognize these levels as the beginning of the Erdös number or degrees of separation game  applied to each author individually, and his referenced authors as the ‘object’ of the game. From a sociology of science perspective, it is not necessary to continue past the second ‘level’ (Erdös number of 2), since we can consider that there are much fewer social connections past this level between the authors within a given specialty. In addition, given the number of authors and references being considered, the data mining procedure is both expensive in CPU time and memory usage. Figure 1 provides a visual representation of this algorithm.
Left: a set of three articles and 5 references therein. Right: The corresponding co-author network. Article A, for example is written by two authors (α and β) and contains three references (whose authors are also denoted by Greek letters). Based on our classification scheme of co-authorship citations, references A1 and B1 are self-references, A2 is a level-2 co-author citation (since α collaborated with δ who collaborated with ρ), A3 is a a level-1 co-author citation and B2 is a distant citation.
Finally, it should be noted that while the citing source items and authors are restricted to a given specialty, the items they cite are not. One would expect that the specialty in question covers the majority of peers cited, but such a limit, while defining a ‘closed’ system, would possibly introduce an artefact, particularly for more interdisciplinary specialties such as biochemistry (see Figure 2C). However, in such cases, we have checked that the results are similar, whether or not we restrict the specialty of the reference items. In addition, the bulk of the unweighted co-author network is similar even when expanding the list of co-authors to outside a given discipline, since most links made to authors in another specialty will also be made to authors within the same specialty. In addition, these links to outside one's own specialty will generally be more tenuous than those made to colleagues in the same specialty. Indeed, the notion of ‘invisible colleges’ would tend to support this. Nevertheless, it should be indicated that this restriction introduces a small caveat to the present study since it examines an artificially restricted network.
For each of the chosen specialties, A) number of papers, B) average number of references per paper, C) average number of authors per paper, D) average number of papers written by each author, E) percentage of identified references within the same specialty, and F) percentage of identified references defined as ‘recent’ (less than 10 years older than the source item).
1. Empirical evidence for co-authorship citation and macroscopic properties of scientific specialties
Based on the dataset described in the previous section, we first compute a few basic macroscopic variables which allow us to characterize the growth and structure of the chosen fields. The number of papers, the rate of co-authorship and the authors' productivity (Figures 2A, 2C and 2D) provide insight into the social structure and size of the discipline, while the number of references, proportion of intra-disciplinary references (e.g., economics to economics) and age of references (Figure 2B, 2E and 2F) provide information on the different citation practices across the eight chosen subfields. More generally, these data provide a benchmark for characterizing the production of scientific knowledge in various fields, and thus understanding our data on the proximity of citing and cited authors.
Figure 3 shows the distribution of citations across several specialties in the NMS (panels A–E) and the SSH (panels F–H). As one might expect, the proximity of references in each of the disciplines varies a great deal. Within the natural sciences, one immediately notices a major difference in the co-authorship proximity of references between, on the one hand, astrophysics/astronomy and atmospheric science and meteorology, and the rest of the specialties on the other hand. Aside from organic chemistry, all specialties show a clear decrease in the percentage of references made to distant authors with whom, according to our definition, they have no close connection. Furthermore, while it is clear that the size of the specialties (Figure 2A), the number of co-authors per paper (Figure 2B), and the proportion of ‘intra-specialty’ references (Figure 2E) have a clear impact on the proximity of references (as one might expect), none of these macroscopic quantities can singlehandedly explain the trends observed in Figure 3. In addition, there is no strong correlation between the tendency to cite recent literature (Figure 2F) and the proportion of that literature that is socially proximate.
A) astrophysics and astronomy, B) atmospheric science and meteorology, C) organic chemistry, D) biochemistry and molecular biology, E) neurology and neurosurgery, F) history, G) sociology and H) economics. The last three (F, G, H) are shown on a logarithmic scale for clarity (which explains why ‘distant’ references seem to be close to 100%). For the NMS, we compute the same distribution based on a subset of source articles (and their references) that contain only 5 authors or less (dashed gray lines).
Direct self-citation, however, is relatively constant both across fields and over time, hovering around 20% in NMS specialties and 10% in the SSH. Note that studies of various disciplines have found rates of self-citations among references varying between 10% and 36%, with strong variations between specialties –, and much lower percentages in SSH such as sociology and economics . Our data mostly agree with these numbers, although none of the specialties analyzed here obtain a number as high as 36%. It has also been found  that 1) self-citations are generally younger and have a shorter half-life than foreign citations, 2) self-citations stabilize in a period of 3–4 years after publication and 3) the percentage of self-citations only slightly increases with the number of co-authors.
The difference between the NMS and SSH is substantial, and dwarf the differences among SSH specialties shown in Figure 3. We find that there is no such thing as co-authorship citations within the three SSH fields studied. This is primarily due to the fact that co-authorship is less frequent in these disciplines and that, as a consequence, researchers have less co-authors in their social network to choose from, a clear limitation in the way we define our social network. For this reason, most of the following analysis focuses on the NMS.
For NMS disciplines, we also show the corresponding distribution of references when we limit the set S for each year to papers with 5 co-authors or less (gray dashed lines in Figure 3). While arbitrary, this immediately gives us a sense of the extent to which disciplines such as astrophysics and astronomy cite a larger proportion of papers authored by their recent co-authors due to the large number of papers with a large number of co-authors. Furthermore, it is more likely that authors of papers with 5 authors or less actually know each other. For clarity, we omit from Figure 3 the number of self-references, references to level-1 co-authors and to level-2 co-authors when this restriction is imposed. Interestingly, the increase in distant citations observed is at the expense of level-1 and level-2 co-authors citations, but does not affect self-citations.
This remarkable stability in the level of self-citations—across specialties and time—distinguishes this practice from that of citing those who have been recent collaborators (not just on the particular paper in question). This suggests that there might be cross-disciplinary norms regarding this practice in science. It must be noted that this does not imply a degree of conformity within the specialty—comparison of the distributions of self-citations would be more revealing in this respect. However, the stability of the average is important in understanding that this practice does not depend much on the number of co-authors or the citation practices of the discipline, but is a widespread and stable practice in all disciplines. For this reason and due to the increasing importance of research groups as a dominant unit for understanding scientific work, it is important to analyse co-author citations, which reflect the social proximity of citing and cited authors. By contrast, focusing on distant citations sheds light on the communication structure of scientific specialties by pointing at possibly independent sub-groups who are not in direct contact with each other through co-authorship links but are nonetheless cited. This approach can complement a more micro-level analysis of the reasons scientists invoke for citing papers .
It should be noted that the effect of having a large number of collaborators per paper amplifies the proportion of level-1 and level 2 proximate citations. Our findings clearly show that recent increases in the proximity of citing and cited authors are, in part, due to an increase in the size of collaborations. This is the case in astrophysics and astronomy, for instance. Co-authorship practices in fields such as astrophysics or particle physics often reflect the use of certain instruments or a willingness to acknowledge the contributions of a wider range of individuals in the division of labour, beyond the writing of the article itself . In this sense, there is inevitably a sociological basis to this combinatorial effect.
Our results also clearly show that the combinatorial effect cannot alone account for the proximity of citing and cited authors. Indeed, from a social network perspective, the co-author network is defined by more than the distribution of edges per node. In other words, it is not just about how large collaborations are, but also of what type of collaborations occur and where. We have also found that the distribution of clustering coefficients  is very similar in the co-author networks of five NSM scientific specialties in recent years. This index essentially measures the concentration of triangles within the network or to what extent collaborators of a given author also collaborate with each other. Therefore, other measures should be able to account for the local structure of the networks. Along the lines of Moody's work  we view self-citation and ‘group’ citation as a means to reinforce local social networks, which has particular importance for the intellectual and social development of scientific specialties.
2. Topology of the networks and citation practices
Reducing the number of co-authors to 5 or less is not sufficient to understand to what degree the number of other authors in proximity to a given author influences their choices of citations. This begs the question of how the number of level-1 and level-2 co-authors is distributed within each of the specialties. Figure 4 shows these distributions for two periods: 1960–1969 and 2000–2006. Two main observations can be made. First, variations in distributions of co-authors do not correlate highly with differences in the number of co-author references (Figures 3A–E), so other factors must be also at play. Second, the relatively even distribution of level-2 co-authors means that, within a given network, there will be wide variations from paper to paper in how many of these more distant co-authors are ‘available’ to be cited by a given author.
Distribution of the number of A) level-1 co-authors and C) level-2 co-authors during the 1960–1969 period; B) level-1 co-authors and D) level-2 co-authors during the 2000–2006 period.
The broadening of a distribution of co-authors cannot entirely account for increases in the proximity of citations. This is verified by randomly removing source papers (up to around 15% of the network in order to maintain its general shape) until the distributions of authors per paper are almost (though not quite) identical in all 8 specialties and using only the first author of references. This reduces the effect of skewed distributions while ensuring that the ‘reduced’ network retains sociological meaning. Similarly, we can randomly remove papers in a given specialty such that each author in a given interval of time has only 1 paper. These procedures have the effect of diluting the network (i.e., reducing the amount of clusters) , . Once again, we see no major effect on the proximity of citations. In addition, the differences observed in proximity of citations are not (or only very weakly) reflected in measures such as the distribution of k-cores or the number of cliques. Once again, it is important to emphasize that our study examines the topology in citation proximity to each individual author, not the overall structure of the discipline.
One of the main advantages of our method for examining referencing patterns is the ability to conduct the analysis at the level of each author or paper. It is thus useful to think of each author making referencing choices based in part on other authors that are in proximity to him or her. More specifically, given the number of references and authors associated with a given paper, we can consider how many of the various types of co-authorship references are selected, compared to the number expected randomly. This is essentially about the propensity to mobilize one's social network as part of the referencing process (and the production of knowledge). We can define propensity (Pd) for a given level of proximity d as the ratio of the number of authors matched (or observed) empirically to the expected number of authors to be matched given a random selection of references. One will recognize this random case as a simple combinatorics problem with the expectation value equal to the number of repetitions times the probability of a success (since this follows a binomial distribution). This probability of ‘choosing’ a unique name is approximately equal to the number of referenced authors divided by the number of total authors available within the given specialty. This process is repeated a number of times equal to the number of co-authors. Thus, for a single article in a given year, the propensity for citing a level-1 (co-author) or level-2 (co-author of co-author) reference can be computed as a ratio of observed to expected cases:(1)where , are the number of cases empirically observed at each level, and the remainder of the terms pertaining to the expected cases: the number of authors of the ith referenced paper, the total number of remaining (i.e., that have not yet been assigned at a ‘closer’ level) references of the given paper. Thus, the numerator is determined by empirical ‘matches’ and the size of the entire network, while the denominator reflects the size of the author's network and that of the cited authors' networks. Like our data presented in Figure 3, the propensity is computed in sequence, in order of proximity, with the ‘matched’ references removed at each step. In other words, the level-2 propensity, for instance, is not ‘skewed’ by the number of level-0 or level-1 references already found for the given paper. Similarly, if there are no available authors in the level-1 or level-2 set, then the corresponding propensity is not calculated.
Tables 1 and 2 show the propensity (computed individually for each source paper in the 10-year period, then averaged) for citations to level-1 and level-2 co-authors for three time periods, as described above in the Methods section. The results do not go beyond 1995, as the calculation of the propensity becomes prohibitively long as the number of authors and journals grows exponentially (Figure 2). In general, there is very little propensity to level-2 citations. Data for self-citations (an order of magnitude higher than for level-1 citations to co-authors, as one might expect) are not shown here, since, as demonstrated in Figure 3, the number of direct self-citations does not appear to vary across time, so the propensity is not a useful quantity. We also note a general rise in propensity since the 1950s, with a slower growth rate since the 1970s, consistent with the expansion of scientific disciplines around this time. The propensity data is revealing in terms of how citation practices have evolved across fields and time. Overall, the propensity for co-author citations has decreased or remained stable in more recent years, trends which complement those observed in Figure 3. In addition, some fields (e.g., neurology, meteorology and atmospheric science) show a substantial recent decrease in , but not in , which suggests changes in the scale of the propensity for co-author citations: given a local co-author network, the tendency to cite more distant (level-2) co-authors is on a relative decline. In meteorology and atmospheric science, for instance, the reliance on level-2 co-authors as references peaked some 30 years ago, just as the number of references per paper and the number of authors per paper began to increase substantially (Figure 2B,C).
If a relatively large field (e.g., biochemistry and molecular biology, or economics) contains many groups working on largely independent topics, then the propensity for self-citation, level-1 citations and level-2 citations tends to be high. This is the major caveat that must be applied when examining the propensity: the epistemic community from which an author can seek out citations might not grow as fast as the entire specialty. For instance, high levels of co-authorship citations in astrophysics and astronomy, or in atmospheric science and meteorology, are largely determined by changes the structure of the specialties, and less by the choices of citing authors. Nevertheless, using scientific specialties as a pool of relevant knowledge for authors remains instructive, especially since we are characterizing citation practices at the level of an entire specialty (and not of a sub-community). Furthermore, while there are limitations in the classification of these areas, it should be pointed out that they are constituted as a set of journals with not only similar topics, but also similar citation patterns . Thus, while it cannot point to a single parameter, the propensity can be compared across scientific specialties (SSH included) with very different co-author network topologies to produce meaningful results, particularly when taking into account the overall growth of the specialties (Figure 2A). Furthermore, for specialties of comparable size, the propensity is able to provide insight into to the way local communities or invisible colleges operate.
The method shows that level-1 citations are far from random, which likely reflects the specialization of researchers and the cumulative nature of research. Interestingly, the only two specialties which, recently, tend less and less to cite socially close authors (that is, level-1 and level-2 co-authors) are organic chemistry and, to a lesser degree, biochemistry. This confirms the validity of the trend observed earlier in Figure 3 and might indicate either that different types of referencing practices exist within organic chemistry (e.g., there are fewer perfunctory references) or that authors search out information from further afield.
The normalized distribution of values for and for all papers in a given year is also revealing. If we take data from 1985, for instance, we immediately see that the level of ‘zero’ contributions (top-left in Figures 5A and B) vary widely among disciplines. For those articles which display a non-zero propensity there in a second, non-zero local maximum somewhere between 10 and 100, according to scientific specialty. While many papers with relatively low propensity to cite their recent authors dominates in astrophysics, only a few papers with high propensity dominate in economics and, to a lesser degree, in the atmospheric sciences. The practice is more generalized in astrophysics, but dominated by a few areas with high levels of propensity in most other fields. Once again, large, heterogeneous fields such as biochemistry have the longest tails indicating a lack of uniformity in the citation practices of its members. In the case of the propensity to cite authors in one's level-2 co-author network, the same distribution is absent, and the overall lack of references of this type (compared with the ‘random’ case) means that there is no non-zero local maximum in the propensity .
Distribution of A) and B) (B) for all papers published in 1985 in the disciplines selected. The normalized distribution is computed using a logarithmic binning scheme and the values of the y-intercepts (number of articles which don't cite any level-1 or level-2 co-authors) are listed in each figure.
Ranking the articles in order of propensity at each level, we can also compute Spearman's rank correlation coefficient (ρ) between , and . While there is very little correlation between (the propensity to self-citation) and , the correlation is greater between and (from 0.15 in organic chemistry and 0.19 in the atmospheric sciences, to 0.40 for astrophysics). This means that authors (or, strictly speaking, papers) who tend to cite co-authors, also tend to cite co-authors of co-authors. However, given the method described above for computing the propensity, this result is not at all trivial. It implies, as do the overall trends found in Tables 1 and 2, that ‘group’-citations (level-1 and level-2) likely do not similar patterns as direct self-citations.
Finally, consider the case of organic chemistry, (which remains relatively unexplored in the sociology of science literature), a medium-sized field with relatively high levels of co-authorship and average levels of interdisciplinarity. We have observed that co-authorship citations are low and remarkably stable, both in absolute (Figure 3) and relative terms (Tables 1 and 2), and that self-citation may be declining in recent years. This could indicate that, in this area, perfunctory referencing may be on the decline and/or that authors see decreasing value in citing their local co-author network (in other words, that the more distant authors perform work that is seen as equally—if not more—relevant than ‘nearby’ authors). In fact, data from Figure 4 points to the fact that only a few organic chemists cite their collaboration network heavily, while the majority do not cite it at all.
Beyond a characterization of the citation practices of individual specialties, we can interpret these findings as measures of the value of the social networks in citation practices. In other words, to what degree is the social network of an author a determining factor in the process of producing and disseminating knowledge through publications in different fields? The social network of scientific collaborators may ‘over-determine’ citation practices: high levels of propensity imply a need for researchers to rely on the value of their local network either for social (e.g., there is insufficient contact with other groups) or cognitive (e.g., several paradigms coexist) reasons. Finally, since co-authors usually represent only a subset of the entire social network of a scientist, the expansion of one's list of co-authors does not always imply an increase in social network size, as it could include many with which they have not yet collaborated. One could argue that for papers with a very large number of co-authors (for example larger than 50), not everyone really knows one another. These cases are in fact relatively infrequent and limited to a few subfields. Nevertheless, to account for them, our analysis has also included only selecting papers where the number of co-authors is 5 or less (see Figure 3).
This is also about symbolic capital associated with collaboration networks: within the science system, publications are the primary means of establishing scientific authority among peers . As Bourdieu put it, “claims to legitimacy draw their legitimacy from the relative strength of the groups whose interest they seek to express” . But this symbolic capital being a rare and contested resource, scientists who contest its value for a given scientist could identify these co-author citations with a kind of self-citation, which tend to be perceived in a negative manner, thus annihilating its value. Hence, at the analytical level, it is important to distinguish self-citations from level-1 and level 2 citations, though actors could try to extend the negative connotations of the former to the latter.
The issue of small-scale structure can only be partially explored through co-author networks. While two research groups may be entirely disconnected in terms of co-authorship proximity, they may be working on identical topics and thus still cite each other as distant citations. Since our analysis is performed at the paper level, we can only address the level of small clusters. Recent studies of astrophysics, for instance, have confirmed the trends observed here of an increased reliance on a small number of journals . Some of these characteristics are also shared by the atmospheric sciences. In this case, the fact that information is rarely sought outside the specialty and the presence of large numbers of co-authors (Figure 1) are the dominant sources of the high percentage of co-authorship citations. In sociology, on the other hand, the level of co-authorship has had no effect and the field's expansion and diversification (in terms of different journals and topics covered therein) has balanced any increased propensity to cite authors who are/were also co-authors. In economics, it appears as though several of these factors may have contributed to a slight increase in co-authorship citations, though they remain extremely low. Our data clearly indicate that while the small-world phenomena observed in economics may be true due to increases in co-authorship, among other factors , this has very little bearing on the citation practices of each author based on his or her local network.
In the context of a broader understanding of trends in the structure and practices of the various NMS and SSH specialty areas, our analysis quantifies a general increase in citations made to co-authors, which reveals both an increase in co-authorship and, in many fields, an increased reliance on the local co-author network—one's collaborators, research group or ‘proximate’ research groups. Recent work regarding the decline of uncitedness  and strong evidence that scholarship is becoming less and less concentrated  point to the fact that scholarship is not narrowing within science in general, although our data shows a correlation between fields' high levels of co-author referencing and high levels of intra-specialty citations (Figures 2 and 3).
Our paper expanded on the notion of self-citation to analyse the relationship between co-authorship and citation in many disciplines of the NMS and SSH, using vast quantities of data and a new algorithm. It shows that there is no single key to understanding why authors of a given specialty may cite authors with whom they, or their co-authors, have previously published. The more drastic differences among fields or over time are due to variations in levels of co-authorship, but more subtle changes are linked to the reliance of authors on their local network (and the shape of these local networks). This, in turn, is likely linked to the social structure of a given specialty on a small scale and the degree of intra-specialty referencing (to what degree does scholarly work build on a closed set of journals). More specifically, we have shown that:
- The gap in co-author citations between the social sciences and natural sciences remains very large, due to the different levels of co-authorship and citation practices of the actors.
- Self-citation is constant in time and across specialties of the natural sciences and the social sciences (where it is much lower), and is not dependent on the size of networks or the citation practices of actors.
- The propensity to cite co-authors and co-authors of co-authors varies widely among fields (when compared to what would be expected given the number of references per paper and size of the network). Within each field (particularly in the social sciences and less in astronomy and astrophysics), the distribution of these propensities also reveals a great deal of heterogeneity in the set of papers.
- Papers which tend to cite collaborators will also tend to cite collaborators of collaborators.
By considering the empirical data in terms of the symbolic capital and cognitive value associated with the collaboration networks, our results can thus help temper and qualify some of the recurring concerns related to the manipulation of research evaluation data through ‘citation cartels’, for instance, for which large-scale empirical data has been lacking , , . More generally, co-author referencing is often regarded as a perversion of the citation process, and seen as evidence that a field is too inward-looking or controlled by a small number of authors. A recent article by Bras-Amorós et al.  highlights this point, by analyzing the citation ‘distance’ as an impediment to their quality. Our analysis suggests that this is not necessarily the case. Indeed, co-authorship itself can have many meanings, not only in terms of division of labour, but also as a means of establishing a hierarchy within a field, and these meanings also vary widely among specialties. The formation of large groups using each other's work and collaborating more or less frequently, does not necessarily imply ‘citation cartels’ or nepotism. However, it is true that the high socio-cognitive ‘compactness’ of fields such as astrophysics and astronomy, or meteorology and the atmospheric sciences, might pose certain problems. For instance, it can be more difficult to locate ‘unbiased’, arm's length reviewers of papers and grants.
Conceived and designed the experiments: MLW VL YG. Performed the experiments: MLW VL. Analyzed the data: MLW VL. Contributed reagents/materials/analysis tools: MLW VL YG. Wrote the paper: MLW VL YG.
Crane D (1972) Invisible colleges. Diffusion of knowledge in scientific communities. Chicago: University of Chicago Press. 213 p.
- 2. Jones BF, Wuchty S, Uzzi B (2008) Multi-University Research Teams: Shifting Impact, Geography, and Stratification in Science. Science 322: 1259–1262.
- 3. Gilbert GN (1977) Referencing as persuation. Social Studies of Science 7: 113–122.
Whitley R (1984) The Intellectual and Social Organization of the Sciences. Oxford: Oxford University Press.
- 5. White HD, Wellman B, Nazer N (2004) Does citation reflect social structure? Longitudinal evidence from the “Globenet” interdisciplinary research group. J Am Soc Inf Sci Technol 55: 111–126.
- 6. Johnson B, Oppenheim C (2007) How socially connected are citers to those that they cite? Journal of Documentation 63: 609–637.
- 7. Rowlands I (1999) Patterns of author cocitation in information policy: evidence of social, collaborative and cognitive structure. Scientometrics 44: 533–546.
- 8. Ding Y (2011) Scientific collaboration and endorsement: Network analysis of coauthorship and citation networks. Journal of Informetrics 5: 187–203.
- 9. Roth C, Cointet J-P (2010) Social and semantic coevolution in knowledge networks. Social Networks 32: 16–29.
- 10. Moody J (2004) The Structure of a Social Science Collaboration Network: Disciplinary cohesion from 1963 to 1999. American Sociological Review 69: 213–238.
National Science Foundation. Science and Technology Indicators 2006 – Chapter 5: Academic Research and Development. Available: http://www.nsf.gov/statistics/seind06/c5/c5s3.htm#sb1. Accessed 14 Feb 2012.
- 12. Snyder H, Bonzi S (1998) Patterns of self-citation across disciplines (1980–1989). J Inf Sci 24: 431–435.
Larivière V, Archambault É, Gingras Y (2007) pp. 449–456. (2007) Long-term Patterns in the Aging of Scientific Literature, 1900–2004, Proceedings of the 11th Conference of the International Society for Scientometrics and Informetrics (ISSI), Madrid, Spain.
- 14. Batagelj V, Mrvar A (2000) Some analyses of Erdös collaboration graph. Social Networks 22: 173–186.
- 15. Tagliacozzo R (1977) Self-citation in scientific literature. Journal of Documentation 33: 251–265.
- 16. Lawani SM (1982) On the Heterogeneity and Classification of Author Self-Citations. J Am Soc Inf Sci 33: 281–284.
- 17. MacRoberts MH, MacRoberts BR (1989) Problems of citation analysis: A critical review. J Am Soc Inf Sci 40: 342–349.
- 18. Bonzi S, Snyder H (1990) Patterns of self citation across fields of inquiry. Proceedings of the 53rd Annual Meeting of the American Society for Information Science 27: 204–207.
- 19. Glänzel W, Debackere K, Thijs B, Schubert A (2006) A concise review on the role of author self-citations in information science, bibliometrics and science policy. Scientometrics 67: 263–277.
- 20. Milard B (2010) Les citations scientifiques: des réseaux de références dans des univers de références. L'exemple d'articles de chimie. Redes 19(4): 69–93.
Biagioli M (2003) Rights or rewards? Changing frameworks of scientific authorship,. In: Biagioli M, Galison P, editors. Scientific authorship: Credit and intellectual property in science. New York and London: Routledge. pp. 253–279.
- 22. Watts DJ, Strogatz SH (1998) Collective dynamics of ‘small-world’ networks. Nature 393: 440–442.
- 23. Newman MEJ (2001) Scientific collaboration networks. I. Network construction and fundamental results. Phys Rev E 64: 016131.
- 24. Newman MEJ (2004) Coauthorship networks and patterns of scientific collaboration. Proc Natl Acad Sci U S A 101: 5202–5205.
Hamilton K (2003) Subfield and Level Classification of Journals (CHI Report No. 2012-R). Cherry Hill, NJ: CHI Research.
Bourdieu P (1986) The forms of capital. In: Richardson J., editor. Handbook of Theory and Research for the Sociology of Education. (New York, Greenwood). pp. 241–258.
- 27. Bourdieu P (1975) The specificity of the scientific field and the social conditions for the progress of reason. Soc Sci Inf 14: 24.
- 28. Abt HA (2009) Do astronomical journals have extensive self-referencing? Publications of the Astronomical Society of the Pacific 121: 73–75.
- 29. Goyal S, van der Leij MJ, Moraga-González JL (2005) Economics: an emerging small world. Journal of Political Economy 114: 403–412.
- 30. Wallace ML, Larivière V, Gingras Y (2009) Modeling a Century of Citation Distributions. Journal of Informetrics 3: 296–303.
- 31. Larivière V, Gingras Y, Archambault É (2009) The decline in the concentration of citations, 1900–2007. J Am Soc Inf Sci Technol 60: 858–862.
- 32. Phelan TJ (1999) A compendium of issues for citation analysis. Scientometrics 45: 117–136.
- 33. Franck G (1999) Science communication—A vanity fair? Science 286(5437): 53–55.
- 34. Bras-Amorós M, Domingo-Ferrer J, Torra V (2011) A bibliometric index based on the collaboration distance between cited and citing authors. Journal of Informetrics 5: 248–264.