Network Effects on Scientific Collaborations

Background The analysis of co-authorship networks aims to explore the impact of network structure on the outcome of scientific collaborations and research publications. However, little is known about which network properties are associated with authors who have a larger number of joint publications and are highly cited. Methodology/Principal Findings Measures of social network analysis (SNA), for example network centrality and tie strength, have been utilized extensively in the current co-authorship literature to explore different behavioural patterns of co-authorship networks. Using three SNA measures (i.e., degree centrality, closeness centrality and betweenness centrality), we explore scientific collaboration networks to understand factors influencing the performance (i.e., citation count) and formation (i.e., tie strength between authors) of such networks. A citation count is the number of times an article is cited by other articles. We use a co-authorship dataset from the research field of ‘steel structure’ for the years 2005 to 2009. To measure the strength of scientific collaboration between two authors, we consider the number of articles co-authored by them. In this study, we examine how the citation count of a scientific publication is influenced by different centrality measures of its co-author(s) in a co-authorship network. We further analyze the impact of the network positions of authors on the strength of their scientific collaborations. We use both correlation and regression methods for data analysis, leading to statistical validation. We find that the citation count of a research article is positively correlated with the degree centrality and betweenness centrality values of its co-author(s). We also find that the degree centrality and betweenness centrality values of authors in a co-authorship network are positively correlated with the strength of their scientific collaborations.
Conclusions/Significance Authors’ network positions in co-authorship networks influence the performance (i.e., citation count) and formation (i.e., tie strength) of scientific collaborations.


Introduction
The study of co-authorship networks has been the subject of intense interest in recent years because this type of network not only depicts academic society but also represents the structure of our knowledge in an open innovation community [1][2][3]. Co-authorship networks are an important class of social network. A social network is defined as a collection of individuals, each of whom is acquainted with some subset of the others through one or more types of relations such as friendship, kinship and co-authorship [4]. Researchers have been analyzing co-authorship networks extensively to explore factors affecting the behaviour, performance and motivation of scientific collaborations [5][6][7]. Although co-authorship networks are somewhat similar to the much-studied citation networks, co-authorship implies a much stronger bond among authors than citation does. Unlike citation networks, where nodes are papers and the links between them are citations [8], in a co-authorship network nodes represent authors and links between nodes imply a scientific collaboration.
Co-authorship in research collaborations and publications has a long history. The first collaborative scientific paper was published in 1665 [9]. The first issue of the journal 'Philosophical Transactions of the Royal Society (Phil. Trans.)' was published on 6 March 1665. The Royal Society of London is the publisher of this journal, and its first issue was edited by the society's first secretary, Henry Oldenburg. The first volume of the journal contained many single-author papers (e.g., Petit [10]) and a few two-author papers (e.g., Moray and Du Son [11]). During the last few decades, scientific collaboration has increased rapidly in diverse research areas [12][13][14][15], and researchers have been exploring research questions related to the outcome measures (e.g., citation count) of their scientific collaborations. Mazloumian [16], for instance, examined the predictive capability of citation count and found that citation counts are reliable predictors of future success (e.g., future citation counts and attracting research grants) for scientists. Landmark papers of famous scientists are not only acknowledged by many immediate citations but also boost the citation rates of the previous publications of the corresponding scientist [17]. The analysis of co-authorship networks for exploring patterns of scientific collaboration is a comparatively young research discipline. During the 1990s, a number of authors pointed out the potential utility of co-authorship data and in some cases performed small-scale statistical analyses [18][19][20]. An early example of the analysis of co-authorship networks is the Erdős Number Project [21]. Paul Erdős was an influential but itinerant Hungarian mathematician. He was one of the most prolific authors of research papers and was involved in writing at least 1401 papers, more than any other mathematician who lived before or during his time. In bibliographical terms, the Erdős number represents a mathematician's proximity to the great man.
Co-authorship data have attracted considerable interest in recent years because they are the source of the largest (free and computerized) social networks available among researchers [22,23]. Researchers have approached the analysis of co-authorship data in various ways, such as basic statistical analysis using charts and regression [24] and the study of the structure and patterns of co-authorship networks [5,25]. Liu et al. [26] adopted the social network measures of degree, closeness, betweenness and eigenvector centrality to explore individuals' positions in a given co-authorship network. Yan and Ding [27] later utilized basic centrality measures to explore, at the actor level, how the network positions of authors in a co-authorship network affect the citation counts of their papers. In their research, they consider that all authors of a paper share the same citation count (i.e., the citation count of that paper) regardless of the order of authors in the author list of that paper. Like them, we use basic social network centrality measures in this study. However, their work was author-centric: they explored the effect of the network position of an author on the citation count of all her/his papers. Our work, by contrast, is paper-centric: we investigate the effect of the network positions of all co-authors of a research paper on its citation count.
We have two research objectives in this study. First, we aim to explore how citation count of a scientific paper is influenced by the network positions of its co-author(s) in a co-authorship network. Second, we explore how authors' network positions influence their strength of relations with others in a co-authorship network. The outcomes of these two research objectives can contribute significantly to the state of the art in co-authorship network studies. Scientists would be able to know the impact of their network positions in the co-authorship network on the citation counts of their published papers and on the strength of their scientific relations with their colleagues. Researchers would be able to identify potential researchers in their own research areas. In order to establish research collaborations, this information might be very helpful for early career researchers and those who wish to establish external research collaborations. Not only that, a virtual ranking of all authors of any research area could be developed from the information of their network positions in co-authorship networks. Therefore, outcomes of this study would help in identifying potential researchers and in developing effective and efficient research collaborations. The following two questions motivate this study: 1. How is the citation count of a scientific paper influenced by the network positions of its co-author(s) in a co-authorship network? 2. How is the strength of scientific relations (i.e., co-authorship relations) between two authors influenced by their network positions in a co-authorship network?
We use the terms paper, research publication, research article, research paper and journal article interchangeably. Similarly, the words researcher, author and scientist are interchangeable in this paper, as are node, actor and individual. The rest of this paper is organized as follows. In section two, we illustrate the conceptualization of our two research questions. This is followed by the research methodology in section three. In section four, we present the research findings of this study. Finally, in section five we discuss these findings and present the concluding remarks of this study.

Conceptualization of Research Questions
In this research, we study co-authorship networks to explore what network attributes of authors in a co-authorship network influence the citation counts of scientific papers and the strength of relations with the other members of that co-authorship network. More specifically, if a paper has two co-authors (say Au1 and Au2) who are also part of a co-authorship network having five authors, then this study examines: (i) which network attributes of Au1 and Au2 affect the citation count of that paper; and (ii) what network attributes of Au1 and Au2 affect their strength of relation with the remaining authors (i.e., three authors) of that co-authorship network. Figure 1 and Figure 2 conceptualize our research questions with illustration. Figure 1A shows the author-paper network for three papers (i.e., P1, P2 and P3) that are written by four authors (i.e., Au1, Au2, Au3 and Au4). The corresponding co-authorship network of Figure 1A is illustrated in Figure 1B. In Figure 1C, we exhibit, in addition to the co-authorship network, the network measures/attributes (i.e., At1 and At2) of each co-author. These network measures for each co-author are measured from Figure 1B. We consider only two network measures for illustration. There could be more network measures for authors to be considered, depending mainly on the research question(s) under consideration. Figure 2A shows the illustration of our first research question (i.e., how is the citation count of a research paper influenced by the network measures of its co-author(s)?). Our second research question (i.e., how is the strength of scientific relations between two authors influenced by their network positions in a co-authorship network?) is illustrated in Figure 2B. These illustrations of our research questions (i.e., Figure 2A and Figure 2B) are based on Figure 1C. The summary of our research investigation is illustrated in Figure 2C.
Figure 2. (a) Illustration of the first research question, based on Figure 1C. Avg stands for the statistical function Average, which is used to normalize different network attributes (i.e., degree centrality, closeness centrality and betweenness centrality) of authors. The '?' symbol above the line indicates whether or not the measure on its left hand side has any impact on the measure on its right hand side. (b) Illustration of the second research question (i.e., how is the strength of scientific relations (i.e., co-authorship relations) between two authors affected by their network positions in a co-authorship network?), based on Figure 1C. Avg and '?' have the same meaning as in (a). (c) Summary of research investigations. NP stands for Network Position in respect of the network measures considered in this study (i.e., degree centrality, closeness centrality and betweenness centrality), CC stands for Citation Count and TS stands for Tie Strength. The symbol '↔' indicates whether the measure on its left hand side has any impact on the measure on its right hand side. doi:10.1371/journal.pone.0057546.g002
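The step from an author-paper network (Figure 1A) to a co-authorship network (Figure 1B) can be sketched in a few lines of Python. This is only an illustration: the author lists below are hypothetical, since the actual assignment of authors to P1, P2 and P3 in Figure 1A cannot be recovered from the text, and the function name `coauthorship_network` is our own.

```python
from itertools import combinations

# Hypothetical author lists for three papers (an assumption, not Figure 1A itself).
author_lists = {
    "P1": ["Au1", "Au2"],
    "P2": ["Au2", "Au3", "Au4"],
    "P3": ["Au1", "Au3"],
}

def coauthorship_network(author_lists):
    """Map each author to the set of authors sharing at least one paper with them."""
    neighbours = {}
    for authors in author_lists.values():
        for author in authors:
            neighbours.setdefault(author, set())
        # Every pair of authors on the same paper becomes an undirected link.
        for a, b in combinations(set(authors), 2):
            neighbours[a].add(b)
            neighbours[b].add(a)
    return neighbours

network = coauthorship_network(author_lists)
for author in sorted(network):
    print(author, sorted(network[author]))
```

The adjacency-set representation keeps the network undirected by construction, which matches the way co-authorship links are treated in this study.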

Data Source
In this study, we utilize co-authorship data from the research field of 'steel structure'. We explore our research questions at two levels: (i) for the complete dataset; and (ii) for small groups within the complete dataset. For the group level, we choose two research groups, from Monash University, Australia and the National University of Singapore (NUS). These two groups have a very good reputation for their scientific contributions to the research field of 'steel structure'. We therefore consider three separate co-authorship networks: one for the complete research dataset and two (i.e., NUS and Monash University) for the group level datasets. The group level datasets are part of the complete research dataset. We then explore our two research questions for these three co-authorship networks separately. We consider research publications from the years 2005 to 2009. We extracted publication details for our research dataset from Scopus, which is one of the largest abstract and citation databases for peer-reviewed literature and other scientific publications [28].
We first create a query to search for research articles in Scopus. In this query, we specify 'steel structure' as the search phrase and look for this phrase in the title, keywords and abstract of research articles. We also define the time frame (i.e., 2005 to 2009) and a list of journals to limit our search. The journal list, as named in Table 1, and the single search phrase (i.e., 'steel structure') were suggested by a domain expert in the 'steel structure' research area. We then import all journal articles resulting from our query in comma-separated value (CSV) format. In this imported dataset, we notice that some journal articles do not have complete bibliographic information such as author details, citation details and publication year. We do not consider those articles in the data analysis. Using the affiliation information of authors, we then extract publication details for the 'steel structure' research groups of Monash University and the National University of Singapore separately. Basic statistics of the research publications of these two groups are shown in Table 2.
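The exclusion of incomplete records described above could be sketched as follows. The column names (`Authors`, `Year`, `Cited by`), the sample rows and the function name are assumptions made for illustration; they are not taken from the dataset used in this study.

```python
import csv
import io

# Assumed column names for the exported file; adjust to the actual export.
REQUIRED = ("Authors", "Year", "Cited by")

# Made-up records standing in for the exported CSV file.
sample_csv = """Authors,Title,Year,Cited by
"Smith J., Lee K.",Paper A,2006,12
,Paper B,2007,3
"Chan T.",Paper C,,5
"Wong P., Smith J.",Paper D,2009,0
"""

def complete_records(handle, required=REQUIRED, first_year=2005, last_year=2009):
    """Yield rows that have every required field and fall inside the time frame."""
    for row in csv.DictReader(handle):
        # Skip records with missing author, year or citation details.
        if any(not (row.get(col) or "").strip() for col in required):
            continue
        if first_year <= int(row["Year"]) <= last_year:
            yield row

records = list(complete_records(io.StringIO(sample_csv)))
print([r["Title"] for r in records])
```

In this sketch a citation count of 0 is treated as complete information, whereas an empty field is treated as missing; whether that distinction holds for a given export is something to verify against the data.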

Network Measures Used in this Study
Various network measures such as centrality, tie strength and density have gained significant interest in recent years [29,30], and in many disciplines they play an important role in quantifying and identifying informal networks that function at a level beyond the formal and traditional structure of relationships [31][32][33]. In this study, we use four network measures. Three of them are basic network centrality measures: (i) degree centrality; (ii) closeness centrality; and (iii) betweenness centrality. The fourth one is the tie strength measure, which was first introduced by Mark Granovetter [34].
The selection of these four network measures for analyzing co-authorship networks is guided by three network theories: (i) Bavelas' Centralization Theory [35]; (ii) Freeman's Centrality Theory [36]; and (iii) Granovetter's Strength of Weak Tie Theory [34]. Bavelas' theory states that network structures of communication and collaboration among individuals have a positive impact on performance. Freeman's centrality theory posits that the centralities of actors in a network have an impact on their ability to perform. According to Granovetter's Strength of Weak Tie Theory, tie strength among actors in a network has an impact on the ease of knowledge transfer and sharing.
Degree centrality, or degree, is defined as the number of direct links that a particular node has in a network [29] and is one of the basic network centrality measures of social network analysis (SNA). It highlights highly connected nodes and, eventually, reflects those nodes having more direct contact and adjacency with others in a given network [29]. As co-authorship networks are, by definition, undirected, in this study we use the simple degree centrality measure for authors. In a co-authorship network having n actors, the degree centrality of an author Au_i can be defined as follows [29]:

$$\text{Degree Centrality}(Au_i) = \frac{d(n_i)}{n-1}$$

where d(n_i) represents the number of authors with whom author i is connected in the co-authorship network. Closeness centrality expands the definition of degree centrality by focusing on how close a node is to all other nodes of the network. For an individual node, it represents the extent to which the node is in a close position to the remaining nodes of the network. In a co-authorship network having n authors, the closeness centrality of an author Au_i can be defined, in standardized form, by the following equation [29]:

$$\text{Closeness Centrality}(Au_i) = \frac{n-1}{\sum_{j \neq i} d(n_i, n_j)}$$

where d(n_i, n_j) is the number of lines in the shortest path between author i and author j, and the sum is taken over all j ≠ i.
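As a worked illustration of these two definitions, the following Python sketch computes degree and closeness centrality on a small network. The four-author network is hypothetical, the function names are our own, and closeness is assumed to be the standardized form (n − 1 divided by the sum of shortest-path distances); the sketch also assumes a connected network.

```python
from collections import deque

# Hypothetical connected co-authorship network (author -> set of co-authors).
network = {
    "Au1": {"Au2", "Au3"},
    "Au2": {"Au1", "Au3", "Au4"},
    "Au3": {"Au1", "Au2", "Au4"},
    "Au4": {"Au2", "Au3"},
}

def degree_centrality(network, author):
    """Number of direct co-authors divided by n - 1."""
    return len(network[author]) / (len(network) - 1)

def shortest_path_lengths(network, source):
    """Breadth-first search: number of links on the shortest path to each author."""
    dist = {source: 0}
    queue = deque([source])
    while queue:
        node = queue.popleft()
        for neighbour in network[node]:
            if neighbour not in dist:
                dist[neighbour] = dist[node] + 1
                queue.append(neighbour)
    return dist

def closeness_centrality(network, author):
    """(n - 1) divided by the sum of distances to all other authors (assumed form)."""
    dist = shortest_path_lengths(network, author)
    total = sum(d for node, d in dist.items() if node != author)
    return (len(network) - 1) / total

for author in sorted(network):
    print(author, round(degree_centrality(network, author), 3),
          round(closeness_centrality(network, author), 3))
```

Here Au2 is linked to all three other authors, so both of its values equal 1, while Au1 needs two steps to reach Au4 and therefore has a closeness of 3/4.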
Betweenness centrality is obtained by determining how often a particular node is found on the shortest path between any pair of nodes in the network. It views an actor as being in a favoured position to the extent that the actor falls on the shortest paths between other pairs of actors in the network. That is, nodes that occur on many shortest paths between other pairs of nodes have higher betweenness centrality than those that do not [36]. The co-authorship networks considered in this research are connected. In a co-authorship network of size n, the betweenness centrality of an author Au_i can be represented, in standardized form, by the following equation [29]:

$$\text{Betweenness Centrality}(Au_i) = \frac{\sum_{j<k} g_{jk}(n_i) / g_{jk}}{(n-1)(n-2)/2}$$

where i ≠ j ≠ k; g_jk(n_i) represents the number of shortest paths linking author j and author k that contain author i; and g_jk is the number of shortest paths linking author j and author k.
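Betweenness centrality can be computed, for example, with a breadth-first variant of Brandes' algorithm, as in the sketch below. This is not the ORA implementation used in this study; the five-author network is hypothetical, and the normalization by (n − 1)(n − 2)/2 is an assumption about the standardized form.

```python
from collections import deque

# Hypothetical network: Au2 and Au3 act as bridges, the others are peripheral.
network = {
    "Au1": {"Au2"},
    "Au2": {"Au1", "Au3"},
    "Au3": {"Au2", "Au4", "Au5"},
    "Au4": {"Au3"},
    "Au5": {"Au3"},
}

def betweenness_centrality(network):
    """Normalized betweenness for an undirected, unweighted network (n > 2)."""
    n = len(network)
    score = {v: 0.0 for v in network}
    for s in network:
        # Breadth-first search from s, counting shortest paths (sigma).
        sigma = {v: 0 for v in network}
        dist = {v: -1 for v in network}
        preds = {v: [] for v in network}
        sigma[s], dist[s] = 1, 0
        order, queue = [], deque([s])
        while queue:
            v = queue.popleft()
            order.append(v)
            for w in network[v]:
                if dist[w] < 0:
                    dist[w] = dist[v] + 1
                    queue.append(w)
                if dist[w] == dist[v] + 1:
                    sigma[w] += sigma[v]
                    preds[w].append(v)
        # Accumulate pair dependencies from the farthest nodes back to s.
        delta = {v: 0.0 for v in network}
        for w in reversed(order):
            for v in preds[w]:
                delta[v] += sigma[v] / sigma[w] * (1 + delta[w])
            if w != s:
                score[w] += delta[w]
    # Each unordered pair was counted from both endpoints, hence the division by 2.
    pairs = (n - 1) * (n - 2) / 2
    return {v: score[v] / 2 / pairs for v in network}

for author, value in sorted(betweenness_centrality(network).items()):
    print(author, round(value, 3))
```

In this example Au1, Au4 and Au5 lie on no shortest path between other authors, so their betweenness is zero, which is the situation described later for authors who play no bridging role.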
Tie Strength defines the quality of relationship between two actors in a network. According to Granovetter [34], the strength of relation between two actors can be expressed as a combination of the amount of time and the reciprocal services which characterize the tie between them. In the context of co-authorship network, tie strength represents the strength of relation between two scientists in terms of scientific collaborations, research outcomes, joint publications, and so on. In this study, we consider the total number of papers co-authored by two scientists in measuring the tie strength of their research collaboration.
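Counting co-authored papers per pair of authors is straightforward; a minimal sketch is given below. The author lists are made up for illustration and the function name `tie_strengths` is our own.

```python
from collections import Counter
from itertools import combinations

# Hypothetical author lists, one per paper.
author_lists = [
    ["Au1", "Au2"],
    ["Au1", "Au2", "Au3"],
    ["Au2", "Au3"],
    ["Au1", "Au2"],
]

def tie_strengths(author_lists):
    """Count, for every pair of authors, the number of papers they co-authored."""
    ties = Counter()
    for authors in author_lists:
        # Sorting gives each pair a single canonical key, e.g. ("Au1", "Au2").
        for pair in combinations(sorted(set(authors)), 2):
            ties[pair] += 1
    return ties

ties = tie_strengths(author_lists)
for (a, b), strength in ties.most_common():
    print(a, b, strength)
```

Listing the pairs with `most_common()` gives a ranking of collaborations by tie strength, strongest first.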

Approach of Research Analysis
Using the co-authorship dataset, we first construct co-authorship networks for the two research groups of NUS and Monash University. We then quantify network measures (i.e., degree centrality, closeness centrality and betweenness centrality) for each author of those co-authorship networks. We use ORA, which is a dynamic network analysis tool capable of performing node-level and network-level analyses of weighted networks [37], to measure these three network centrality values for each author. Degree centrality and betweenness centrality values of all co-authors are averaged respectively for each paper so that a single degree and betweenness value will be associated with each paper. For measuring tie strength between two authors, we consider the number of scientific papers co-authored by those two authors. We then use the Spearman correlation test to check whether network measures of authors have any impact on citation counts, and on their strength of scientific collaborations. The Spearman correlation test approach is chosen because we notice that the distributions of all network measures considered in this research are non-normal. After that, we use the regression method to explore the impact of SNA measures on the citation count of papers and tie strength between authors. Figure 3 illustrates the flow chart of the research analysis process followed in this study.
Table 4. Top-10 papers (in respect of average of degree centrality and betweenness centrality values of co-authors) and their corresponding citation counts.
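The per-paper averaging and the Spearman test can be illustrated with the following sketch. All values are invented, the helper names are our own, and Spearman's rho is computed here as the Pearson correlation of average ranks rather than with the statistical package used in this study.

```python
def average_ranks(values):
    """Rank values from 1 to n, giving tied values the mean of their ranks."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        mean_rank = (i + j) / 2 + 1
        for k in range(i, j + 1):
            ranks[order[k]] = mean_rank
        i = j + 1
    return ranks

def pearson(x, y):
    """Pearson correlation coefficient of two equally long sequences."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    var_x = sum((a - mx) ** 2 for a in x)
    var_y = sum((b - my) ** 2 for b in y)
    return cov / (var_x * var_y) ** 0.5

def spearman_rho(x, y):
    """Spearman's rho as the Pearson correlation of the rank-transformed data."""
    return pearson(average_ranks(x), average_ranks(y))

# Hypothetical degree centralities and papers (author list, citation count).
degree = {"Au1": 0.50, "Au2": 1.00, "Au3": 0.75, "Au4": 0.25}
papers = [
    (["Au1", "Au2"], 9),
    (["Au2", "Au3"], 14),
    (["Au3", "Au4"], 4),
    (["Au1", "Au4"], 2),
    (["Au2"], 6),
]

# One averaged degree value per paper, as described above.
avg_degree = [sum(degree[a] for a in authors) / len(authors) for authors, _ in papers]
citations = [count for _, count in papers]
print(round(spearman_rho(avg_degree, citations), 3))
```

In practice a library routine such as `scipy.stats.spearmanr` would normally be preferred, since it also reports a p-value; the hand-written version is shown only to make the rank-based nature of the test explicit.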

Results
In this section, we discuss the findings of this study. We present these research findings under the following three subtitles.

Impact of Network Positions of Co-authors on Citation Counts of Publications
The correlation coefficients between each of the three centrality measures and citation count are presented in Table 3. For our complete research dataset, the averages of the degree centrality and betweenness centrality values of all co-authors of a scientific publication are positively correlated with the citation count of that paper (rho = 0.397, p < 0.01, two-tailed and rho = 0.349, p < 0.01, two-tailed, respectively). For the NUS and Monash University research groups, the average of the degree centrality values of all co-authors of a scientific publication is positively correlated with the citation count of that publication (rho = 0.326, p < 0.05, two-tailed and rho = 0.433, p < 0.05, two-tailed, respectively). Betweenness centrality shows a similar relationship with the citation count of a research publication for both the NUS and Monash University research groups (rho = 0.384, p < 0.05, two-tailed and rho = 0.412, p < 0.05, two-tailed, respectively). However, closeness centrality does not show any significant correlation with the citation count of a research paper, either for the complete research dataset or for the NUS and Monash University research groups.
We plot the citation count of each paper against the network attributes of each of its co-authors in Figure 4. This figure illustrates what the research dataset looks like in terms of the network position of each co-author and the corresponding citation count of the paper. We considered degree centrality, closeness centrality and betweenness centrality to measure the network position of each co-author. A significant difference in the citation counts of published papers (for the complete research dataset as well as for both the NUS and Monash University groups) is noticed for authors who have the same values for network measures. This could be explained by the fact that there are a few highly connected and well cited authors (e.g., professors) in all three networks, and less prominent authors (i.e., less connected and less cited) have co-authored with them (e.g., student-professor link). It is also noticed that the betweenness centrality values for many authors are zero. These authors could be students and/or newcomers to the scientific field. For this reason, they do not play any bridging role in the co-authorship network. For the group level research dataset, we then present the top-10 papers in respect of the average values of degree centrality and betweenness centrality, and their citation counts, for both the NUS and Monash University research groups in Table 4. Although the values for degree centrality and betweenness centrality are in order (highest to lowest) in Table 4, the corresponding citation counts are not. If both of them (i.e., either degree centrality and citation count, or betweenness centrality and citation count) followed the same ordering (e.g., highest to lowest) across our research dataset, then the corresponding correlation coefficients in Table 3 would be 1.0 (i.e., perfect correlation). We do not find any correlation coefficient of 1.0, although the coefficients are statistically significant. For this reason, although the degree centrality and betweenness centrality values are in order in Table 4, the corresponding citation counts do not follow the same ordering.
Table 6. Top-10 collaborations between authors (in respect of tie strength) and scaled network measures (i.e., degree centrality and betweenness centrality) of corresponding collaborators.

Impact of Network Positions of Authors on Their Strength of Scientific Collaboration
The correlation coefficients between the network centrality of authors and the strength of their scientific collaborations are presented in Table 5. For the complete research dataset, it is evident that the degree centralities and betweenness centralities of a pair of authors have a positive impact on the strength of their research collaboration (rho = 0.331, p < 0.01, two-tailed and rho = 0.327, p < 0.01, two-tailed, respectively). For the group level research dataset, it is evident that the degree centrality and betweenness centrality of authors in a co-authorship network have an impact on the strength of their scientific collaborations for both the NUS and Monash University research groups. A strong tie between two authors is therefore more likely if they have higher degree centrality, higher betweenness centrality, or both. The closeness centralities of authors are positively correlated with the strength of their scientific collaboration for the NUS research group, but not for the Monash University research group or for the complete research dataset.
We then plot the tie strength between two co-authors against the network attributes of each of the co-authors in Figure 5. This figure illustrates the research dataset in terms of the network position of each author and the tie strength with all her/his co-author(s). We considered degree centrality, closeness centrality and betweenness centrality to measure the network position of each author. From this figure, it is evident that there is a significant difference in network measures between two co-authors who form either a strong tie or a weak tie. This could be explained by a student-supervisor relation in which the student, who does not collaborate with any other author, publishes many papers (i.e., strong tie strength) or very few papers (i.e., low tie strength) with the supervisor, who is highly connected and well cited. Some of the authors do not play any bridging role in the co-authorship network, as reflected in the betweenness centrality values (some of these values are zero). Table 6 presents the top-10 collaborations among authors in respect of tie strength for the group level research dataset. The highest tie strength value for the NUS research group is 3, whereas for the Monash University research group this value is 4. For some authors from both the NUS and Monash University research groups, it is evident that they have a high degree centrality but a low betweenness centrality. This could be explained by the fact that some authors collaborate with many different authors but do not play a bridging role in the co-authorship network. For this reason, they have high degree centrality but low betweenness centrality. The presence in our dataset of some authors having low degree centrality and high betweenness centrality could be interpreted in a similar way.

Regression Models for Citation Count and Tie Strength
We developed regression models for citation count and tie strength using the complete research dataset as well as NUS and Monash University data. These models are summarized in Table 7. All the beta values of this table are significant. That means, degree centrality and betweenness centrality have a significant impact on the citation count of a scientific publication and tie strength between two authors. Both citation count and tie strength can be measured from the corresponding degree centrality and betweenness centrality values. For example, the relations among degree centrality, betweenness centrality, and citation count for NUS dataset can be represented by the following equation:

$$\text{Citation Count (NUS)} = 2.512 + 1.715 \times \text{Degree Centrality} + 16.080 \times \text{Betweenness Centrality}$$
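To show how such a model is applied, the sketch below evaluates the NUS equation, read as 2.512 + 1.715 × degree centrality + 16.080 × betweenness centrality. The input values are hypothetical and only illustrate the arithmetic; the scale on which the centrality values enter the model follows Table 7 and is not reproduced here.

```python
# Coefficients of the NUS model as quoted in the text.
NUS_MODEL = {"intercept": 2.512, "degree": 1.715, "betweenness": 16.080}

def predicted_citation_count(avg_degree, avg_betweenness, model=NUS_MODEL):
    """Evaluate the linear model for one paper's averaged centrality values."""
    return (model["intercept"]
            + model["degree"] * avg_degree
            + model["betweenness"] * avg_betweenness)

# Hypothetical averaged centrality values for a single paper.
print(round(predicted_citation_count(1.0, 0.25), 3))
```

With these illustrative inputs the three terms are 2.512, 1.715 and 4.020, so the betweenness term contributes most to the estimate even though its input value is the smallest.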

Discussion and Conclusion
This research is motivated by two research questions: (i) "how is the citation count of a scientific paper influenced by the network positions of its co-author(s) in a co-authorship network?" and (ii) "how is the strength of relations between two authors influenced by their network positions in a co-authorship network?" In answer to these research questions, we observe that the citation count of a scientific paper is affected by the degree centrality and betweenness centrality of its co-author(s). We also find that the degree centrality and betweenness centrality of a pair of authors have a positive impact on their strength of scientific collaboration in a co-authorship network. The corresponding correlation coefficients for degree centrality and betweenness centrality range from 0.326 to 0.503. All of these values are statistically significant, although they do not indicate perfect or near-perfect correlation (i.e., a correlation coefficient close to 1). A small correlation coefficient can be statistically significant if the sample size is large, whereas for a small sample size (e.g., 35) even a high correlation coefficient may not be statistically significant [38]. A correlation coefficient of 0.04, for instance, would be statistically significant for a sample size of 10,000 [38].
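The dependence of significance on sample size can be made concrete with the usual t approximation for a correlation coefficient, t = r·sqrt((n − 2)/(1 − r²)) with n − 2 degrees of freedom. The sketch below is illustrative only: the coefficient of 0.30 for the small sample is a hypothetical value, and the critical values are rounded two-tailed 5% table values.

```python
import math

def correlation_t_statistic(r, n):
    """t statistic for testing a correlation coefficient r from n observations."""
    return r * math.sqrt((n - 2) / (1 - r * r))

# Rounded two-tailed 5% critical values of Student's t for df = n - 2.
CRITICAL_T = {33: 2.035, 9998: 1.960}

for r, n in [(0.04, 10_000), (0.30, 35)]:
    t = correlation_t_statistic(r, n)
    significant = abs(t) > CRITICAL_T[n - 2]
    print(f"r={r:.2f}, n={n}: t={t:.2f}, significant at 5%: {significant}")
```

Under this approximation the coefficient of 0.04 passes the 5% threshold with 10,000 observations, while the hypothetical coefficient of 0.30 does not with 35 observations.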
For ordinary social networks (e.g., friendship networks), it has earlier been shown that strong ties are associated with dense network neighbourhoods while weaker ties act as bridges [34,39]. Because of their bridging capability, weak ties are considered as bottlenecks for the diffusion of information. However, Pan and Saramäki [40] show that dense local neighbourhoods mainly consist of weak ties and that strong ties are more important for overall connectivity in a co-authorship network. This is because strong ties (e.g., between professors) persist for a longer time than weak ties (e.g., between student and professor). In this study, we find that, in a co-authorship network, strong ties between authors are associated with the network centralities of degree and betweenness. That means strong ties are associated with dense network neighbourhoods. Therefore, unlike those of Pan and Saramäki [40], the findings of this study are in line with many earlier studies (e.g., [34]) on ordinary social networks. The difference between the findings of our study and those of Pan and Saramäki [40] could be explained by the fact that the evolutionary patterns of co-authorship networks are not similar in different research contexts [41]. This study utilizes a dataset from the 'steel structure' research area, whereas they used an archive dataset that contains publications from different domains.
A research paper can attract high volume of citation when it facilitates knowledge creation and innovation [42,43]. Thus, the findings (related to citation count) of this study elicit positional characteristics of prolific authors, in terms of knowledge and innovation, in a co-authorship network. According to our finding, more frequently cited papers are mostly co-authored by scientists who have higher degree centrality and betweenness centrality in a coauthorship network. That indicates authors, who have more connectivity (i.e., degree centrality) and capacity to control the flow of information (i.e., betweenness centrality), are contributing more to knowledge creation and innovation compared to other authors, who have less connectivity and less information control in a coauthorship network.
This research is not without its limitations. It was conducted using a co-authorship dataset for only two research groups from a single research discipline (i.e., 'steel structure'). Hence, studies involving datasets from more research groups and research areas, as well as from inter-disciplinary research areas, are needed before we can arrive at more definitive conclusions regarding the generic nature of our research findings.
As evidenced in the current co-authorship literature, most of the research on co-authorship network analysis focuses on the overall topology of networks, the analysis of statistical properties of individuals, and the relationship between citation and centrality measures at the author level. However, to our knowledge, there is no study in the literature that examines the impact of the network positions of authors in a co-authorship network on the citation counts of scientific publications and the tie strength of scientific collaborations between authors.