Quantitative Analysis of the Interdisciplinarity of Applied Mathematics

The increasing use of mathematical techniques in scientific research leads to the interdisciplinarity of applied mathematics. This viewpoint is validated quantitatively here by statistical and network analysis of the corpus PNAS 1999–2013. A network giving a panoramic view of the interdisciplinary relationships between disciplines is built from the corpus. Specific network indicators show the hub role of applied mathematics in interdisciplinary research. The statistical analysis of the corpus content finds that algorithms, a primary topic of applied mathematics, positively correlate with, increasingly co-occur with, and have a long-run equilibrium relationship with certain typical research paradigms and methodologies. This finding can be understood as an intrinsic cause of the interdisciplinarity of applied mathematics.


Introduction
Interdisciplinary research means that data, techniques, concepts, and theories from two or more disciplines are integrated to solve problems whose solutions are beyond the scope of a single discipline or area of research practice [1,2]. Mathematical science plays an important role in interdisciplinary research, because many problems in various disciplines of physical science, biological science, and social science are increasingly addressed with mathematical techniques [3]. The increasing application of mathematical theories and methods to other disciplines has therefore led to the development of mathematical science, especially applied mathematics [4].
The panoramic view of the relationships between disciplines can be drawn as a network, regarding the disciplines as nodes and the interdisciplinary relationships as edges. The network is built here from the disciplinary information of the papers published in the Proceedings of the National Academy of Sciences (PNAS, http://www.pnas.org) in 1999-2013. Two disciplines are connected if there is a paper belonging to both of them. The interdisciplinarity of a discipline is then expressed quantitatively by network indicators of the strength and breadth of its connections to other disciplines, such as degree and betweenness centrality [5]. Those indicators show that applied mathematics not only participates widely and directly in interdisciplinary research, but also builds bridges for interdisciplinary research between other disciplines.
To understand the interdisciplinarity of applied mathematics more comprehensively, we analyze the contents of the papers. Cointegration and correlation tests on the quarterly numbers of papers containing certain topic words, e.g. "algorithm", show that the development of algorithms and that of certain research paradigms [6][7][8][9] (model, experiment, simulation, and data-driven) and transdisciplinary topics [10][11][12] (system, network, and control) obey long-run equilibrium relationships and are positively correlated. The co-word occurrence analysis shows increasing trends of algorithmization of those research paradigms and transdisciplinary topics. These relationships can be considered causes of the interdisciplinarity of applied mathematics.
This paper is organized as follows. The data processing is introduced in Section 2. The network analysis is shown in Section 3. The statistical analysis is presented in Section 4. The conclusion is drawn in Section 5.

Data processing
The journal PNAS publishes high quality research reports, commentaries, reviews, perspectives, and letters. The corpus analyzed here consists of 52,803 papers published in PNAS in 1999-2013. The journal provides the discipline information of the papers (Fig 1). There are 3 first level disciplines, viz. biological science, physical science, and social science, and 39 second level disciplines, such as mathematics, computer science, etc. The papers can therefore be classified according to their discipline information.
Most of the papers are classified by both first and second level disciplines. Some papers are classified only by a first level discipline. For those papers, we consider their second level discipline to be the same as their first level one, and hence add the first level disciplines to the set of second level disciplines. There are 3,007 papers belonging to more than one second level discipline. For example, Ref [13] belongs to applied mathematics and ecology. Those papers can be considered interdisciplinary papers. The discipline information of the papers is used in Section 3 to build a network describing the interdisciplinary relationships between disciplines.
Many papers use mathematical techniques but are not classified into applied mathematics. Thus, we also analyze the contents of the papers. The Python package Natural Language Toolkit (NLTK, http://www.nltk.org) is used to build the dictionary for the corpus with its morphological reduction function. The dictionary contains 31,542 words (S1 Text). Those words belong to the lexicon of NLTK, which includes the English WordNet. Based on the dictionary, the document-term matrix for the corpus is generated, in which the rows correspond to the papers in the corpus and the columns correspond to the words. Together with the publication dates of the papers, the quarterly numbers of papers containing certain words are extracted for analyzing the relationships of algorithms to certain research paradigms and transdisciplinary topics in Section 4.
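The extraction of quarterly counts described above can be sketched as follows. This is a minimal illustration, not the authors' pipeline: the toy corpus is hypothetical, the helper names (`normalize`, `terms`, `quarterly_counts`) are invented for the example, and a crude plural-stripping rule stands in for the morphological reduction provided by NLTK. Each paper is reduced to the set of words it contains, which corresponds to one row of a binary document-term matrix.

```python
import re
from collections import Counter
from datetime import date

# Hypothetical toy corpus: (publication date, text). Not data from the paper.
corpus = [
    (date(1999, 1, 15), "An algorithm for network models"),
    (date(1999, 2, 3), "Experiments on protein folding"),
    (date(1999, 5, 20), "Simulation algorithms and data"),
]

def normalize(token):
    """Crude stand-in for morphological reduction: lowercase, strip one plural 's'."""
    token = token.lower()
    if len(token) > 3 and token.endswith("s") and not token.endswith("ss"):
        return token[:-1]
    return token

def terms(text):
    """Set of normalized words in a text (one row of a binary document-term matrix)."""
    return {normalize(t) for t in re.findall(r"[A-Za-z]+", text)}

def quarterly_counts(corpus, word):
    """Map (year, quarter) to the number of papers containing `word`."""
    counts = Counter()
    for published, text in corpus:
        if word in terms(text):
            quarter = (published.month - 1) // 3 + 1
            counts[(published.year, quarter)] += 1
    return counts

algorithm_counts = quarterly_counts(corpus, "algorithm")
```

With 15 years of data, a series built this way has 60 quarterly values per topic word.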

Network analysis of the interdisciplinarity of applied mathematics
Based on the discipline information of the corpus, a network describing the connections among disciplines is constructed (the discipline network, Fig 2), in which the nodes are the second level disciplines, and two disciplines are connected if there is a paper belonging to both of them. For example, applied mathematics and ecology are connected, because Ref [13] belongs to both. The network is connected, which means that no discipline is isolated. The edges of the network can be assigned weights, namely the number of interdisciplinary papers shared by the two connected disciplines. The network data are provided in S1 Network.
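The construction rule above can be sketched in a few lines. The paper records below are hypothetical, and `build_discipline_network` is a name chosen for this illustration; the sketch only assumes that each paper comes with the set of second level disciplines it belongs to.

```python
from collections import Counter
from itertools import combinations

# Hypothetical toy records: each paper lists its second level disciplines.
papers = [
    {"applied mathematics", "ecology"},
    {"applied mathematics", "ecology"},
    {"chemistry", "biophysics", "applied mathematics"},
    {"chemistry"},
]

def build_discipline_network(papers):
    """Return {frozenset({a, b}): weight}, where weight counts the shared papers."""
    weights = Counter()
    for disciplines in papers:
        # A paper with k disciplines contributes to every pair among them.
        for a, b in combinations(sorted(disciplines), 2):
            weights[frozenset((a, b))] += 1
    return weights

edge_weights = build_discipline_network(papers)
```

The keys of `edge_weights` give the unweighted network, and the values give the edge weights. A paper with a single discipline adds no edge.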
The phenomenon of the dense relationships between disciplines is quantitatively described by the network indicators [5], viz. the average clustering coefficient 0.55, the diameter 3, the average (weighted) degree 16.87 (148.38), and the graph density 0.41.Those indicators also show the small-world property of the discipline network.
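As a quick consistency check, the graph density and the average degree follow directly from the node and edge counts given in the caption of Fig 2 (42 nodes, 354 edges). The sketch below uses the standard formulas for a simple undirected graph; the variable names are chosen for this illustration.

```python
n_nodes, n_edges = 42, 354  # counts stated in the caption of Fig 2

# Density: fraction of the n(n-1)/2 possible edges that are present.
density = 2 * n_edges / (n_nodes * (n_nodes - 1))

# Average degree: every edge contributes to the degree of two nodes.
average_degree = 2 * n_edges / n_nodes
```

The density rounds to the reported 0.41. The average degree evaluates to about 16.86, which differs from the reported 16.87 only in the last digit.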
The interdisciplinary breadth and centrality of a discipline can be described quantitatively by the degree and the betweenness centrality, respectively, of the corresponding node in the unweighted discipline network. The degree of a node is the number of nodes connected to it. The betweenness centrality of a node relates to the number of shortest paths between all pairs of other nodes that pass through it. If items are transferred through the network along shortest paths, a node with high betweenness centrality has a large influence on the transfer behavior.
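Both indicators can be computed on a small example. The sketch below uses Brandes' algorithm for unweighted graphs, which is an assumption of this illustration (the text does not say which implementation was used), and a hypothetical four-node graph in which node "M" plays the bridging role.

```python
from collections import deque

def betweenness_centrality(adj):
    """Brandes' algorithm for an unweighted, undirected graph.

    adj maps each node to the set of its neighbors.
    Returns unnormalized betweenness scores.
    """
    score = {v: 0.0 for v in adj}
    for s in adj:
        order = []                          # nodes in order of distance from s
        preds = {v: [] for v in adj}        # predecessors on shortest paths
        sigma = {v: 0 for v in adj}         # number of shortest paths from s
        dist = {v: -1 for v in adj}
        sigma[s], dist[s] = 1, 0
        queue = deque([s])
        while queue:                        # breadth-first search from s
            v = queue.popleft()
            order.append(v)
            for w in adj[v]:
                if dist[w] < 0:
                    dist[w] = dist[v] + 1
                    queue.append(w)
                if dist[w] == dist[v] + 1:
                    sigma[w] += sigma[v]
                    preds[w].append(v)
        delta = {v: 0.0 for v in adj}
        for w in reversed(order):           # accumulate dependencies
            for v in preds[w]:
                delta[v] += sigma[v] / sigma[w] * (1 + delta[w])
            if w != s:
                score[w] += delta[w]
    # Each undirected pair was counted once from each endpoint.
    return {v: c / 2 for v, c in score.items()}

# Hypothetical toy graph: "A" reaches "B" and "C" only through "M".
adj = {"M": {"A", "B", "C"}, "A": {"M"}, "B": {"M", "C"}, "C": {"M", "B"}}
degree = {v: len(neighbors) for v, neighbors in adj.items()}
bc = betweenness_centrality(adj)
```

In this toy graph "M" has the highest degree (3) and is the only node with nonzero betweenness, since the shortest paths from "A" to "B" and from "A" to "C" both pass through it.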
The interdisciplinary strength of a discipline can be expressed by the number of interdisciplinary papers involving that discipline, namely the degree of the corresponding node in the weighted discipline network. PageRank also gives a rough estimate of the importance of nodes in a given network, on the basis that important nodes receive more connections from other nodes. Hence the interdisciplinary breadth and strength of a discipline can also be expressed by the PageRank value of the corresponding node in the unweighted and weighted discipline network respectively.
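A minimal power-iteration sketch of PageRank on an undirected, possibly weighted network is given below. The damping factor 0.85 and the fixed number of iterations are assumptions of this illustration (the text does not state the settings used), and the star-shaped toy network is hypothetical. Setting all weights to 1 gives the unweighted case.

```python
def pagerank(weights, damping=0.85, iterations=100):
    """PageRank by power iteration.

    weights: {node: {neighbor: weight}}, symmetric for an undirected network.
    Returns scores that sum to 1.
    """
    nodes = list(weights)
    n = len(nodes)
    rank = {v: 1.0 / n for v in nodes}
    strength = {v: sum(weights[v].values()) for v in nodes}
    for _ in range(iterations):
        new = {v: (1 - damping) / n for v in nodes}
        for v in nodes:
            if strength[v] == 0:
                # Isolated node: spread its rank evenly over all nodes.
                for u in nodes:
                    new[u] += damping * rank[v] / n
            else:
                # Pass rank to neighbors in proportion to edge weight.
                for u, w in weights[v].items():
                    new[u] += damping * rank[v] * w / strength[v]
        rank = new
    return rank

# Hypothetical star network: "M" is connected to three other nodes.
star = {
    "M": {"A": 1, "B": 1, "C": 1},
    "A": {"M": 1},
    "B": {"M": 1},
    "C": {"M": 1},
}
rank = pagerank(star)
```

Here the central node "M" receives the largest score, in line with its role as the node to which all others connect.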
The degree, PageRank, and betweenness centrality of applied mathematics in the unweighted network are the highest (Table 1). The degree of applied mathematics is 30, which means that the theories and methods of applied mathematics have been used directly by 73.17% of the second level disciplines listed by PNAS, including members of all 3 first level disciplines (Fig 3). The highest betweenness centrality means that applied mathematics is a hub node for transferring ideas, theories, and methods from one discipline to others, thereby building bridges for interdisciplinary research between other disciplines. For example, network cosmology and its application [14][15][16][17] are typical interdisciplinary works among the theory of relativity, network science, and scientometrics, which are connected by geometry.
The degree and PageRank of chemistry in the weighted network are the highest, which means that the interdisciplinary strength of chemistry is the highest. Those indicators are low for applied mathematics compared with chemistry. This is because PNAS published only a few applied mathematics papers (350 papers in 1999-2013) compared with chemistry papers (8,645 papers in 1999-2013). A fairer indicator of interdisciplinary strength is therefore needed, which is defined as follows.
The relative interdisciplinary strength S(i) of discipline i is defined here as S(i) = M(i)/N(i), where N(i) is the number of papers of discipline i in the corpus, and M(i) is the number of interdisciplinary papers in discipline i. A simple proxy considering both the interdisciplinary strength and breadth is C(i) = S(i)K(i), where K(i) is the degree of i in the discipline network. The proxy is named the cross indicator. Notice that, for certain disciplines i, e.g. applied mathematics, M(i) is slightly less than the weighted degree K_W(i) (Table 1). This is because some papers belong to more than two disciplines: such a paper is counted once in M(i) but contributes to more than one edge incident to i.
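The definitions above can be traced on a toy example. The paper records are hypothetical and `indicators` is a name chosen for this sketch; it also shows why M(i) can be smaller than the weighted degree K_W(i) when a paper belongs to three disciplines.

```python
# Hypothetical toy records: each paper lists its second level disciplines.
papers = [
    {"applied mathematics", "ecology"},
    {"applied mathematics", "chemistry", "biophysics"},
    {"applied mathematics"},
    {"applied mathematics"},
    {"chemistry"},
]

def indicators(papers, discipline):
    """Compute N, M, K, K_W, S = M/N and C = S*K for one discipline."""
    own = [p for p in papers if discipline in p]
    n = len(own)                                   # N(i): papers of the discipline
    m = sum(1 for p in own if len(p) > 1)          # M(i): interdisciplinary papers
    neighbors = set().union(*own) - {discipline}
    k = len(neighbors)                             # K(i): degree in the network
    k_w = sum(len(p) - 1 for p in own)             # K_W(i): weighted degree
    s = m / n                                      # relative strength S(i)
    return {"N": n, "M": m, "K": k, "K_W": k_w, "S": s, "C": s * k}

am = indicators(papers, "applied mathematics")
```

For the toy data, applied mathematics has N = 4, M = 2, K = 3, and K_W = 3, so S = 0.5 and C = 1.5. The three-discipline paper is counted once in M but twice in K_W.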
Sort the disciplines by the cross indicator (Table 1). The top three are applied mathematics, statistics in mathematical science, and computer science (whose theory closely relates to mathematical science). The reasons for the high cross indicators differ between disciplines. Applied mathematics, statistics, computer science, and applied physical science are "output type" disciplines. The ideas and theories of those disciplines have provided a growing arsenal of methods for all of the sciences. Engineering, social science, and economic science are "input type" disciplines. Those disciplines integrate data, techniques, theories, etc. from other disciplines to create new approaches for problems whose solutions are beyond their own scope. The high values of the aforementioned indicators for applied mathematics are due to the increasing use of mathematical techniques in scientific research. A growing body of work in physics or computer science is indistinguishable from research done by mathematicians, and similar overlap occurs with medical science, astronomy, economic sciences, and an increasing number of fields. It is difficult today to find any discipline that does not have connections to mathematics, even political science [18].
Note to Table 1: The degree, PageRank and betweenness centrality of the nodes in the unweighted (weighted) discipline network are denoted by K (K_W), P (P_W), and B respectively. The interdisciplinary strength is S = M/N and the cross indicator is C = SK, where N is the number of the papers and M is the number of the interdisciplinary papers of a certain discipline in PNAS 1999-2013. doi:10.1371/journal.pone.0137424.t001

Statistical analysis of the relationships of typical research paradigms and methodologies to algorithms
To understand the underlying causes of the interdisciplinarity of applied mathematics, we discuss the relationships of some typical research paradigms and methodologies to applied mathematics by statistically analyzing the corpus content. A paper containing a topic word is taken to mean that the topic expressed by the word is used or discussed in that paper [19]. The topic words expressing the four basic research paradigms (model, experiment, simulation, and data-driven) and the methodologies given by the three typical transdisciplinary topics (system, network, and control) are taken to be "model", "experiment", "simulation", "data", "system", "network", and "control" respectively. For each topic word, the high or increasing proportion of papers containing that word reflects, to a certain extent, the typicality of the corresponding research paradigm or transdisciplinary topic (Fig 4).
There are 31,542 words that appear in the corpus and also belong to the lexicon of NLTK, of which 976 appear in more than 10% of the papers (S1 Text). We manually searched the 976 words for typical topic words of applied mathematics and found the word "algorithm", which appears in 11.34% of the papers. The relationship of a research paradigm or a transdisciplinary topic to algorithms can be expressed, to a certain degree, by the cointegration and correlation between the quarterly numbers of the papers containing the corresponding word and those of the papers containing "algorithm" (S1 Table).
The nominal significance level of the following tests is set to 0.05. The augmented Dickey-Fuller test [20] (maxlags = 3) shows that all of the time series in S1 Table are integrated of order one. The Johansen test [21] shows that almost all of the time series pairs in Table 2 are cointegrated. This means that, based on the 60 quarters of data from PNAS 1999-2013, the development of algorithms and that of any one of the mentioned research paradigms or transdisciplinary topics obey a long-run equilibrium relationship in the academic system.
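For reference, the notion of cointegration used above can be written compactly. The symbols below are introduced only for this sketch: x_t and y_t denote two quarterly series, and the intercept and linear trend terms are included on the assumption that they match the deterministic terms allowed in the cointegrating relations of Table 2.

```latex
x_t \sim I(1), \quad y_t \sim I(1), \qquad
\exists\, \beta \neq 0 :\;
y_t - \beta x_t = \mu + \delta t + \varepsilon_t, \quad \varepsilon_t \sim I(0)
```

Here I(1) means that a series is non-stationary while its first difference is stationary, and I(0) means stationary. The deviation ε_t from the linear relation between the two series therefore does not wander off over time, which is the sense in which the relation is a long-run equilibrium.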
In general, correlation analysis of non-stationary series may give spurious results, unless the series are cointegrated [22]. Hence the cointegrations in Table 2 support the validity of the correlation analysis: the Spearman's rank correlation coefficients [23] and the Pearson product-moment correlation coefficients [24] show that the development of algorithms is positively correlated with that of the mentioned research paradigms and transdisciplinary topics (Table 3).
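The two coefficients can be computed directly from their definitions. The sketch below is a plain-Python illustration with hypothetical quarterly counts; the function names are chosen for this example, and Spearman's coefficient is obtained as the Pearson coefficient of the average ranks.

```python
from math import sqrt

def pearson(x, y):
    """Pearson product-moment correlation coefficient."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    vx = sum((a - mx) ** 2 for a in x)
    vy = sum((b - my) ** 2 for b in y)
    return cov / sqrt(vx * vy)

def ranks(values):
    """1-based ranks; tied values share the mean of their positions."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    result = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        mean_rank = (i + j) / 2 + 1
        for k in range(i, j + 1):
            result[order[k]] = mean_rank
        i = j + 1
    return result

def spearman(x, y):
    """Spearman's rank correlation: Pearson correlation of the ranks."""
    return pearson(ranks(x), ranks(y))

# Hypothetical quarterly counts for two topic words (not data from the paper).
algorithm = [10, 12, 15, 21, 30]
network = [100, 110, 115, 160, 300]
r_pearson = pearson(algorithm, network)
r_spearman = spearman(algorithm, network)
```

Because the toy series rise together monotonically but not linearly, the Spearman coefficient equals 1 while the Pearson coefficient is positive and below 1.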
Co-word occurrence analysis is also an efficient method for measuring the relationship between topic words. It is based on the assumption that a paper containing two topic words uses or discusses the topics expressed by both words [19]. The proportions of the papers containing both "algorithm" and an aforementioned topic word, amongst the papers containing that word and amongst all of the papers, are calculated annually and quarterly (Fig 5). The time series needed for the calculation are listed in S2 Table. The positive slopes of the linear fits of the annual proportions (Table 4), except for "algorithm" + "simulation" amongst the papers containing "simulation", show the increasing trends of algorithmization of the research paradigms and of the methodologies given by the transdisciplinary topics. The reason for this exception is that the slope of the linear fit of the annual proportion of the papers containing "algorithm" amongst all of the papers (0.0030) is lower than that of "simulation" (0.0064).
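The proportions and slopes described above can be sketched as follows. The annual counts are hypothetical, and the ordinary least-squares slope is an assumption of this illustration, since the text does not state which fitting procedure was used.

```python
def slope(xs, ys):
    """Slope of the ordinary least-squares line through the points (xs, ys)."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sxx = sum((x - mx) ** 2 for x in xs)
    return sxy / sxx

# Hypothetical annual counts (not data from the paper).
years = [1999, 2000, 2001, 2002]
both = [10, 14, 18, 26]              # papers with "algorithm" and "network"
with_word = [100, 110, 120, 130]     # papers with "network"
total = [1000, 1000, 1000, 1000]     # all papers

# Proportion amongst papers containing the word, and amongst all papers.
prop_in_word = [b / w for b, w in zip(both, with_word)]
prop_in_all = [b / t for b, t in zip(both, total)]

slope_in_word = slope(years, prop_in_word)
slope_in_all = slope(years, prop_in_all)
```

A positive slope in either series indicates that co-occurrence with "algorithm" becomes more frequent over the years. The first series can still decline if the papers containing the topic word grow faster than the papers containing both words.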
Those cointegrations, positive correlations, and increasing trends of algorithmization arise naturally and can be considered some of the causes of the interdisciplinarity of applied mathematics. As simplifications of relevant aspects of research problems, models are generally described by mathematical concepts and language for systematic study [6]. Simulation, especially numerical simulation, has become a common method to test algorithmically how well models agree with experimental results. The widespread availability of computers and economic considerations make many of today's sciences increasingly rely on simulation via mathematical models and algorithms. Data at the scale collected or generated from experiments and simulations can only be analyzed by algorithms [8,9]. In fact, today's science is becoming data-driven at a previously unimagined scale. Meanwhile, the theories of algorithms now guide researchers in mining results from the collected data [25]. System science gives a unified methodology for researching complexity in epistemology by expressing complex phenomena as complex systems, and thus it is considered a transdisciplinary discipline [26]. A variety of abstract complex systems are studied as a field of mathematics. Ignoring the functionalities and characteristics of the original systems, systems can be investigated by abstracting them as networks. Researchers from different fields can investigate their respective problems under the unified network framework [12]. Algorithms play an important role in the analysis of the topological properties of networks, such as distance and centrality finding algorithms, graph partitioning and clustering algorithms, and so on [27,28]. Understanding of a system is reflected in our ability to control it. Control theory has a distinctly transdisciplinary mission to provide theories and approaches for comprehending complex phenomena [11]. The modern study of control uses various mathematical theories and approaches, such as neural networks, Bayesian probability, fuzzy logic, evolutionary computation, etc., which are all closely related to algorithms, e.g. genetic algorithms [29,30].
Note to Table 4: The time series are the annual proportion of papers containing "algorithm" and a certain topic word (the column heading) amongst papers containing that word (the first row), and amongst all of the papers (the second row). doi:10.1371/journal.pone.0137424.t004
The connections between applied mathematics and other disciplines are caused not only by algorithms, but also by other mathematical topics. In fact, certain mathematical topic words, such as "equation" and "statistic", can be found in S1 Text. The relationships between them and the research paradigms or methodologies can be analyzed quantitatively in the same way as above, so they are not addressed here.

Conclusion
The interdisciplinarity of applied mathematics is analyzed quantitatively by applying statistical and network methods to the corpus PNAS 1999-2013. A network is built from the discipline information of the corpus, which gives a panoramic view of the relationships between disciplines. Some network indicators, e.g. betweenness centrality, quantitatively describe the hub role of applied mathematics in interdisciplinary research. The statistical analysis of the corpus content finds that a primary topic of applied mathematics, algorithms, cointegrates, correlates, and increasingly co-occurs with certain typical research paradigms and methodologies. Those findings can be considered some of the underlying causes of the interdisciplinarity of applied mathematics.

Fig 2. The discipline network. It contains 42 nodes and 354 edges. Two disciplines are connected if there is a paper in PNAS 1999-2013 belonging to them simultaneously. doi:10.1371/journal.pone.0137424.g002

Fig 3. The neighbors of applied mathematics in the discipline network. A discipline connects to applied mathematics if there is a paper in PNAS 1999-2013 belonging to that discipline and applied mathematics simultaneously. doi:10.1371/journal.pone.0137424.g003

Fig 4. The quarterly proportions of the papers containing a certain topic word. The topic words respectively represent four research paradigms, viz. model, experiment, simulation, and data-driven, and three transdisciplinary topics, viz. system, network, and control. doi:10.1371/journal.pone.0137424.g004

Fig 5. The quarterly proportions of the papers containing "algorithm" and a certain topic word amongst the papers containing that word (Panels (a,b)), and amongst all of the papers (Panels (c,d)). doi:10.1371/journal.pone.0137424.g005

Table 1. Certain quantitative indicators for the interdisciplinarity of disciplines.

Table 2. The boolean decisions of the Johansen test on certain time series pairs. When doing the test, we set the nominal significance level to 0.05, choose the number of lagged differences in {1, ..., 3} by AIC, and assume that there are intercepts and linear trends in the cointegrating relations and quadratic trends in the data. A value of 1 indicates cointegration, and 0 indicates no cointegration.

Table 3. The correlation coefficients of certain time series pairs. In each table cell, the first value is the Spearman's rank correlation coefficient, and the second value is the Pearson product-moment correlation coefficient. doi:10.1371/journal.pone.0137424.t003

Table 4. The slopes of the linear fitting of certain time series.