Conceived and designed the experiments: MR CTB. Performed the experiments: MR CTB. Analyzed the data: MR CTB. Contributed reagents/materials/analysis tools: MR CTB. Wrote the paper: MR CTB.

The authors have declared that no competing interests exist.

Change is a fundamental ingredient of interaction patterns in biology, technology, the economy, and science itself: Interactions within and between organisms change; transportation patterns by air, land, and sea all change; the global financial flow changes; and the frontiers of scientific research change. Networks and clustering methods have become important tools to comprehend instances of these large-scale structures, but without methods to distinguish between real trends and noisy data, these approaches are not useful for studying how networks change. Only if we can assign significance to the partitioning of single networks can we distinguish meaningful structural changes from random fluctuations. Here we show that bootstrap resampling accompanied by significance clustering provides a solution to this problem. To connect changing structures with the changing function of networks, we highlight and summarize the significant structural changes with alluvial diagrams and realize de Solla Price's vision of mapping change in science: studying the citation pattern between about 7000 scientific journals over the past decade, we find that neuroscience has transformed from an interdisciplinary specialty to a mature and stand-alone discipline.

Researchers have developed a suite of network mapping tools to highlight important features while simplifying the overall structure of social and biological systems

Any tool for analyzing change must distinguish between meaningful trends and statistical noise. For example, statistical network models and stratified data make it possible to estimate global properties from the observation of sample networks

Moreover, many of the systems to which we apply network approaches are idiosyncratic in nature and preclude replicate observations. There is one and only one global air traffic network for the year 2009, for example. Therefore we cannot establish statistical significance by looking at multiple samples. Nor can we rely on temporal stability. While structures that remain unchanging over time may be statistically significant, we will not find significant changes by looking for features that stay the same.

One possibility would be to use a resampling technique such as the bootstrap, which assesses the accuracy of an estimate by resampling from the empirical distribution of observations

The standard approach to cluster networks is to minimize an objective function over possible partitions of the network, as in the left side of the diagram. By repeated resampling of the weighted links from the original network, we create a “bootstrap world” of resampled networks. By clustering these as well, and comparing to the clustering of the original network, we can estimate the degree of support that the data provide in assigning each node to a cluster. In the bottom network, the darker nodes are clustered together in at least 95% of the 1000 bootstrap networks.

Finally, to reveal stories in the network data and to be able to connect structural and functional changes, we use

Science is a dynamic, organized, and massively parallel human endeavor to discover, explain, and predict the nature of the physical world. In science, new ideas are built upon old ideas. Through cumulative cycles of modeling and experimentation, scientific research undergoes constant change: scientists self-organize into fields that grow and shrink, merge and split. Citation patterns among scientific journals allow us to track this flow of ideas and how the flow of ideas changes over time

We first cluster the networks with the information-theoretic clustering method presented in ref.

To identify the journals that are significantly associated with the clusters to which they are assigned, we use simulated annealing to search for the largest subset of journals within each cluster of the original network that are clustered together in at least 95% of all bootstrap networks. To identify the clusters that are significantly distinct from all other clusters, we search for clusters whose significant subset is clustered with no other cluster's significant subset in at least 95% of all bootstrap networks (see

An alluvial diagram (bottom), with clusters ordered by size, reveals changes in network structures over time. Here the height of each block represents the volume of flow through the cluster, with significant subsets in darker color. The orange module merges with the red module, but the nodes are not clustered together in 95% of the bootstrap networks. The blue module splits, but the significant nodes in the blue and purple modules are clustered together in more than 5% of the bootstrap networks. With a 5% significance threshold, neither change is significant.

Once we have a significance cluster for the network at each time point (or each state), we want to reveal the trends in our data: we need to simplify and highlight the structural changes between clusters. In the mapping-change step of

The alluvial diagram for the citation data reveals the significant structural changes that have occurred in science over the past decade. Rather than viewing the entire diagram, let us highlight a couple of interesting stories.

This set of scientific fields shows the major shifts in the last decade of science. Each significance clustering for the citation networks in years 2001, 2003, 2005, and 2007 occupies a column in the diagram and is horizontally connected to preceding and succeeding significance clusterings by stream fields. Each block in a column represents a field and the height of the block reflects citation flow through the field. The fields are ordered from bottom to top by their size with mutually nonsignificant fields placed together and separated by half the standard spacing. We use a darker color to indicate the significant subset of each cluster. All journals that are clustered in the field of neuroscience in year 2007 are colored to highlight the fusion and formation of neuroscience.

The alluvial diagram illustrates, for example, how over the years 2001–2005, urology gradually splits off from oncology and how the field of infectious diseases becomes a unique discipline, instead of a subset of medicine, in 2003. But these changes are just two of many over this period. In the same diagram, we also highlight the biggest structural change in scientific citation patterns over the past decade: the transformation of neuroscience from an interdisciplinary specialty to a mature and stand-alone discipline, comparable to physics or chemistry, economics or law, molecular biology or medicine. In 2001, 102 neuroscience journals, led by

In their citation behavior, neuroscientists have finally cleaved from their traditional disciplines and united to form what is now the fifth largest field in the sciences (after molecular and cell biology, physics, chemistry, and medicine). Although this interdisciplinary integration has been ongoing since the 1950s

The problem of detecting structural change in large networks adds two new challenges to the basic problem of network clustering: (1) we need appropriate statistical methods to identify significant features of network clustering and to distinguish between trends and noise in the data, and (2) we require effective visualizations to bring out the stories implicit in a time series of cluster maps. To resolve the first of these challenges, we have developed a method for significance clustering based on the parametric bootstrap. To address the second, we have presented the visualization technique of alluvial diagrams. These methods are general to many types of networks and can answer questions about structural change in science, economics, and business.

Here we lay out the details of how we generate

The method consists of four steps, described below and illustrated in

By repeatedly resampling the weighted links from the original networks, we create “bootstrap worlds” of 1000 resampled networks. By clustering these bootstrap networks, and comparing to the clustering of the original networks, we can estimate the degree of support that the data provide in assigning each node to a cluster. In the bottom networks, the darker colors represent nodes that are clustered together in at least 95% of the 1000 bootstrap networks. The alluvial diagram highlights and summarizes the structural changes between the time 1 and time 2 significance clusters. The height of each block represents the volume of flow through the cluster. The clusters are ordered from bottom to top by their size, with mutually nonsignificant clusters placed together and separated by a third of the standard spacing. The orange module merges with the red module, but the nodes are not clustered together in 95% of the bootstrap networks. The blue module splits, but the significant nodes in the blue and purple modules are clustered together in more than 5% of the bootstrap networks. Neither change is significant.

Cluster the original networks observed at each time point.

Generate and cluster the bootstrap replicate networks for each time point.

Determine the significance of the clustering at each time point.

Generate an alluvial diagram to illustrate changes between time points.

For simplicity of description, here we map the change between two states

We first partition the network

The bootstrap is a statistical method for assessing the accuracy of an estimate by resampling from the empirical distribution. This method is particularly powerful when the variance of the estimator cannot be derived analytically or when the underlying distribution is not accessible. Because the cluster assignments are a result of a computational method and the network is idiosyncratic by nature, the bootstrap is indispensable for the process described here.
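As a rough illustration of the resampling step, the sketch below generates bootstrap replicate networks from a weighted link list. It assumes a parametric bootstrap in which each link weight is redrawn from a Poisson distribution whose mean is the observed weight; the distribution is an assumption made for this example, and the names `poisson_sample`, `bootstrap_replicate`, and `bootstrap_world`, as well as the dictionary layout of the network, are hypothetical.

```python
import random


def poisson_sample(lam, rng):
    """Draw one Poisson(lam) variate by counting unit-rate arrivals in [0, lam]."""
    if lam <= 0:
        return 0
    count = 0
    elapsed = rng.expovariate(1.0)
    while elapsed <= lam:
        count += 1
        elapsed += rng.expovariate(1.0)
    return count


def bootstrap_replicate(links, rng):
    """Return one resampled network; links maps (source, target) -> weight."""
    replicate = {}
    for edge, weight in links.items():
        # Assumption: Poisson resampling with mean equal to the observed weight.
        new_weight = poisson_sample(weight, rng)
        if new_weight > 0:  # links resampled to zero vanish from the replicate
            replicate[edge] = new_weight
    return replicate


def bootstrap_world(links, n_replicates=1000, seed=0):
    """Return a list of n_replicates resampled networks."""
    rng = random.Random(seed)
    return [bootstrap_replicate(links, rng) for _ in range(n_replicates)]
```

The arrival-counting sampler is chosen only because it needs nothing beyond the standard library; its cost grows linearly with the link weight, so a numerical library would be preferable for networks with large weights.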

To generate a single bootstrap replicate network

Subsequently we partition the bootstrap replicate network with the same clustering method we used on the original network; this yields the bootstrap modular description

The basic idea behind significance clustering is that we can look at the bootstrap replicates to see which aspects of the modular description of the original network are best supported by the data. Features of the original network that occur in all or nearly all of the bootstrap replicates are well-supported by the data; features that occur in only some of the bootstrap replicates are less well-supported.

What features do we consider? First, we consider the assignment of each node to a module. By looking at the set of bootstrap modular descriptions, we can assess which of these assignments are strongly supported by the data, and which node assignments are less certain. To identify the nodes that are significantly assigned to a module, we search for the largest subset of nodes in each module of the original modular description

To efficiently search the large space of possible subsets in each cluster, we use simulated annealing
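The subset search can be sketched as follows. This is a minimal illustration, not the procedure used for the results reported here: the score function (subset size minus a penalty for every bootstrap partition beyond the allowed 5% in which the subset is split), the linear cooling schedule, and the names `violations` and `significant_subset` are all assumptions made for the example. Each bootstrap partition is represented as a dictionary from node to module label.

```python
import math
import random


def violations(subset, bootstrap_partitions):
    """Count bootstrap partitions in which the subset is split across modules."""
    return sum(
        1 for part in bootstrap_partitions
        if len({part[node] for node in subset}) > 1
    )


def significant_subset(module_nodes, bootstrap_partitions,
                       max_split_fraction=0.05, steps=5000, seed=0):
    """Search for the largest subset kept together in enough partitions."""
    nodes = sorted(module_nodes)
    if not nodes:
        return set()
    rng = random.Random(seed)
    allowed = int(max_split_fraction * len(bootstrap_partitions) + 1e-9)
    penalty = len(nodes)  # one excess split outweighs any gain in size

    def score(subset):
        excess = max(0, violations(subset, bootstrap_partitions) - allowed)
        return len(subset) - penalty * excess

    current = set(nodes)
    current_score = score(current)
    # A subset is feasible exactly when its score equals its size.
    best = set(current) if current_score == len(current) else set()
    for step in range(steps):
        temperature = max(1e-3, 1.0 - step / steps)  # illustrative schedule
        candidate = current ^ {rng.choice(nodes)}  # toggle one node in or out
        delta = score(candidate) - current_score
        if delta >= 0 or rng.random() < math.exp(delta / temperature):
            current, current_score = candidate, current_score + delta
        if current_score == len(current) and len(current) > len(best):
            best = set(current)
    return best
```

Because the search is stochastic, repeated runs with different seeds may return different subsets of equal size; the sketch simply keeps the largest feasible subset it visits.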

In addition to telling us about the assignment of individual nodes to specific modules, the set of bootstrap replicates also contains information about which modules stand alone and which are possibly subsets of other modules. To reveal this information, we need to identify the modules that are always, or almost always, separate from any other module. We consider a module to be significant if its significant subset is clustered with no other significant subset in at least 95% of all bootstrap modular descriptions. Conversely, two clusters are mutually nonsignificant if their significant subsets are clustered together in more than 5% of all bootstrap modular descriptions. By this definition, a module can be mutually nonsignificant with more than one other module. In the alluvial diagram described in section 4, we want to associate each nonsignificant module with the module together with which it most likely forms a subset. The search for these pairs of modules is straightforward: for each pair of modules, we count the number of bootstrap modular descriptions in which all nodes in the two significant subsets are clustered together, and we record this number if it exceeds 5% of all bootstrap modular descriptions (the criterion for nonsignificant modules). Then, starting with the smallest module, we associate each module with the larger module with which it is most often clustered, and proceed to the next smallest module, and so on.
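The pairing procedure described above can be sketched in a few lines. The function names `together_count` and `associate_nonsignificant`, the dictionary-based inputs, and the tie-breaking behavior are hypothetical choices for this illustration; bootstrap partitions are again dictionaries from node to module label.

```python
from itertools import combinations


def together_count(subset_a, subset_b, bootstrap_partitions):
    """Count partitions in which all nodes of both subsets share one module."""
    merged = subset_a | subset_b
    return sum(
        1 for part in bootstrap_partitions
        if len({part[node] for node in merged}) == 1
    )


def associate_nonsignificant(significant_subsets, module_sizes,
                             bootstrap_partitions, threshold=0.05):
    """Map each nonsignificant module to the larger module it most often joins.

    significant_subsets: module -> set of nodes in its significant subset.
    module_sizes: module -> size used for ordering (e.g. flow volume).
    """
    cutoff = threshold * len(bootstrap_partitions)
    counts = {}
    for a, b in combinations(significant_subsets, 2):
        count = together_count(significant_subsets[a], significant_subsets[b],
                               bootstrap_partitions)
        if count > cutoff:  # the pair is mutually nonsignificant
            counts[(a, b)] = counts[(b, a)] = count

    association = {}
    for module in sorted(module_sizes, key=module_sizes.get):  # smallest first
        partners = [
            (counts[(module, other)], other)
            for other in module_sizes
            if (module, other) in counts
            and module_sizes[other] > module_sizes[module]
        ]
        if partners:
            association[module] = max(partners, key=lambda pair: pair[0])[1]
    return association
```

Modules that never appear as keys in the returned dictionary are significant under this criterion, or have no larger partner with which they are mutually nonsignificant.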

To reveal change over time or between states of real-world networks, we summarize the results of the significance clusterings of the different states

We use the stream fields to reveal the changes in cluster assignments and in level of significance between two adjacent significance clusterings. The height of a stream field at each end, going from the significant or nonsignificant subset of a cluster in one column to the significant or nonsignificant subset of a cluster in the adjacent column, represents the total size of the nodes that make this particular transition. By following all stream fields from a cluster to an adjacent column, it is therefore possible to study in detail the mergers with other clusters and the significance transitions. To reduce the number of crossing stream fields, the stream fields are ordered by the position of the clusters to which they connect. For smooth transitions, we draw the stream fields with splines and use gradient shading for the component colors. Finally, to reduce visual clutter and improve clarity, we apply a threshold and do not show the thinnest stream fields.
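The stream-field heights can be computed by aggregating node sizes over transitions between blocks in adjacent columns. The sketch below assumes that each column is given as a dictionary from node to a (cluster, is_significant) pair and that node sizes are available for both columns; the function name `stream_fields`, the field names in the usage example, and the `min_height` threshold are hypothetical.

```python
from collections import defaultdict


def stream_fields(left, right, left_size, right_size, min_height=0.0):
    """Aggregate node transitions between two adjacent columns.

    left, right: node -> (cluster, is_significant) in each column.
    left_size, right_size: node -> size (e.g. flow) in each column.
    Returns {(left_block, right_block): (height_at_left, height_at_right)}.
    """
    heights = defaultdict(lambda: [0.0, 0.0])
    for node, origin in left.items():
        if node not in right:
            continue  # node absent from the next column: no stream field
        field = heights[(origin, right[node])]
        field[0] += left_size[node]
        field[1] += right_size[node]
    # Drop the thinnest fields to reduce clutter (threshold is illustrative).
    return {
        key: tuple(ends) for key, ends in heights.items()
        if max(ends) >= min_height
    }


# Usage with made-up journals and fields:
left = {"j1": ("bio", True), "j2": ("bio", False), "j3": ("psy", True)}
right = {"j1": ("neuro", True), "j2": ("neuro", True), "j3": ("neuro", False)}
left_size = {"j1": 0.5, "j2": 0.25, "j3": 0.25}
right_size = {"j1": 0.5, "j2": 0.125, "j3": 0.375}
fields = stream_fields(left, right, left_size, right_size)
```

Keeping a separate height for each end reflects the fact that a node's size may differ between the two time points, so a stream field can widen or narrow as it crosses from one column to the next.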

Here we briefly review our information-theoretic approach to revealing community structure in weighted and directed networks and present a new fast stochastic and recursive search algorithm to minimize the map equation, the objective function of our method.

Mapping change in medicine 1997–2007

Mapping change in physics & chemistry 1997–2007

The authors would like to thank Jevin West both for processing the journal citation data and for numerous helpful discussions, and Moritz Stefaner for his extensive help with the information visualizations presented here.