Colored motifs reveal computational building blocks in the C. elegans brain

Complex networks can often be decomposed into less complex sub-networks whose structures can give hints about the functional organization of the network as a whole. However, these structural motifs can only tell one part of the functional story because in this analysis each node and edge is treated on an equal footing. In real networks, two motifs that are topologically identical but whose nodes perform very different functions will play very different roles in the network. Here, we combine structural information derived from the topology of the neuronal network of the nematode C. elegans with information about the biological function of these nodes, thus coloring nodes by function. We discover that particular colorations of motifs are significantly more abundant in the worm brain than expected by chance, and have particular computational functions that emphasize the feed-forward structure of information processing in the network, while evading feedback loops. Interneurons are strongly over-represented among the common motifs, supporting the notion that these motifs process and transduce the information from the sensor neurons towards the muscles. Some of the most common motifs identified in the search for significant colored motifs play a crucial role in the system of neurons controlling the worm's locomotion. The analysis of complex networks in terms of colored motifs combines two independent data sets to generate insight about these networks that cannot be obtained with either data set alone. The method is general and should allow a decomposition of any complex network into its functional (rather than topological) motifs as long as both wiring and functional information is available.


Introduction
Over the last decades, systems biology and network theory have contributed tremendously to our understanding of complex systems [1][2][3][4], revealing for example that the topological architecture of the molecular interaction networks within a cell is shared to a large degree by other complex systems, such as the Internet, computer chips and society [2]. This insight led to the development of various quantitative tools in network theory to analyze the complex structures within biological networks.
Complex networks like electronic circuits are frequently represented in terms of modules such as operational amplifiers, logical gates and memory, and it is often suggested that biological networks can similarly be decomposed into functional modules that have stereotypical functions [5,6]. Because the detection and identification of modules is a notoriously difficult task [7], a different approach focuses on the identification of conserved network motifs [8][9][10], that is, sub-networks of small size (typically two to five nodes) that are significantly more abundant in a network compared to a control network that had its edges randomly rearranged. The idea behind looking for significant motifs is evolutionary in nature: those motifs that are conducive to the function of the organism within its environment will be preferentially maintained over motifs that are either neutral in function or even detrimental. For example, an analysis of structural motif abundances in a variety of biological networks shows that these abundances can in part be explained by the motifs' robustness to small perturbations [11]. This thinking equally applies to technological systems that do not evolve according to strict Darwinian rules. For example, a comparison of motif abundances in biological, technological, social, and even word-adjacency networks [12] shows that these networks can be grouped into clusters that share similar motif abundance profiles.
arXiv:1012.3641v1 [q-bio.MN] 16 Dec 2010
We analyze motifs in the network of synaptic and gap-junction connections of the neuronal network of the nematode C. elegans. This network controls one of the most well-understood complex biological systems to date, and most of the network architecture of the 302 neurons of the hermaphrodite worm is known from experimental work [13,14] as well as recent reconstructions [15]. The most up-to-date wiring information covers 279 neurons of the somatic nervous system, excluding 20 neurons of the pharyngeal system and three neurons that appear to be unconnected from the rest [15]. There are 3,606 edges between these nodes, of which some (the synaptic connections) are directed, while gap-junctions are undirected.
An analysis of topological motifs in this network has revealed that two major building blocks are significantly overrepresented in the C. elegans neuronal network: the feedforward loop, and the bi-fan motif [16]. It is believed that these motifs perform stereotypic functions and play a crucial role in the nematode's decision-making and control [15,17,18]. However, while there is support for the hypothesis that over-represented motifs point to biological function from the evolutionary conservation of motifs in the yeast protein-protein interaction network [19], these conclusions have also been questioned [20] on the grounds that topology alone does not contain enough information to predict the function or process, or how biochemical reactions are likely to proceed in biological systems [9]. Indeed, the identification of the feedforward and bifan motifs does not allow us to determine how these motifs are used, or how they contribute to the worm's behavior. A simple example can illustrate this point: in Fig. 1A, we show a three-node motif that was not found to be significantly overrepresented in previous analyses [16][17][18]. However, if we color each neuron according to three possible functional tags such as motorneuron (blue), sensor neuron (green), or interneuron (red), several colored motifs stand out with high significance (see below), among them the motif shown in Fig. 1B. The functional significance of this motif is immediately obvious: it relays sensory information via an interneuron towards a muscle. Indeed, previous studies have shown that the connections between neurons of the three types chosen here are heavily biased: neurons do not connect indiscriminately between types [18,21,22,23]. Also, an analysis of colored motifs using GO annotations in the yeast protein-protein interaction network [24] suggests that differently colored motifs are differentially evolutionarily conserved, pointing to a diversity of functional roles for motifs with the same structure.
Figure 1. Significance of uncolored vs. colored motifs. (A): Non-significant motif from an uncolored analysis [16] becomes highly significant (B) if colors are used to attach functional tags to the nodes. Green: sensor neurons, red: interneurons, blue: motor neurons.
Here we combine two important data sets for a systematic analysis of the topological and functional motifs in the C. elegans brain: the connection graph and the functional characterization of each neuron [15]. Using these datasets, the entire C. elegans neuronal network becomes a colored graph where nodes represent neurons, edges are connections between neurons, and the color of the node tags the cell-type of the node. It is clear that the choice of the cell-type set (the colors) is crucial for the success of this method, and different choices will produce different results. At the same time, the classification into the three cell types is in itself ambiguous, because there are differences of annotation in the literature, and some cells are sometimes annotated as belonging to two classes. Other classifications exist (such as into ten different morphological classes [25]), but the motif analysis of graphs with more than three colors quickly becomes computationally cumbersome. Here, we study the abundance distribution of colored directed motifs of sizes two to four nodes. The number of possible motifs in a network strongly depends on the size of the motif, whether edges are directed, and the number of colors used to tag the nodes (see Table 1).
While an identification of functional motifs can help us understand how the worm uses its neuronal network for signal transduction, we should keep in mind that the worm also uses extrasynaptic signaling for behavior [26]. Furthermore, several different molecules can modulate synaptic function at a single neuron [27]. Thus, some of the computation that translates signals into actions takes place outside of the connection graph proper, and cannot be explored via a motif analysis.
Table 1. The table shows the actual numbers of colored motifs of a given size (and directedness) found in the C. elegans neuronal network as well as the theoretically possible number of colored motifs, as a pair of numbers (actual/possible). UM(1): undirected motifs, uni-colored; DM(1): directed motifs, uni-colored; UM(2): undirected motifs, two colors; DM(2): directed motifs, two colors; and so on.
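To make the "possible" counts concrete, the following is a minimal brute-force sketch that enumerates colored directed motifs up to relabeling of the nodes. It assumes that a gap junction is represented as a pair of reciprocal directed edges (as in the adjacency-matrix encoding given in Methods) and that motifs must be weakly connected; the function names are illustrative and are not taken from the original analysis.

```python
from itertools import permutations, product


def weakly_connected(n, edges):
    """True if the n nodes form one piece when edge directions are ignored."""
    neighbors = {v: set() for v in range(n)}
    for a, b in edges:
        neighbors[a].add(b)
        neighbors[b].add(a)
    seen, stack = {0}, [0]
    while stack:
        v = stack.pop()
        for w in neighbors[v] - seen:
            seen.add(w)
            stack.append(w)
    return len(seen) == n


def canonical_form(colors, edges, perms):
    """Smallest (colors, edges) description over all node relabelings."""
    best = None
    for perm in perms:
        new_colors = [0] * len(colors)
        for old, new in enumerate(perm):
            new_colors[new] = colors[old]
        new_edges = tuple(sorted((perm[a], perm[b]) for a, b in edges))
        key = (tuple(new_colors), new_edges)
        if best is None or key < best:
            best = key
    return best


def count_colored_motifs(n, n_colors):
    """Number of distinct colored, weakly connected directed motifs on n nodes."""
    pairs = [(a, b) for a in range(n) for b in range(n) if a != b]
    perms = list(permutations(range(n)))
    classes = set()
    for mask in product((0, 1), repeat=len(pairs)):
        edges = [pair for pair, bit in zip(pairs, mask) if bit]
        if not weakly_connected(n, edges):
            continue
        for colors in product(range(n_colors), repeat=n):
            classes.add(canonical_form(colors, edges, perms))
    return len(classes)


print(count_colored_motifs(3, 1))  # 13 uncolored directed three-node motifs
print(count_colored_motifs(3, 3))  # 273, the three-color count quoted in the text
```

Under the same assumptions, `count_colored_motifs(4, 3)` can be compared with the 13,770 possible four-node motifs quoted in the text, although the brute-force search is noticeably slower at that size.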

Adaptive significance of colored motif distribution
If the coloration of a motif (that is, the identity of colors at different positions of the motif) has adaptive significance, we should see a bias in the colored motif distribution with respect to control networks whose color assignments have been scrambled. We extract colored motif abundances from the colored networks by counting all distinct color combinations for each of the structural motifs of size 2, 3, and 4. Of the 279 neurons, 86 are classified as sensor neurons (and colored green in the following), 80 are classified as interneurons (red), and the remainder of 114 neurons are classified as motorneurons (blue) [15]. We stress again that the classification of some neurons is uncertain because other groups [28] have classified some neurons as belonging to two types simultaneously, and some neurons' classification is tentative. However, the results presented here do not vary significantly if a few neurons are misclassified.
In order to determine whether the abundance of a particular colored motif in C. elegans is biased, we produce random colored control networks by shuffling the color assignments in the C. elegans network while maintaining the relative abundance of each kind. The mean abundance $N_R$ of colored motifs of a particular type for 1,000 independent randomizations then provides the unbiased expectation for that motif, which we compare with the actual count $N_{CE}$ obtained for the colored worm brain. In Fig. 2, we plot the logarithm (base 2) of the ratio $N_{CE}/N_R$ for each colored motif as a function of the random count $N_R$, to determine the extent to which the worm motifs are over- or under-represented. Most of the motif counts in C. elegans are significantly different from the random control: all of the 2-node colored motif counts are significant, and all but one of the three-node motifs (one-sample two-tailed t-test, P < 0.05). Of the 4-node motifs, only 156 of the observed 8,310 motifs are not significantly different from the control count at the 5% level. We find a tendency of colored motifs in C. elegans to be under-represented compared to a randomly colored control, but with a significant number of motifs that are found much more often than expected by chance. (The distribution of normalized z-scores is strongly biased towards suppression, but with a long tail indicating over-expressed motifs; see Supplementary Fig. S1.) This finding suggests that the majority of possible colored motifs are not useful or downright detrimental, but a handful of them are so useful that they appear between 2 and 60 times as often as in an average randomly colored network. Note that some motifs that readily appear in the random controls are completely absent in C. elegans: 11 colored motifs of size three and 5,460 motifs of size 4 do not appear at all, which is also significantly different from what is expected by chance: at most 5 motifs of size 3 (1.02 on average) and 3,667 motifs of size 4 (2,634 on average) were absent by chance in any of the 1,000 randomizations.
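The comparison between an observed count and its color-randomized control can be sketched as follows. This is an illustration under stated assumptions, not the original analysis code: the function name and the toy counts are hypothetical, and the two-tailed P-value uses a normal approximation to the t distribution, which is only reasonable when the number of randomizations is large (such as the 1,000 used here).

```python
import math
import statistics


def compare_to_random(n_ce, random_counts):
    """Return (log2 ratio, t statistic, approximate two-tailed P-value).

    n_ce          -- observed count of one colored motif (assumed > 0)
    random_counts -- counts of the same motif in color-randomized networks
                     (assumed to have a positive mean and non-zero spread)
    """
    n = len(random_counts)
    mean_r = statistics.fmean(random_counts)
    sd_r = statistics.stdev(random_counts)  # sample standard deviation
    log2_ratio = math.log2(n_ce / mean_r)
    # One-sample t statistic: is the mean of the random counts equal to n_ce?
    t = (mean_r - n_ce) / (sd_r / math.sqrt(n))
    # Normal approximation to the two-tailed P-value (assumption, see above).
    p = 2.0 * (1.0 - statistics.NormalDist().cdf(abs(t)))
    return log2_ratio, t, p


# Hypothetical toy numbers: the motif is seen 20 times in the real network
# and about 10 times on average after shuffling the colors.
ratio, t, p = compare_to_random(20, [8, 10, 12, 10])
print(ratio, t, p)
```

A positive log ratio places the motif above the zero line in a plot such as Fig. 2, a negative one below it.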

Two-node motifs
In previous work that analyzed structural motifs only [16][17][18], the unidirectional two-node motif was found to be unremarkable, while the bi-directional motif was deemed over-represented [16,18] with respect to an ensemble of edge-randomized networks. We can look at both of those motifs in terms of the exceptionality of their colorations. The distributions in Fig. 3 show that the observed functional constraints make intuitive sense. For example, we find the motor-to-sensor-neuron motif to be significantly suppressed: we do not expect muscles to relay information to sensory neurons in a functioning worm (even though some of these connections do indeed exist). On the other hand, the sensor-to-inter- as well as inter-to-inter-neuron motifs appear significantly more often than expected by chance, as appropriate for information-processing motifs.

Motifs as computational building blocks
Previous work identified the feed-forward motif as significantly over-represented [14,16,18,29] in the C. elegans brain, as well as in gene regulatory networks [8,9]. We find that while many feed-forward motifs with colors are also over-represented, many others appear not to be useful. Whether a numerical over-representation (as measured, for example, by z-scores) is statistically significant must be determined carefully, by correcting for multiple hypothesis-testing (as pointed out earlier [16,18]), because any individual motif's abundance can appear to be significantly different from the randomization control purely by chance. We have generalized the step-down min-P procedure [16,18] to colored motifs (see Methods) to calculate the multiple-hypothesis-corrected P-values for the colored motifs of size 3 and 4. Using 100,000 color randomizations of the C. elegans network, 40 motifs of size 3 (out of a possible 273, see Table 1) have a significant P-value at the 5% level (26 of which have a corrected P < 0.002) and are shown in Fig. 4, while 505 out of the 8,310 observed motifs of size 4 have a corrected P = 0.055 for 100,000 randomizations (shown in Fig. S2). Because the number of independent hypotheses for motifs of size 4 is so large (13,770), the corrected P-values depend on the number of randomizations, and can only become significant if the number of randomizations significantly exceeds the number of hypotheses. As a consequence, the P-values for these 505 size-4 motifs will dip below the 5% level if the number of randomizations is increased even further, and we will treat the set of 505 motifs with corrected P = 0.055 as our set of significantly over-represented motifs of four neurons.
Figure 2. Differential representation of colored motifs. Comparison of the colored motif counts $N_{CE}$ obtained from the C. elegans neuron network and the average count from 1,000 color-randomized networks, $N_R$. Points above the zero line represent the colored motifs with higher frequency in the worm's neuronal network compared to color-randomized networks (over-representation), while those below that line are suppressed. A: colored directed motifs of size 2, B: colored directed motifs with three nodes, C: colored directed motifs with 4 nodes. Logarithm is to the base 2.

Motifs of size three
In Fig. 4, we show the 40 significantly over-represented motifs of size three, starting with the forward-processing motif (a relay chain) from sensor- to inter- to motor-neuron, already shown in Fig. 1. That this motif is the most notable among all motifs with three neurons confirms that the overall structure of the chemical synapse network is a three-layer architecture [15]. The second-most over-used motif is also a relay-chain into the motor neuron, but from another interneuron, suggesting that this is just the 3-neuron end of a 4-neuron chain that starts with a sensor neuron. And indeed, that chain does appear among the significant motifs of size 4 (see below). The "beginning" of that chain also appears among the significant 3-neuron motifs. The only other over-represented purely directional chain (using chemical synapses only) is the interneuron chain.
The motif with the third-highest z-value is a feed-forward motif of three interneurons. Feed-forward motifs have different uses in computation, depending on whether the feed-forward signal is excitatory or inhibitory. Often, these motifs are used to control activation only when input is present [9], or to perform "perfect adaptation" to constant signals [30]. While feed-forward motifs have previously been identified as important in C. elegans, it is noteworthy that the most-used type consists of interneurons only, even though they are in the minority among neuron types. In fact, the list of motifs in Fig. 4 is clearly dominated by interneurons (69% of the neurons in the list of 40 motifs, compared to only about 29% of all neurons in the full network of 279). This imbalance suggests that the motifs represent computational building blocks that describe the information-processing task: while sensors and motors serve mainly as signal sources and sinks, interneurons work as the signal transducers.
Another highly significant feed-forward motif has the signal originating in a sensor-neuron (see Fig. 4). Many of these feed-forward motifs come in alternate versions where one of the edges is an undirected gap junction, but they are never the most common. This is to be expected as there are far fewer undirected edges (514) than directed edges (2,194). The computational purpose of a feed-forward motif using a bidirectional gap junction is not immediately obvious, but it is possible that this back connection (or many back connections) is meaningful in information-processing by providing the opportunity of feedback. Note also that the "ring" motif where three nodes feed a signal into each other is absent among the significantly over-used motifs even though among three-node motifs with three edges, 40% should be rings by chance. While chains, splits, and merges are not significantly differentially used when analyzing uncolored motifs [16,18], some color combinations are significantly over-used, as is apparent from Fig. 4. Their purpose becomes more apparent when considering the four-node motifs.

Motifs of size four
The larger the motif, the more specific its computational function. At the same time, the number of possible motifs also increases greatly with motif-size. Of the 13,770 possible colored motifs with four nodes and directed edges, only 8,310 actually appear in the C. elegans network. We estimate that the number of possible colored motifs of size 5 is in the millions, preventing a significance analysis. As in the size-3 motifs, interneurons are significantly enriched within the computational motifs (68% of neurons in size-4 significant motifs, compared to the baseline abundance of 29% in the network as a whole). For four nodes with directed edges, there are 199 possible motifs that are structurally different, but many of those topologies are not prominent among the 505 colored motifs that are most significantly over-represented (shown in Supplementary Fig. S2). Among those, we distinguish five functional classes of motifs using chemical synapses (directed edges) only, shown in Fig. 5. These classes cover a significant portion, but not all of the 199 possible structural motifs. (When motifs have undirected edges, they sometimes straddle two classes of motifs.) The most common motif-class is the nested feed-forward motif (Fig. 5A), of which there are several kinds. About 40% of the most significantly over-represented motifs fall in this class. They are distinguished from bi-fan motifs (Fig. 5D) by the number of nodes with highest in-degree but no out-degree: bi-fans have two output nodes with an in-degree of two (see the example in Fig. 6A) while nested feedforwards usually have a single output node with an in-degree of two or three. Only about 5% of the motifs among our list of 505 are bi-fans according to this definition. The second-most common group of motifs (about 25%) are feed-forward loops with entry or exit (Fig. 5B), followed by the "integration and bifurcation" motifs (Fig. 5C, about 20%), and the relatively rare bi-fans, followed finally by the forward chain (5%, Fig. 5E).
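The degree criterion that separates bi-fans from nested feed-forward motifs can be illustrated with a short sketch. It only encodes the rule stated above (output nodes are nodes with incoming but no outgoing edges) and is not a full classifier for the five classes; the function names and node labels are hypothetical.

```python
def output_nodes(edges):
    """Map each output node (in-degree > 0, out-degree 0) to its in-degree."""
    in_deg, out_deg = {}, {}
    for src, dst in edges:
        out_deg[src] = out_deg.get(src, 0) + 1
        in_deg[dst] = in_deg.get(dst, 0) + 1
        in_deg.setdefault(src, 0)
        out_deg.setdefault(dst, 0)
    return {v: in_deg[v] for v in in_deg if out_deg[v] == 0 and in_deg[v] > 0}


def looks_like_bifan(edges):
    """Two output nodes, each with an in-degree of exactly two."""
    outs = output_nodes(edges)
    return len(outs) == 2 and all(d == 2 for d in outs.values())


# Two inputs (a, b) each driving two outputs (c, d): a bi-fan.
bifan = [("a", "c"), ("a", "d"), ("b", "c"), ("b", "d")]
# One input fed forward through two relays into a single output d.
nested_ff = [("a", "b"), ("a", "c"), ("b", "c"), ("b", "d"), ("c", "d")]

print(looks_like_bifan(bifan), output_nodes(bifan))
print(looks_like_bifan(nested_ff), output_nodes(nested_ff))
```

In the second example the only output node is `d` with an in-degree of two, which matches the description of a nested feed-forward motif rather than a bi-fan.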
Functional motifs that we do not discuss (about 5% of the motifs in the set of 505) either do not show up among the 505 prominent motifs or else are under-represented. An example is the "nested rings" motif, an instance of which is shown in Fig. 6B. The relative absence of ring motifs in the network could imply that feedback via the neuronal connection graph is not extensively used for computation by the worm.
In the following, we discuss the most common colorations of the motifs in each of the classes, their possible computational function, and point out some of these motifs in a model of the C. elegans subnetwork used for forward locomotion (Fig. 6D), described in [31].

Nested feed-forward loops
In this class of motif, one or two inputs are fed forward through one or two relay neurons towards a single output (see examples in Fig. 5A). Among the top-ten colored motif types by z-value, this motif appears six times (see Supplementary Fig. S2). We can see several motifs of this class in the reconstruction of the core network for C. elegans locomotion [31], which models the undulatory behavior of the worm with a biomechanical model based on the connection structure of the C. elegans neuronal network. We show nine of the core network nodes and their connections in Fig. 6D, colored according to our convention. The nodes named "Xv" and "Xd" are representatives of a class of interneurons (SAA) that connect in the manner shown to the ventral and dorsal head stretch receptors "Sv" and "Sd". Similarly, the motor neurons labelled "VB" and "VD" are representatives of 18 such neurons [28]. The "AVB" and "PVC" neurons are representatives of the "master controllers" for forward locomotion [32]. The reconstruction is noteworthy because it can infer that some of the connections are inhibitory rather than excitatory. The control of PVC and AVB via the SAA neurons (Xd) in Fig. 6D is a good example of a nested feedforward motif, as is the control of DB via the relays PVC and AVB with Xd as the source (but note that because there are both synaptic (directed) and undirected edges between Xv and AVB, the motifs are not strictly only feed-forward). Examples of highly-represented motifs of this sort are shown in Fig. 6E, along with their motif number and z-value as seen in Supplementary Fig. S2. Feed-forward motifs can be nested in different manners, processing two inputs in parallel, or a single input sequentially or in a hierarchical manner (see Fig. 5A). All these motifs appear about equally in colorations that have sensor- or interneurons as the source, and inter- or motorneurons as the signal sink.

Feed-forward with entry or exit
A feed-forward loop with a signal connecting to the input, output, or relay-neuron (see sketches in Fig. 5B) is the second-most frequent structure among the most-significant colored motifs of size 4. Most commonly, the output of the feed-forward loop (the signal-neuron) is directly connected to a motor neuron, highlighting the use of the feed-forward motif in controlling locomotion (four out of the top five colored motifs in this class are of this sort). A feed-forward loop consisting only of interneurons with its output directed into a motor neuron is the motif with the two highest z-values among the 505 most-significant colored motifs. The third-most common motif in this class has the entry inter-neuron of the feed-forward loop replaced with a sensor-neuron. Several of the motifs in this class can readily be seen in the forward locomotion network (Fig. 6D). There are no "ring" motifs with entry or exit among the 505 significantly over-abundant motifs of this class.

Integrations and bifurcations
This class of motifs (examples are depicted in Fig. 5C) is relatively straightforward as there is no feed-forward or feed-back of signals. Rather, signals are either distributed via bifurcations or integrated. The most common motif in this class is a sensor neuron connected to a chain of three interneurons, followed by the integration of a sensor- and an interneuron which is fed into a motor-neuron, shown in Fig. 6C. The most common bifurcation motif is a sensor-neuron feeding an inter-neuron, whose signal is distributed over two motor-neurons. Neither of these motifs can be found within the forward locomotion network (Fig. 6D) or its extensions, so we assume that they are part of another important pathway for C. elegans behavior.
Figure 6. Colored motifs and network context. A: A common colored bi-fan motif. B: A nested ring motif that is uncommon in C. elegans. C: A signal integration motif driving a single output. D: The core of the locomotion network of C. elegans, after [31]. Nodes are colored according to the scheme used throughout. Arrows with single points are excitatory connections via a chemical synapse, while edges ending in a bar signal inhibition via a chemical synapse. Edges with two arrow heads denote gap junctions. E: A selection of significant four-node motifs that appear in the locomotion network shown in D. Below each motif appears the rank and z-score as in Supplementary Figure S2. Note that we included motifs that have a synaptic junction (directed link) between Xv or Xd and AVB, because both synaptic and gap junction (undirected) edges exist between those neurons.

Bi-fan motif
The bi-fan motif (Fig. 5D) is a well-known motif structure (see, e.g., [4]) that regulates two independent outputs using two inputs. The computational function of any bi-fan motif depends on whether the connections are excitatory or inhibitory [20], and on whether the inputs themselves are connected. While the motif is used sparingly in C. elegans, some colorations are absolutely essential to the worm's behavior. Indeed, the bi-fan motif controlling the motor neurons VB and VD via PVC and AVB (see Fig. 6E, rightmost motif) happens to be the third-most over-represented motif (by z-value) of all size-4 motifs (see Fig. 6E as well as Fig. S2). We note, however, that the version where inputs do not communicate (first type in Fig. 5D) is used much more rarely.

Relay chains
Relay chains of four nodes such as depicted in Fig. 5E are comparatively rare. The most over-represented such chains are the forward processing chain from a sensor- into a motorneuron via two interneurons (which appears in Fig. 6D) or from three interneurons into the motor neuron, followed by variations on the theme with gap junctions replacing the chemical synapses. The relative rarity of the 4-node forward chain underscores how important signal integration and feed-forward processing are for the worm.

Discussion
Combining two sets of data that make the neuronal network of C. elegans one of the best understood animal control structures known, namely the connection map between neurons and the functional characterization of each neuron, allows us to gain insight into the computational building blocks of the worm brain by determining the over-representation of colored motifs, with respect to a color-randomization of a network with the connectivity unchanged. We find that while certain structural motifs have previously been found to be significant with respect to an edge randomization of the network [16,18], many more colored motifs are highly significant. Indeed, the overall trend is the suppression of nonsensical motifs, such as signal chains where muscles feed information into sensors, or inter-neurons deliver signals to sensors. The motifs that are used significantly more often than predicted by chance, as determined by a multiple-hypothesis-corrected test, are easily identified as important elements in a signal-processing network. Sensor-neurons are almost always sources of signals (their out-degree is significantly higher than their in-degree), while motor neurons are most often the end of the signal chain. Interneurons represent a much larger fraction of nodes in computational motifs than their overall abundance in the network would imply, suggesting that they relay the bulk of the forward-processing information. Notably absent among the common motifs are feedback loops, and relay chains longer than three neurons, underscoring the need for immediate reaction and the integration of signals. While these observations cannot take into account whether the connections between neurons are excitatory or inhibitory (as this data is not available for the majority of the connections), the comparison with the core forward-locomotion network of C. elegans (where this inference has been made) suggests that an analysis of colored motif utilization captures the computational processes underlying that behavior well.
In the future, we imagine that an analysis of the utilization spectrum of colored motifs can be extended to any network where nodes can be assigned tags that differentiate their biological (or social) function, but care must be taken to limit the number of functional classes as the predictive power of this approach is quickly overwhelmed when too many motifs are possible.

Motif abundances and color randomization
The wiring diagram, as well as the functional classification of neurons into sensor-, inter-, and motorneurons, was obtained from [15]. Networks were encoded in terms of an adjacency matrix A(i, j) where A(i, j) = 1 if a chemical synapse connects neuron i to j, with A(j, i) = 0. Undirected edges (gap junctions) have A(i, j) = A(j, i) = 1. Colored motif counts were obtained using our own implementation of the FANMOD algorithm [33,34]. We define the z-score of a C. elegans motif as $z = (N_{CE} - N_R)/\sigma$, where $N_{CE}$ is the abundance for that motif in the C. elegans network, $N_R$ is the average of the abundance distribution of that motif in color-randomized networks, and $\sigma$ is the standard deviation of that distribution. Color randomization of the C. elegans colored graph is performed by repeatedly switching the colors of two randomly chosen nodes, thus preserving the color distribution and the underlying graph topology. The color switch is repeated sufficiently often to guarantee a random color distribution.
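The procedure described above can be sketched on a toy network as follows. This is a minimal illustration, not the FANMOD-based implementation: it only counts colored unidirectional two-node motifs, the six-node network and its colors are invented for the example, the number of swaps per randomization (ten per node) is an arbitrary assumption, and the population standard deviation is used for $\sigma$ because the text does not specify the estimator.

```python
import random
import statistics
from collections import Counter


def count_colored_edges(A, colors):
    """Count unidirectional edges i -> j by (color of i, color of j)."""
    counts = Counter()
    n = len(A)
    for i in range(n):
        for j in range(n):
            if i != j and A[i][j] == 1 and A[j][i] == 0:
                counts[(colors[i], colors[j])] += 1
    return counts


def shuffle_colors(colors, n_swaps, rng):
    """Swap the colors of two randomly chosen nodes, n_swaps times."""
    colors = list(colors)
    for _ in range(n_swaps):
        i, j = rng.randrange(len(colors)), rng.randrange(len(colors))
        colors[i], colors[j] = colors[j], colors[i]
    return colors


def z_scores(A, colors, n_random=1000, seed=0):
    """z = (N_CE - N_R) / sigma for every colored motif seen in the network."""
    rng = random.Random(seed)
    n_swaps = 10 * len(colors)  # assumed to be "sufficiently often"
    observed = count_colored_edges(A, colors)
    samples = [count_colored_edges(A, shuffle_colors(colors, n_swaps, rng))
               for _ in range(n_random)]
    z = {}
    for motif, n_ce in observed.items():
        values = [s[motif] for s in samples]  # Counter gives 0 if absent
        sigma = statistics.pstdev(values)
        if sigma > 0:
            z[motif] = (n_ce - statistics.fmean(values)) / sigma
    return z


# Toy layered network: sensors (S) -> interneurons (I) -> motor neurons (M).
colors = ["S", "S", "I", "I", "M", "M"]
A = [[0] * 6 for _ in range(6)]
for i, j in [(0, 2), (1, 3), (2, 3), (2, 4), (3, 4), (3, 5)]:
    A[i][j] = 1  # chemical synapse i -> j, with A[j][i] left at 0

print(z_scores(A, colors))
```

Because the shuffle only exchanges colors between nodes, both the number of neurons of each type and the wiring itself are left untouched, as required by the randomization described in the text.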

Multiple hypothesis testing
We are testing the hypothesis that a colored motif in the C. elegans neuronal network is significantly over-represented, compared to the same motif in a color-randomized network. Because many hypotheses are tested simultaneously, the probability of rejecting the null hypothesis for any motif by chance at least once increases with the number of hypotheses tested. To correct for this, we adapted the single-step min-P procedure for multiple-hypothesis adjustment [35,36] that was also used by [16,18] as follows. For each size class of motifs, let $N_{CE}(i)$ be the count of colored motif i among the M possible colored motifs, and $N_R(p)$ be the motif count for the same motif i in the pth color-randomization of the C. elegans network (p = 1, ..., S), where S is the number of randomizations. Using the Heaviside (step) function, we define the raw P-value $P_{CE}(i)$ for each C. elegans colored motif i. We also define the raw P-value $P(r, i)$ for any randomization r of motif i, to obtain the most significantly over-represented randomization (by chance) among the colorations i:
$$P_{\min}(r) = \min_i P(r, i).$$
Finally, the single-step min-P adjusted P-value for motif i, $\pi(i)$, is obtained by comparing the raw P-value $P_{CE}(i)$ of each motif to the smallest of the P-values found in the randomizations (across all motifs):
$$\pi(i) = \Pr\left(P_{\min}(p) \le P_{CE}(i),\ 1 \le p \le S\right).$$
Figure S2. Rank and z-scores (un-normalized) for the 505 motifs of size 4 with single-step min-P adjusted P-value P = 0.0556 for 100,000 randomizations. Each of these motifs has $P_{CE} = 0$ (see Methods), which implies each of these motifs was more abundant in the C. elegans network than in any of the 100,000 randomized networks. But because there are 5,560 entries in $P_{\min}$ that vanish, the adjusted P-value cannot be smaller than 0.0556. Increasing the number of randomizations leads to a smaller fraction of zeros in $P_{\min}$, and thus decreases the adjusted P-value of those motifs that have $P_{CE} = 0$. (First 56 motifs only, remainder available upon request.)
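The single-step min-P adjustment can be sketched in a few lines. Since the explicit formulas for the raw P-values are not reproduced above, the sketch makes two assumptions that should be checked against the original definitions: a raw P-value is the fraction of randomizations whose count is at least as large as the reference count, and a randomization is not compared with itself when computing $P(r, i)$. The function name and the toy counts are hypothetical.

```python
def min_p_adjusted(n_ce, n_r):
    """Single-step min-P adjusted P-values.

    n_ce -- list of M observed motif counts
    n_r  -- list of S randomizations, each a list of M motif counts
    """
    S, M = len(n_r), len(n_ce)
    # Raw P-value of each observed motif (assumed definition, see lead-in).
    p_ce = [sum(n_r[p][i] >= n_ce[i] for p in range(S)) / S for i in range(M)]
    # Smallest raw P-value over all motifs, for each randomization r.
    p_min = []
    for r in range(S):
        p_ri = [sum(n_r[p][i] >= n_r[r][i] for p in range(S) if p != r) / (S - 1)
                for i in range(M)]
        p_min.append(min(p_ri))
    # Adjusted P-value: fraction of randomizations with P_min <= P_CE(i).
    return [sum(pm <= p_ce[i] for pm in p_min) / S for i in range(M)]


# Toy example with M = 2 motifs and S = 4 randomizations.
observed = [10, 3]
randomized = [[1, 5], [2, 4], [3, 3], [2, 2]]
print(min_p_adjusted(observed, randomized))  # [0.5, 1.0]
```

In the toy example the first motif has a raw P-value of zero, yet its adjusted P-value is 0.5 because two of the four entries of $P_{\min}$ also vanish. This is the same floor effect that limits the adjusted P-value to 0.0556 in Figure S2. The double loop scales as S²M and is meant for illustration only; a run with 100,000 randomizations would need a rank-based implementation.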