Stochastic Blockmodeling of the Modules and Core of the Caenorhabditis elegans Connectome

Recently, there has been much interest in the community structure or mesoscale organization of complex networks. This structure is characterised either as a set of sparsely inter-connected modules or as a highly connected core with a sparsely connected periphery. However, it is often difficult to disambiguate these two types of mesoscale structure or, indeed, to summarise the full network in terms of the relationships between its mesoscale constituents. Here, we estimate a community structure with a stochastic blockmodel approach, the Erdős-Rényi Mixture Model, and compare it to the much more widely used deterministic methods, such as the Louvain and Spectral algorithms. We used the Caenorhabditis elegans (C. elegans) nervous system (connectome) as a model system in which biological knowledge about each node or neuron can be used to validate the functional relevance of the communities obtained. The deterministic algorithms derived community structures with 4–5 modules, defined by sparse inter-connectivity between all modules. In contrast, the stochastic Erdős-Rényi Mixture Model estimated a community structure with 9 blocks or groups, which comprised a similar set of modules but also included a clearly defined core, made of 2 small groups. We show that the "core-in-modules" decomposition of the worm brain network, estimated by the Erdős-Rényi Mixture Model, is more compatible with prior biological knowledge about the C. elegans nervous system than the purely modular decomposition defined deterministically. We also show that the blockmodel can be used both to generate stochastic realisations (simulations) of the biological connectome, and to compress the network into a small number of super-nodes and their connectivity. We expect that the Erdős-Rényi Mixture Model may be useful for investigating the complex community structures in other (nervous) systems.


Supplementary Figures
Supplementary Figure. Membership structure of the neurons in the Spectral fit. Neurons are colour coded according to their ganglion type.

Erdős-Rényi Mixture Model
For comprehensiveness, we give a thorough review of the ERMM proposed by Daudin, Picard and Robin [2], and we offer more detailed and complete proofs than those found in the original references.
We define $G$ to be a simple random graph, fully specified by the binary and symmetric adjacency matrix $X = (X_{ij})_{1 \le i,j \le n}$. This matrix has two obvious characteristics: first, its principal diagonal is zero, since the graph is simple; second, the number of data points in $X$ is $n(n-1)/2$, which is just the number of entries in its upper (or lower) triangular part.
In order to describe the ERMM, we first concentrate on the assumptions about the nodes (vertices). For the graph $G$, the set of all vertices, labelled $\{V_i\}_{i \in 1,\ldots,n}$, is assumed to be divided into $Q$ unknown blocks, where the membership of each vertex is encoded by a $1 \times Q$ vector $Z_i$, $i = 1, \ldots, n$. In particular, the elements of $Z_i$ are the mutually independent latent variables $Z_{iq}$ which label vertices according to their block membership, so that
$$Z_{iq} = \begin{cases} 1 & \text{if } V_i \text{ belongs to block } q, \\ 0 & \text{otherwise.} \end{cases}$$
Furthermore, for a division of $G$ into $Q$ blocks, $Z_i$ is assumed to follow the single-trial multinomial (or categorical) distribution
$$Z_i \sim \mathcal{M}(1; \alpha),$$
where the parameter $\alpha$ is the $Q \times 1$ vector of probabilities $\alpha = (\alpha_1, \ldots, \alpha_Q)$. Consequently, the probability that a randomly chosen vertex is located in the $q$-th block is given as
$$P(Z_{iq} = 1) = \alpha_q,$$
with the constraint that $\sum_{q=1}^{Q} \alpha_q = 1$. The immediate interpretation of this assumption is that each vertex belongs to one block only; in common parlance, this is known as hard partitioning.
To complete the description of the ERMM, we focus next on the assumptions about the edges. For this, the ERMM specifies that, given the block assignments of the vertices, the elements of $X$ are conditionally independent Bernoulli random variables with rates given by the corresponding elements of the connectivity matrix $\pi = (\pi_{ql})_{1 \le q,l \le Q}$. In other words, if a vertex $V_i$ belongs to block $q$ and a vertex $V_j$ belongs to block $l$, then
$$X_{ij} \mid \{Z_{iq} = 1, Z_{jl} = 1\} \sim \mathcal{B}(\pi_{ql}),$$
or, for vertices $V_i$ and $V_j$ located in the same block $q$,
$$X_{ij} \mid \{Z_{iq} = 1, Z_{jq} = 1\} \sim \mathcal{B}(\pi_{qq}).$$
For the subsequent proofs, it is convenient to express the elements of the connectivity matrix $\pi$ as the conditional probabilities $\pi_{ql} = P(X_{ij} = 1 \mid Z_{iq} = 1, Z_{jl} = 1)$.
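To make these two assumptions concrete, the following Python sketch (illustrative only; the function and variable names are ours, not the original authors' code) draws one realisation of an ERMM graph by first sampling block labels from the multinomial and then sampling edges as conditionally independent Bernoulli variables:

```python
import numpy as np

def sample_ermm(n, alpha, pi, seed=0):
    """Draw one realisation of an ERMM graph G(n, pi).

    alpha : (Q,) block probabilities summing to 1.
    pi    : (Q, Q) symmetric matrix of edge rates between blocks.
    Returns the binary symmetric adjacency matrix X and block labels z.
    """
    rng = np.random.default_rng(seed)
    Q = len(alpha)
    # Each vertex draws its block from a single-trial multinomial.
    z = rng.choice(Q, size=n, p=alpha)
    # Given the blocks, edges are independent Bernoulli(pi[z_i, z_j]).
    X = np.zeros((n, n), dtype=int)
    for i in range(n):
        for j in range(i + 1, n):
            X[i, j] = X[j, i] = rng.random() < pi[z[i], z[j]]
    return X, z

# Toy example: two dense blocks with sparse inter-connectivity.
alpha = np.array([0.6, 0.4])
pi = np.array([[0.30, 0.02],
               [0.02, 0.25]])
X, z = sample_ermm(50, alpha, pi)
```

This is also how a fitted blockmodel can be used to generate stochastic realisations (simulations) of the connectome, as mentioned in the abstract.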
Following the traditional notation of Paul Erdős and Alfréd Rényi, we can define the Erdős-Rényi Mixture Model as $G = G(n, \pi)$ where, for a fixed $Q$, we have $Q(Q+1)/2$ mini Erdős-Rényi models (ERMs), posed not only within the blocks but also on the relationships between them. Thus, just as in the ordinary ERM, we can obtain the distribution of degrees, which is summarised in the following proposition.
Proposition: In an Erdős-Rényi Mixture Model $G = G(n, \pi)$, given the class membership of a vertex, the conditional distribution of the degree $\rho(V_i)$ of this vertex is Binomial (approximately Poisson),
$$\rho(V_i) \mid Z_{iq} = 1 \sim \mathrm{Bin}(n-1, \pi_q) \approx \mathcal{P}(\lambda_q),$$
where $\pi_q = \sum_{l=1}^{Q} \alpha_l \pi_{ql}$ and $\lambda_q = (n-1)\pi_q$.
Proof: We consider the random variable $\rho(V_i) = \sum_{j=1}^{n} X_{ij}$. The value of this variable increases only when $X_{ij} = 1$, and because of this we need to consider the following probability:
$$P(X_{ij} = 1 \mid Z_{iq} = 1) = \sum_{l=1}^{Q} P(X_{ij} = 1 \mid Z_{iq} = 1, Z_{jl} = 1)\, P(Z_{jl} = 1) = \sum_{l=1}^{Q} \alpha_l \pi_{ql} = \pi_q.$$
Furthermore, $(X_{ij})_{1 \le i \ne j \le n}$ are conditionally independent, given the classes of vertices $V_i$ and $V_j$, allowing us to conclude that $\rho(V_i) \mid Z_{iq} = 1 \sim \mathrm{Bin}(n-1, \pi_q)$.
As the Binomial distribution can be approximated by the Poisson, we get
$$\rho(V_i) \mid Z_{iq} = 1 \approx \mathcal{P}(\lambda_q), \qquad \lambda_q = (n-1)\pi_q.$$
With this, the distribution of degrees is then a mixture of Poisson distributions, such that
$$P(\rho(V_i) = k) = \sum_{q=1}^{Q} \alpha_q\, e^{-\lambda_q} \frac{\lambda_q^k}{k!}.$$
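As a quick numerical illustration of this proposition (a minimal sketch with our own naming, not from the original references), the mixture pmf of the degree distribution can be evaluated directly from $\alpha$ and $\pi$:

```python
import numpy as np
from scipy.stats import poisson

def degree_pmf(k, alpha, pi, n):
    """Mixture-of-Poissons approximation to P(rho(V_i) = k) in an ERMM."""
    pi_q = pi @ alpha          # pi_q[q] = sum_l alpha_l * pi_ql
    lam = (n - 1) * pi_q       # conditional Poisson rates lambda_q
    return float(np.sum(alpha * poisson.pmf(k, lam)))
```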

Maximising the likelihood with the variational approach
For a fixed $Q$ and parameters $\psi = \{\alpha, \pi\}$, the complete data log likelihood is given as
$$\log L(x, z; \psi) = \sum_{i=1}^{n} \sum_{q=1}^{Q} z_{iq} \log \alpha_q + \sum_{i < j} \sum_{q,l} z_{iq} z_{jl} \log b(x_{ij}; \pi_{ql}), \tag{10}$$
where $b(x; \pi) = \pi^x (1-\pi)^{1-x}$ denotes the Bernoulli probability. To verify this, we consider $\log L(x, z; \psi) = \log L(z; \alpha) + \log L(x \mid z; \pi)$.
As $Z_i$ follows a multinomial distribution, its likelihood is given as
$$L(z; \alpha) = \prod_{i=1}^{n} \prod_{q=1}^{Q} \alpha_q^{z_{iq}}.$$
Taking logarithms, we get
$$\log L(z; \alpha) = \sum_{i=1}^{n} \sum_{q=1}^{Q} z_{iq} \log \alpha_q.$$
Furthermore, by the conditional independence of the edges, we have
$$\log L(x \mid z; \pi) = \sum_{i < j} \sum_{q,l} z_{iq} z_{jl} \log b(x_{ij}; \pi_{ql}),$$
and combining everything completes the verification.
To estimate the model parameters, however, we need the likelihood of the observed data $X$, which is obtained by summing expression (10) over all possible values of $Z$.
Unfortunately, this sum is not tractable, and the standard strategy, the Expectation Maximisation (EM) algorithm, provides some reduction in the computational burden but imposes a drastic limit on the size of networks that can be analysed. To resolve these issues, Daudin, Picard and Robin [2] proposed to use the variational approach [3,4], which requires that the distribution of $Z$ is of the form
$$P(Z) = \prod_{i=1}^{n} P(Z_i = z_i; \tau_i), \tag{11}$$
where $P(Z_i = z_i; \tau_i)$ is the multinomial distribution with parameter $\tau_i = (\tau_{i1}, \ldots, \tau_{iQ})$ and $\tau_{iq} = P(Z_{iq} = 1 \mid X = x)$. The form of the joint distribution given in expression (11) is suggested by the model assumption that the latent variables are independent. In the context of the variational approximation, the goal is to maximise the quantity
$$J(P(Z); \psi, \tau) = \log L(x; \psi) - KL\left[P(Z) \,\|\, P(Z \mid X; \psi)\right],$$
where $KL[\cdot \| \cdot]$ is the Kullback-Leibler divergence. This gives the following estimating equations.
Proposition: Given parameters $\alpha$ and $\pi$, the optimal variational parameters $\hat{\tau}_i = \arg\max_{\tau_i} J(P(Z); \psi, \tau)$ satisfy the fixed point relation
$$\hat{\tau}_{iq} \propto \alpha_q \prod_{j \ne i} \prod_{l=1}^{Q} \left[ b(x_{ij}; \pi_{ql}) \right]^{\hat{\tau}_{jl}}.$$

Proof: To show this, $J(P(Z); \psi, \tau)$ is maximised with respect to the variational parameter $\tau_i$, subject to the constraint $\sum_{q=1}^{Q} \tau_{iq} = 1$; that is, the goal is to maximise the quantity
$$J(P(Z); \psi, \tau) + \xi_i \left( \sum_{q=1}^{Q} \tau_{iq} - 1 \right), \tag{14}$$
where $\xi_i$ is the Lagrange multiplier and $J(P(Z); \psi, \tau)$ is given as
$$J(P(Z); \psi, \tau) = \sum_{i} \sum_{q} \tau_{iq} \log \alpha_q - \sum_{i} \sum_{q} \tau_{iq} \log \tau_{iq} + \sum_{i < j} \sum_{q,l} \tau_{iq} \tau_{jl} \log b(x_{ij}; \pi_{ql}).$$
Substituting for $J(P(Z); \psi, \tau)$ into equation (14), differentiating with respect to $\tau_{iq}$ and setting the resulting expression to zero, we get
$$\log \alpha_q - \log \tau_{iq} - 1 + \sum_{j \ne i} \sum_{l} \tau_{jl} \log b(x_{ij}; \pi_{ql}) + \xi_i = 0,$$
allowing us to conclude that
$$\hat{\tau}_{iq} \propto \alpha_q \prod_{j \ne i} \prod_{l} \left[ b(x_{ij}; \pi_{ql}) \right]^{\hat{\tau}_{jl}}.$$

Proposition: Given the variational parameters $\tau_i$, the values of the parameters $\alpha$ and $\pi$ that maximise $J(P(Z); \psi, \tau)$ are
$$\hat{\alpha}_q = \frac{1}{n} \sum_{i=1}^{n} \tau_{iq}, \qquad \hat{\pi}_{ql} = \frac{\sum_{i \ne j} \tau_{iq} \tau_{jl} x_{ij}}{\sum_{i \ne j} \tau_{iq} \tau_{jl}}.$$

Proof: Maximising with respect to $\alpha$, subject to the constraint $\sum_{q=1}^{Q} \alpha_q = 1$, gives
$$\hat{\alpha}_q = \frac{1}{n} \sum_{i=1}^{n} \tau_{iq}.$$
Similarly, maximising with respect to $\pi$ gives
$$\hat{\pi}_{ql} = \frac{\sum_{i \ne j} \tau_{iq} \tau_{jl} x_{ij}}{\sum_{i \ne j} \tau_{iq} \tau_{jl}}.$$
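The two propositions translate directly into an alternating algorithm: iterate the fixed point relation for $\tau$ (the variational E-step), then update $\alpha$ and $\pi$ in closed form (the M-step). The numpy sketch below is our own illustrative implementation, not the original authors' code; it omits initialisation and convergence checks.

```python
import numpy as np

def e_step(X, tau, alpha, pi, n_iter=20):
    """Iterate the fixed-point relation for the variational parameters tau.

    tau[i, q] approximates P(Z_iq = 1 | X); computed in log space for
    numerical stability, excluding the diagonal (the graph is simple).
    """
    n = X.shape[0]
    log_b1, log_b0 = np.log(pi), np.log(1.0 - pi)
    for _ in range(n_iter):
        # S[i, q] = sum_{j != i} sum_l tau_jl * log b(x_ij; pi_ql)
        S = X @ tau @ log_b1.T + (1.0 - np.eye(n) - X) @ tau @ log_b0.T
        log_tau = np.log(alpha) + S
        log_tau -= log_tau.max(axis=1, keepdims=True)
        tau = np.exp(log_tau)
        tau /= tau.sum(axis=1, keepdims=True)   # normalise each row
    return tau

def m_step(X, tau):
    """Closed-form maximisers of J for alpha and pi, given tau."""
    alpha = tau.mean(axis=0)
    num = tau.T @ X @ tau                       # sum_{i != j} tau_iq tau_jl x_ij
    den = np.outer(tau.sum(0), tau.sum(0)) - tau.T @ tau   # excludes i = j
    return alpha, num / den
```

In `m_step`, the diagonal terms $i = j$ vanish automatically in the numerator because $X_{ii} = 0$, and are subtracted explicitly in the denominator.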

Estimation of the number of blocks via the ICL criterion
Model selection is handled by the ICL criterion, proposed by Biernacki et al. [5]. The construction of the ICL criterion relies on the lemma stating that, if the prior parameter distributions for a model $M_Q$ with $Q$ blocks, $p(\alpha \mid M_Q)$ and $p(\pi \mid M_Q)$, are such that
$$p(\alpha, \pi \mid M_Q) = p(\alpha \mid M_Q)\, p(\pi \mid M_Q),$$
then $\log L(x, z \mid M_Q) = \log L(x \mid z, M_Q) + \log L(z \mid M_Q)$.
Proposition: For a model $M_Q$ with $Q$ blocks, the ICL criterion is
$$ICL(M_Q) = \max_{\psi} \log L(x, \hat{z}; \psi, M_Q) - \frac{1}{2} \frac{Q(Q+1)}{2} \log \frac{n(n-1)}{2} - \frac{Q-1}{2} \log n,$$
where $M_Q$ denotes a model with $Q$ blocks and $\hat{z}$ denotes the estimate of $z$ whose elements are $\hat{z}_{iq} = 1$ if $q = \arg\max_l \hat{\tau}_{il}$, and $0$ otherwise.

Proof: Consider $\log L(x, z \mid M_Q) = \log L(x \mid z, M_Q) + \log L(z \mid M_Q)$.
The first term is obtained by application of the large sample Laplace integral approximation (i.e., the Bayesian Information Criterion, BIC), so we have
$$\log L(x \mid \hat{z}, M_Q) \approx \max_{\pi} \log L(x \mid \hat{z}, \pi, M_Q) - \frac{1}{2} \frac{Q(Q+1)}{2} \log \frac{n(n-1)}{2},$$
since there are $Q(Q+1)/2$ connectivity parameters and $n(n-1)/2$ data points. For the second term we use the Dirichlet prior $\mathcal{D}(\delta, \ldots, \delta)$, as it is conjugate to the multinomial distribution, and we get
$$\log L(z \mid M_Q) = \log \Gamma(Q\delta) - Q \log \Gamma(\delta) + \sum_{q=1}^{Q} \log \Gamma(n_q + \delta) - \log \Gamma(n + Q\delta),$$
where $n_q = \sum_{i} z_{iq}$.
Setting $\delta = \frac{1}{2}$, as it corresponds to the Jeffreys prior, replacing $z$ with its estimate $\hat{z}$, and applying a BIC-type (Stirling) approximation to the Gamma terms, we get
$$\log L(\hat{z} \mid M_Q) \approx \max_{\alpha} \log L(\hat{z}; \alpha, M_Q) - \frac{Q-1}{2} \log n.$$
Combining the two terms yields the ICL criterion stated in the proposition.
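For completeness, here is a sketch of how the criterion can be evaluated for a fitted model (our own illustrative code, not the original authors'; we plug the fitted $\pi$ into the first term rather than re-maximising it, and use the exact Jeffreys-prior Gamma terms rather than their Stirling approximation):

```python
import numpy as np
from scipy.special import gammaln

def icl(X, tau, pi):
    """ICL criterion for a fitted ERMM with Q blocks (delta = 1/2)."""
    n, Q = tau.shape
    z_hat = np.eye(Q)[tau.argmax(axis=1)]          # hard block assignments
    iu = np.triu_indices(n, k=1)                   # n(n-1)/2 data points
    p = (z_hat @ pi @ z_hat.T)[iu]                 # p_ij = pi[z_i, z_j]
    x = X[iu]
    loglik = np.sum(x * np.log(p) + (1 - x) * np.log(1 - p))
    # BIC penalty for the Q(Q+1)/2 connectivity parameters
    pen = 0.5 * (Q * (Q + 1) / 2) * np.log(n * (n - 1) / 2)
    # Dirichlet(1/2) integrated multinomial term
    n_q = z_hat.sum(axis=0)
    mult = (gammaln(Q / 2) - Q * gammaln(0.5)
            + gammaln(n_q + 0.5).sum() - gammaln(n + Q / 2))
    return loglik - pen + mult
```

The model is then fitted for a range of $Q$, and the value maximising the criterion is selected.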

The Clustering Coefficient
The probabilistic definition of the clustering coefficient as proposed by Daudin, Picard and Robin in [2] is given as follows.
Proposition: In the ERMM, the clustering coefficient is
$$\hat{C}_{DPR} = \frac{\sum_{q,l,s} \alpha_q \alpha_l \alpha_s \pi_{ql} \pi_{qs} \pi_{ls}}{\sum_{q,l,s} \alpha_q \alpha_l \alpha_s \pi_{ql} \pi_{qs}}. \tag{28}$$

Proof: Writing the clustering coefficient as the conditional probability that two neighbours of a vertex are themselves connected, and expanding over the block memberships of $V_i$, $V_j$ and $V_k$ (blocks $q$, $l$ and $s$, respectively), we have
$$C = P(X_{jk} = 1 \mid X_{ij} X_{ik} = 1) = \frac{P(X_{ij} X_{ik} X_{jk} = 1)}{P(X_{ij} X_{ik} = 1)} = \frac{\sum_{q,l,s} \alpha_q \alpha_l \alpha_s \pi_{ql} \pi_{qs} \pi_{ls}}{\sum_{q,l,s} \alpha_q \alpha_l \alpha_s \pi_{ql} \pi_{qs}},$$
where, for the last step, we used the conditional independence of $X$ and the independence of $Z$.
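Given estimates $\hat{\alpha}$ and $\hat{\pi}$, expression (28) is a three-index contraction that can be computed in a few lines (our own illustrative sketch):

```python
import numpy as np

def clustering_dpr(alpha, pi):
    """Model-based clustering coefficient of equation (28)."""
    # w[q, l, s] = alpha_q * alpha_l * alpha_s * pi_ql * pi_qs
    w = np.einsum('q,l,s,ql,qs->qls', alpha, alpha, alpha, pi, pi)
    num = np.einsum('qls,ls->', w, pi)   # extra pi_ls closes the triangle
    return num / w.sum()
```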

Spectral Algorithm
The problem of community detection is usually defined as finding a partition of a network into communities of densely connected vertices while minimising the number of connections between the communities. The goodness of a graph partition is generally assessed with a quality function, the most frequently used of which is the modularity proposed by Newman and Girvan [6]. The idea behind modularity is that communities are found by comparing the actual density of connections in the subgraphs with the density one would expect if the vertices were connected at random. This random version of the original graph, which acts as a reference point in the modularity function, is called a null model and, typically, it is tailored to preserve some features of the original graph, such as the number of edges or the degree distribution [7][8][9].
In practice, the standard null model of modularity preserves the degree distribution of the original graph by generating half-edges, so that each vertex in the null model receives as many half-edges as its degree in the original graph. Thus, the probability of randomly picking $V_i$ is the proportion of $V_i$'s degree in the total sum of degrees, that is, $\rho(V_i)/2m$. Furthermore, the probability that vertices $V_i$ and $V_j$ form a complete edge is $\rho(V_i)\rho(V_j)/4m^2$, while the expected number of edges between $V_i$ and $V_j$ is $\rho(V_i)\rho(V_j)/2m := P_{ij}$. Accordingly, the modularity is defined as
$$f_{mod} = \frac{1}{2m} \sum_{i,j} \left( x_{ij} - P_{ij} \right) \delta(c_i, c_j), \tag{29}$$
where $c_i$ and $c_j$ denote the communities of vertices $V_i$ and $V_j$, respectively, while $\delta(c_i, c_j) = 1$ if $V_i$ and $V_j$ are located in the same community and $0$ otherwise.
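Computed directly from the adjacency matrix, the modularity of a given partition is obtained as follows (a short sketch with our own naming; `c` holds one community label per vertex):

```python
import numpy as np

def modularity(X, c):
    """Newman-Girvan modularity f_mod of the partition c (equation (29))."""
    m = X.sum() / 2.0                       # number of edges
    rho = X.sum(axis=1)                     # degrees rho(V_i)
    P = np.outer(rho, rho) / (2.0 * m)      # expected edges, null model
    same = (c[:, None] == c[None, :])       # delta(c_i, c_j)
    return np.sum((X - P) * same) / (2.0 * m)
```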
The Spectral algorithm [10,11] optimises the modularity by utilising the eigenvalues and eigenvectors of the modularity matrix $D$, whose elements are
$$D_{ij} = x_{ij} - \frac{\rho(V_i)\rho(V_j)}{2m}.$$
Let $s$ be an indicator vector which decomposes the nodes into 2 communities, with $s_i = 1$ if the vertex $V_i$ is located in the first community and $s_i = -1$ if it is located in the second. Since $\delta(c_i, c_j) = \frac{1}{2}(s_i s_j + 1)$ and the rows of $D$ sum to zero, this modifies the modularity function (29) as follows:
$$f_{mod} = \frac{1}{4m} \sum_{i,j} D_{ij} s_i s_j = \frac{1}{4m} s^T D s.$$
Moreover, the vector $s$ can be written as a linear combination of the normalised eigenvectors $u_i$ of the matrix $D$, thus $s = \sum_i a_i u_i$ with $a_i = u_i^T s$. Using this, along with the fact that $\beta_i$ is the eigenvalue of $D$ corresponding to the eigenvector $u_i$, we get
$$f_{mod} = \frac{1}{4m} \sum_i a_i^2 \beta_i.$$
The idea is to look for the largest positive eigenvalue of $D$, and then group the vertices according to the signs of the elements of the corresponding eigenvector.
The extension of the algorithm to more than two communities considers the additional contribution $\Delta f_{mod}$ to the modularity obtained by dividing a community $g$ of size $n_g$ into two:
$$\Delta f_{mod} = \frac{1}{2m} \left[ \frac{1}{2} \sum_{i,j \in g} D_{ij} (s_i s_j + 1) - \sum_{i,j \in g} D_{ij} \right] = \frac{1}{4m} s^T D^{(g)} s,$$
where, for the last step, we note that, using the Kronecker delta notation $\delta_{ij}$, we can write
$$\Delta f_{mod} = \frac{1}{4m} \sum_{i,j \in g} \left[ D_{ij} - \delta_{ij} \sum_{k \in g} D_{ik} \right] s_i s_j,$$
such that $D^{(g)}$ is the $n_g \times n_g$ matrix whose elements are $D^{(g)}_{ij} = D_{ij} - \delta_{ij} \sum_{k \in g} D_{ik}$. The algorithm proceeds as before on $D^{(g)}$ and stops when there are no more positive eigenvalues.
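A single bisection step of the algorithm therefore amounts to an eigendecomposition of the (sub)modularity matrix. The sketch below (our own minimal illustration, without the recursive subdivision bookkeeping) returns the split and its modularity contribution, or `None` when the stopping condition is met:

```python
import numpy as np

def spectral_bisection(D, m):
    """One bisection step: split on the sign of the leading eigenvector of D.

    D : modularity matrix (or D^(g) for a subdivision step); m : edge count.
    """
    beta, U = np.linalg.eigh(D)             # eigenvalues in ascending order
    if beta[-1] <= 0:                       # no positive eigenvalue: stop
        return None
    s = np.where(U[:, -1] >= 0, 1, -1)      # indicator vector s
    return s, (s @ D @ s) / (4.0 * m)       # split and its modularity gain
```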