Supplementary Materials for: Gene communities in co-expression networks across different tissues

S1.1 Null model We assume undirected and unweighted networks. A common choice for a null model is the configuration model, in which the degree sequence {ki}i=1 is specified, and an edge is laid between nodes i and j with probability pij = kikj 2M , (S1) where ki is the degree of the ith node in the original network. However, we avoid the configuration model for two reasons. First, by imposing the value of k1, . . . , kN , the stochastic generation of edges sharing a node is not independent of each other. For example, if k1 is small and we have generated edge (1, 2) with probability p12, then edge (1, 3) is generated with a probability smaller than p13. This type of correlation makes it difficult to analytically derive the quality measure of individual communities that requires the count the edges sharing a node. Second, large kikj values can yield pij > 1 [1]. One could set a structural cutoff degree kmax to be of the order of √ N to enforce the constraint pij < 1 [2]. However, it is often the case that the largest degree in an empirical network far exceeds this structural cutoff value [3, 4]. Instead of the configuration model, we use an exponential random graph model (ERGM) as the null model. Instead of the exact degree, we fix the expected degree of each ith node to k∗ i , the degree of the same node in the original network. Let Ω be an ensemble of networks with N nodes. Let ~ θ ≡ (θ1, . . . , θN ) be the model parameters. The probability distribution of the adjacency matrix, A = (Aij), that maximizes the Shannon entropy subject to the constraints ∑

We assume undirected and unweighted networks.A common choice for a null model is the configuration model, in which the degree sequence {k i } N i=1 is specified, and an edge is laid between nodes i and j with probability where k i is the degree of the ith node in the original network.However, we avoid the configuration model for two reasons.First, by imposing the value of k 1 , . . ., k N , the stochastic generation of edges sharing a node is not independent of each other.For example, if k 1 is small and we have generated edge (1, 2) with probability p 12 , then edge (1, 3) is generated with a probability smaller than p 13 .This type of correlation makes it difficult to analytically derive the quality measure of individual communities that requires the count the edges sharing a node.Second, large k i k j values can yield p ij > 1 [1].One could set a structural cutoff degree k max to be of the order of √ N to enforce the constraint p ij < 1 [2].However, it is often the case that the largest degree in an empirical network far exceeds this structural cutoff value [3,4].
Instead of the configuration model, we use an exponential random graph model (ERGM) as the null model.Instead of the exact degree, we fix the expected degree of each ith node to k * i , the degree of the same node in the original network.Let Ω be an ensemble of networks with N nodes.Let θ ≡ (θ 1 , . . ., θ N ) be the model parameters.The probability distribution of the adjacency matrix, A = (A ij ), that maximizes the Shannon entropy subject to the constraints where k i (A) is the degree of the ith node in network A, and the normalization condition where is the probability that nodes i and j are adjacent [5][6][7].
We infer the model parameters θ by maximizing the associated log-likelihood function, i.e., where A * is the adjacency matrix of the original network.One can derive Eq. (S6) from Eq. (S4).We numerically determine θ i 's using a fixed point method implemented in the Python package NEMtropy [7].We use NEMtropy to generate this so-called undirected binary configuration model (UBCM) [7].We estimate the UBCM for each layer independently.Using the obtained multilayer UBCM as null model, we calculate the statistical significance of individual communities detected in the original multilayer network.

S1.2 Number of intralayer edges within each community
We use two measures to assess the quality of individual communities.The first measure is the number of intralayer edges within each community, which is essentially the same as X used in the main text for correlation matrices.Previous studies used the number of edges within a community in single layer networks [8,9], and we extend this quality measure to multilayer networks.
Let S be a set of nodes in a multilayer network.Let X α be the number of edges within S in layer α and let X be the total number of intralayer edges within S, i.e., In the UBCM, the edges are independently laid.Therefore, X obeys the Poisson binomial distribution, which is the discrete probability distribution of a sum of independent Bernoulli trials that are not necessarily identically distributed [10].Let θ iα be the UBCM parameter for node i in layer α.Using Eq. (S5) for each layer, we obtain the expectation of X as follows: where the summation is over all node pairs (i, α), (j, α) in S. Note that we have excluded self-loops.The variance of X is equal to [10] Var (S9)

S1.3 Conductance of each community
The second quality measure is the conductance of each community.Let G(V, E) be an undirected single-layer network, and consider a set of nodes S ⊆ V .Let be the number of edges on the boundary of S and be the number of edges within S.Then, the conductance of S is given by [9] ϕ The conductance measures the fraction of the number of half-edges emanating from nodes in S that are connected to a half-edge emanating from a node outside S. Therefore, the conductance is small for a good community [11].
To define the conductance of a set of nodes S in a multilayer network with L layers, let Y α be the number of edges on the boundary of S in layer α.We define the conductance of S by where Y is the number of intralayer edges on the boundary of S. Because the UBCM independently lays edges for different node pairs, Y as well as X obeys a Poisson binomial distribution.It should also be noted that X and Y are independent because they are calculated based on disjoint sets of node pairs.We denote the set of intralayer node pairs within community S by We note that the cardinality (i.e., number of elements) of E within max is equal to where N Sα is the number of nodes in S in layer α.Because X obeys the Poisson binomial distribution, the probability for X is where and x is the number of edges in S. Similarly, the probability for Y is where Note that the cardinality of E boundary max is Because X and Y are mutually independent, the expected value of the conductance of a set of nodes S in the multilayer network is given by The variance of the conductance of S is given by In Eqs.(S21) and (S22), we set y 2x+y = 1 for (x, y) = (0, 0).

S1.4 Results
We show in Table S1 the Z scores for the number of intralayer edges within each community and for the conductance of each community detected in the unweighted multilayer network obtained by graphical lasso with γ = 1 and γ = 3.For both partitions, all the communities with a Z score of N/A are composed of a single gene.Except these single-gene communities, for both γ = 1 and γ = 3, all other communities are statistically significant with a large positive Z score for the number of intralayer edges and a large negative Z score for the conductance.

S2 Derivation of the variance of the total intralayer weight for a community in a multilayer correlation matrix
To derive Eq. ( 13) in section 2.5, we start with where L is the number of samples we draw from the N -variate multivariate normal distribution.Now, using the fact that different samples are independent, we obtain By substituting Eq. (S24) into Eq.(S23), using the fact that the covariance matrices C α and C β are independent when β = α, and using Isserlis' Theorem [12], we obtain By combining the first and third terms in Eq. (S25), we obtain

S3 Z scores for the average distance between pairs of genes on each chromosome separately in each community
We show in Table S3 the Z scores for the average distance between pairs of genes on each chromosome and each community with γ = 1.We show the corresponding results with γ = 3 in Table S4.We only calculated the Z scores for the chromosome-community pairs with at least 3 genes.In these tables, N/A implies that either there are less than 3 genes, or the standard deviation of the average distance is equal to 0 because every random selection of genes is the same gene set.The latter event occurs when all the genes on that chromosome are associated with the same community.

S4 Gene set enrichment analysis results
We show the two most significant GO:BP and HP results from g:Profiler for the top 50 highly expressed genes out of the 203 genes in the network in each tissue in Table S5.See [13] for the entire output from g:Profiler, i.e., the list of all significant GO:BP and HP results, for the top 50 genes in pancreas; see [14] for the top 50 genes in salivary gland; see [15] for the top 50 genes in mammary gland; and see [16] for the top 50 genes in skin.We show the two most significant GO:BP and HP results from g:Profiler for each community in the partition of the multilayer correlation matrix with γ = 3 in Table S6.See [17] for the entire output from g:Profiler, i.e., the list of all significant GO:BP and HP results, for the genes in community 1; see [18] for community 2; see [19] for community 3; see [20] for community 4; see [21] for community 5; see [22] for community 6, and see [23] for community 7.

Table S1 :
Z scores for the number of intralayer edges within each community and for the conductance of each community detected in the unweighted multilayer network obtained by graphical lasso with γ = 1 and γ = 3. Comm.denotes community and no.denotes "number of".

Table S3 :
Z scores for the average distance between pairs of genes on each chromosome and each significant community detected with γ = 1.Comm.denotes community and Chr denotes chromosome.

Table S4 :
Z scores for the average distance between pairs of genes on each chromosome and each significant community detected with γ = 3. Comm.denotes community and Chr denotes chromosome.

Table S5 :
Results of the gene set enrichment analysis for the top 50 highly expressed genes out of the 203 genes in the network in each tissue.10 −15 mitochondrial inheritance 5.06 • 10 −19 aerobic electron transport chain 5.35 • 10 −14 centrocecal scotoma 1.59 • 10 −18